Tony's Web Log: 2009

Friday, October 2, 2009

Using MD5 Encryption with C# and Microsoft SQL Server

Background

The .NET Framework provides developers with some easy to use classes for modern encryption. One of the more popular methods these days is the use of MD5 encryption. MD5 encryption, to quote from RFC 1232, “takes as input a message of arbitrary length and produces as output a 128-bit “fingerprint” or “message digest” of the input. It is conjectured that it is computationally infeasible to produce two messages having the same message digest, or to produce any message having a given prespecified target message digest. The MD5 algorithm is intended for digital signature applications, where a large file must be “compressed” in a secure manner before being encrypted with a private (secret) key under a public-key cryptosystem such as RSA.” It was developed by Professor Ronald L. Rivest of MIT, and has become widely used as a standard encryption method for ASP.NET applications. See the Points of Interest at the bottom of this article for more practical information about MD5 usage.

Getting Started

Since .NET has made MD5 encryption so easy to use, I’m not including a demo project. I’ll just include the required C# methods, and a SQL script for creating a test database table.

Creating a Test Table

Using your local instance of MSSQL 2000 (this will probably work on 2005 as well, but I’ve not tested it as yet). You can rename the table and the columns at your leisure, just ensure you change the calls to the methods that I’ll detail below. For now, just run the following SQL script from a database you own:

if exists (select * from dbo.sysobjects where
         id = object_id(N'[dbo].[tblLogins]')
         and OBJECTPROPERTY(id, N'IsUserTable') = 1)
drop table [dbo].[tblLogins]
GO

CREATE TABLE [dbo].[tblLogins] (
  [Login] [varchar] (25) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
  [Password] [binary] (16) NULL
) ON [PRIMARY]
GO

All we’re doing here is creating a table called ‘tblLogins’, with two columns: a login column (varchar 25) and a password column (binary 16).

Adding a New Login

You can use the following method in your ASP.NET Web Form or C# Windows Form application. All we’re doing here is making a connection to the database, and inserting a new login. I created an enumeration called ValidationCode so that I can handle responses from the method a little more clearly, but you can just as well just a void function to the same effect.

/* Return types that are thrown when login is attempted */
public enum ValidationCode
{
  LoginFailed=1,
  LoginSucceeded=2,
  ConnectionFailed=3,
  UnspecifiedFailure=4,
  LoginCreated=5
}

You will need to ensure that you have the following references added to your application:

using System.Data.SqlClient;
using System.Security.Cryptography;
using System.Text;

CreateNewLogin will accept 5 values. The first value is the name of the table in the database housing the logins (in this example, it’s tblLogin), the second and third values will be our desired login and password, in plain text, and the last two values are the names of the login column (Login) and the password column (Password) in our table, respectively.

public ValidationCode CreateNewLogin(string tableName, string strLogin,
     string strPassword, string loginColumn, string passColumn)
{
  //Create a connection
    string strConnString = System.Configuration.ConfigurationSettings.
                                          AppSettings["ConnString"];
  SqlConnection objConn = new SqlConnection(strConnString);

  // Create a command object for the query
    string strSQL = "INSERT INTO " + tableName + " (" + loginColumn +
                  "," + passColumn + ") " + "VALUES(@Username, @Password)";

  SqlCommand objCmd = new SqlCommand(strSQL, objConn);

  //Create parameters
    SqlParameter paramUsername;
  paramUsername = new SqlParameter("@Username", SqlDbType.VarChar, 10);
  paramUsername.Value = strLogin;
  objCmd.Parameters.Add(paramUsername);

  //Encrypt the password
    MD5CryptoServiceProvider md5Hasher = new MD5CryptoServiceProvider();
  byte[] hashedBytes;
  UTF8Encoding encoder = new UTF8Encoding();
  hashedBytes = md5Hasher.ComputeHash(encoder.GetBytes(strPassword));
  SqlParameter paramPwd;
  paramPwd = new SqlParameter("@Password", SqlDbType.Binary, 16);
  paramPwd.Value = hashedBytes;
  objCmd.Parameters.Add(paramPwd);

  //Insert the record into the database
    try
  {
      objConn.Open();
      objCmd.ExecuteNonQuery();
      return ValidationCode.LoginCreated;
  }
  catch
  {
      return ValidationCode.ConnectionFailed;
  }
  finally
  {
      objConn.Close();
  }
}

You can test this method out by executing the function. If you attempt to select the information from the database directly, you will notice that the encryption has worked. Now, on to validation.

Validating a Login

This method will allow you to validate a login against a pre-existing login in the database. It’s important to note that we never actually return data to the requestor for evaluation. All of our evaluation is done server side; we only return row counts to the function, making it that much more secure.

//Returns a validation code based on the control's set login info
public ValidationCode ValidateLogin(string tableName, string strLogin,
     string strPassword, string loginColumn, string passColumn)
{
  try
  {
      string strConnString = this.ConnectionString;
      SqlConnection objConn = new SqlConnection(strConnString);
      string strSQL = "SELECT COUNT(*) FROM " + tableName +
          " WHERE " + loginColumn + "=@Username AND " + passColumn +
          "=@Password;";
      SqlCommand objCmd = new SqlCommand(strSQL, objConn);
      //Create the parameters
        SqlParameter paramUsername;
      paramUsername = new SqlParameter("@Username", SqlDbType.VarChar, 25);
      paramUsername.Value = strLogin;
      objCmd.Parameters.Add(paramUsername);

      //Hash the password
        MD5CryptoServiceProvider md5Hasher = new MD5CryptoServiceProvider();
      byte[] hashedDataBytes;
      UTF8Encoding encoder = new UTF8Encoding();
      hashedDataBytes =
        md5Hasher.ComputeHash(encoder.GetBytes(strPassword));    

      //Execute the parameterized query
        SqlParameter paramPwd;
      paramPwd = new SqlParameter("@Password", SqlDbType.Binary, 16);
      paramPwd.Value = hashedDataBytes;
      objCmd.Parameters.Add(paramPwd);
      //The results of the count will be held here
        int iResults;
      try
      {
          objConn.Open();
          //We use execute scalar, since we only need one cell
            iResults = Convert.ToInt32(objCmd.ExecuteScalar().ToString());
      }
      catch
      //Connection failure (most likely, though
        //you can handle this exception however)
        {
          return ValidationCode.ConnectionFailed;
      }
      finally
      {
          objConn.Close();
      }

      if (iResults == 1)
          return ValidationCode.LoginSucceeded;
      else
          return ValidationCode.LoginFailed;
  }
  catch
  {
      return ValidationCode.UnspecifiedFailure;
  }
}

You probably noticed that both methods have the same signature. It would be easy to combine both into a single function, but for this example, I’m keeping them separate. But that’s all there is to it. You can now create a click event on your page or form, and call either of the functions, handling the return code appropriately. Again, you don’t have to use the return codes; you can easily just handle the exceptions or the counts.

Note: I think you already know that to make this code work you need to add these two below namespaces:

System.Security.Cryptography;

System.Text;

Tuesday, September 29, 2009

How to Mix Forms and Windows Authentication in ASP.NET

The situation: You have a Website that is available to both Intranet users and Internet users. You want users from the Intranet to be authenticated automatically using their domain credentials and Internet users to be authenticated using basic ASP.NET forms security.

The problem: When you configure IIS to allow both Anonymous Access and Integrated Windows Authentication, you cannot get the remote users NTLM name.

The fix: Configure IIS to use Anonymous Access for the entire site, with the exception of a single page. This page will be set to use Integrated Windows Authentication.

The gotcha is that, when you check "Anonymous Access" in IIS, you cannot get the user's NTLM name (even if you check "Integrated Windows Authentication"). The key is to specify "Integrated Windows Authentication" for exactly one page in your application. When Intranet users hit this page, we can get their NTLM name out of Request.ServerVariables["LOGON_USER"] and programmatically store that in the FormsAuthentication context. External Internet users can even hit this page and manually specify their domain credentials if they like, but these users will normally go through the standard login process.

STEP 1 - Create a new virtual directory in IIS. In the application root, only check "Anonymous Access". Do not check "Integrated Windows Authentication". This should look like:

STEP 2 - In your Web application, create a new form named "WinLogin.aspx" under the app's root. Also add a page named "Login.aspx" under the app's root and put an instance of the standard System.Web.UI.WebControls.Login control on it, e.g.:

STEP 3 - In IIS manager, select WinLogin.aspx and navigate to the "File Security" tab. Here disable "Anonymous Access" but enable "Integrated Windows Authentication" (this is crucial). This should look like:

STEP 4 - In your web application's web.config, add the following:





"Login.aspx">

  

    

      "?,*" />

    

  









"WinLogin.aspx">

  

    

      "?,*" />

    

  





 



  "Forms">

    "Login.aspx"/>

  



  



  

    "?" />

    "*" />

  



  



  "CustomMembershipProvider">

    

      

      "CustomMembershipProvider" type="DualAuthenticationExample.CustomMembershipProvider, DualAuthenticationExample" />

STEP 5 - In ~/Login.aspx add the following code:

namespace DualAuthenticationExample

{

public partial class LoginPage : System.Web.UI.Page

{

private bool IsIntranetRequest(string ip)

{

// TODO: Check for whatever interanet ip addresses, ranges, etc here... 



return !string.IsNullOrEmpty(ip) && Regex.IsMatch(ip, "^127");

}



protected void Page_Load(object sender, EventArgs e)

{

if (IsIntranetRequest(Request.ServerVariables["REMOTE_ADDR"]))

{

    Response.Redirect("~/WinLogin.aspx");

}

}

}

}

STEP 6 - In ~/WinLogin.aspx add the following code:

namespace DualAuthenticationExample

{

public partial class WinLoginPage : System.Web.UI.Page

{

protected void Page_Load(object sender, EventArgs e)

{

string user = Request.ServerVariables["LOGON_USER"];



if (string.IsNullOrEmpty(user))

{

    Response.Redirect("~/Login.aspx");

}

else

{

    FormsAuthentication.SetAuthCookie(user, false);

    Response.Redirect("~/Default.aspx");

}

}

}

}

So, as specified in the web.config, all users will be bounced to the Login.aspx page. If the user is on the intranet, they'll get bounced again to the WinLogin.aspx page, which will attempt to authenticate them with their NTLM name. If that is successful, then we programmatically log them into the FormsAuthentication context. From here, the user's name will be availble via the standard Page.User.Identity.Name property (as well as HttpContext.Current.User.Identity.Name)

If you know the IP range(s) for the Intranet requests you can really streamline what I have here and make it seamless to the user. This is technically not a requirement to get this to work but, without it, external users will be prompted with the authentication box (I believe the behavior for this varies between browsers and also depends on the user's current browser settings).

Wednesday, September 23, 2009

Write A SQL Function to Split JSON Data

JSON stands for JavaScript Object Notation, and is a lightweight data-interchange format. The fnSplitJSON2 function splits a JSON string and returns the information in a table. The returned table has extra columns which indicate if there is a nested JSON string or if an error was detected.

This is an example of using the fnSplitJson2 function:

-- object example, braces surround the name:value objects

SELECT id, name, value FROM dbo.fnSplitJson2('{Apples:20000,Oranges:400,Apricots:507}',NULL)

Results:

id	name	value
1	Apples	20000
2	Oranges	400
3	Apricots	507

I believe you will find the JSON format easy to use and very readable, and the fnSplitJson2 function a very useful tool.

I use this function to

use one parameter instead of many parameters in stored procedures or functions,
pass a data string into stored procedures from a website using AJAX calls,
allow for dynamic setting of T-SQL variables in stored procedures and functions,
general splitting of strings.

In this article I will give a brief introduction to the JSON format, describe the fnSplitJson2 syntax and options, and show some examples of its use.

Introduction to the JSON format

JSON has two string format structures, an array and an object. An object is an unordered set of name/value pairs. An array is an ordered collection of values.

An object begins with { (left brace) and ends with } (right brace). Each name is followed by colon and the name/value pairs are separated by a comma. For example:

{Apples:20000, Oranges:400, Apricots:507}

An array begins with [ (left bracket) and ends with ] (right bracket). Values are separated by a comma. For example:

[Apples,Oranges,Apricots]

JSON supports the use of hex characters in the format of \u0000 where each 0 represents a hex digit. The function will convert them to the nchar value. Other characters sequences that are automatically converted are:

sequence	converted to
\b	backspace
\f	formfeed
\n	newline
\r	carriage return
\t	tab

For more details about the formatting of a JSON string visit their website http://www.json.org

Function Description

Syntax

dbo.fnSplitJson2(@InputString, @Options)

Input Parameters (Arguments)

@InputString nvarchar(max) (SS2K8) nvarchar(4000) (SS2K)

@Options nvarchar(1023)= NULL

Notes

There are 2 versions of the function available, one which is compatible with SQL Server 2000 and 2005, and the other for SQL Server 2008. I will use the following terms when referring to these:

SS2K SQL Server 2000 (2005) version
SS2K8 SQL Server 2008 version

Return Table

Column	datatype	Description
id	int	The order the items were parsed in.
name	nvarchar(255)	In a object, the name portion. In an array, this will be NULL.
value	nvarchar(max)(SS2K8) nvarchar(4000)(SS2K)	In an object, the value portion. In an array the data itself.
offset	int	The offset in the input string that the element was found. 1 is the starting position.
length	int	The length of the string processed.
colon	int	In an object, the location of the colon character.
nested	int	If nested array or object was detected.
errcnt	int	Number of errors found.
msg	varchar(8000)	Messages containing errors or warnings about the record.

The output table gives more data than you will normally use, but I have found it helpful in debugging input strings and in determining if there are nested objects or arrays.

Options

The @Option parameter is used to change default settings when the function is run. They are passed into the function using the JSON object format.

name	datatype	default	Description
dateStyle	int	121	The style number used by the T_SQL CONVERT function. 121 =yyyy-mm-dd hh:mi:ss.mmm(24h). This is used when decoding \/DATE(n)\/.
idStart	int	1	The starting value for the id column.
idStep	int	1	The integer value to add to the last value inserted into the id column.
debug	bit	0	If 1 then the option settings are inserted into the returned table. The id values are negative offset from the idStart value.
removeQuotes	bit	1	If removeQuotes =1 and surrounding double quotes are detected, then the surrounding double quotes are removed and the control pattern \" is replaced with a ".
splitChar	nchar(1)	,(comma)	The JSON format uses a comma for the character to split on. The splitChar option allows you to specify a different character to split on.
verbose	int	1	Sets the level of messages displayed. 0 = no messages 1 = error messages

Using the Function

JSON Object String

In this example the function splits an JSON object formatted string into 3 records, using the name and value columns.

-- object example, braces surround the name:value objects

SELECT id, name, value FROM dbo.fnSplitJson2('{Apples:20000,Oranges:400,Apricots:507}',NULL)

id	name	value
1	Apples	20000
2	Oranges	400
3	Apricots	507

JSON Array String

In this example the function splits the JSON array formatted string into 3 records. Note that the only difference between this input string and the input string from the previous example is the surrounding brackets instead of the surrounding braces. The name column will always be NULL for a JSON array.

-- array example, brackets surround the comma seperated array

SELECT id, name, value FROM dbo.fnSplitJson2('[Apples:20000,Oranges:400,Apricots:507]',NULL)

id	name	value
1	NULL	Apples:20000
2	NULL	Oranges:400
3	NULL	Apricots:507

General Example

The following JSON object string has 5 named/value. There is an error in one, and nested JSON objects in another. Match each result line with it corresponding name/value pair.

SELECT id, name, value, nested, errcnt, msg FROM dbo.fnSplitJson2(

'{

hex convert:\u0048\u0069\u0021

,date convert:\/DATE(1227578400000)\/

,bad date value:\/DATE(1227578400)\/

,"quotes removed":"\"Good Morning!\", he said."

,nested 3: { "Width": 800,

"Height": 600,

"Title": "JSON split function",

"Thumbnail": {"Url": "http://www.example.com/image/SS2008jsontn",

"Height": 50,

"Width": "95"

},

"IDs": [1, 33, 77]

}

}'

, NULL)

Results

id	name	value	nested	errcnt	msg
1	hex convert	Hi!	0	0
2	date convert	2008-11-25 02:00:00.000	0	0
3	bad date value	\/DATE(1227578400)\/	0	1	1 messages. 1) "\/DATE(" found at position1 but the date is not 13 numeric characters.
4	quotes removed	"Good Morning!", he said.	0	0
5	nested 3	{ "Width": 800, "Height": 600, "Title": "JSON split function", "Thumbnail": {"Url": "http://www.example.com/image/SS2008jsontn", "Height": 50, "Width": "95" }, "IDs": [1, 33, 77] }	3	0

With record 5 (name = nested 3), you would take the value and run it into the function:

SELECT id, name, value, nested, errcnt, msg FROM dbo.fnSplitJson2(

'{ "Width":  800,      "Height": 600,      "Title":  "JSON split function",      "Thumbnail": {"Url": "http://www.example.com/image/SS2008jsontn",                           "Height": 50,                           "Width":  "95"                    },          "IDs": [1, 33, 77]   }'

, NULL)

Results

id	name	value	nested	errcnt	msg
1	Width	800	0	0
2	Height	600	0	0
3	Title	JSON split function	0	0
4	Thumbnail	{"Url": "http://www.example.com/image/SS2008jsontn", "Height": 50, "Width": "95" }	1	0
5	IDs	[1, 33, 77]	1	0

Using the Options

Split Character

Normally the comma is the split character. In this example the semicolon is set to be the split character.

SELECT value FROM dbo.fnSplitJson2('[Apples;Oranges;Apricots;Grapes]', '{splitChar:;}')

value

Apples

Oranges

Apricots

Grapes

removeQuotes

This example illustrates the use of the

Script

SELECT name, value FROM dbo.fnSplitJson2('{"remove quote test ":"\"Good Morning!\", he said."}', NULL)

UNION ALL

SELECT name, value FROM dbo.fnSplitJson2('{"remove quote test":"\"Good Morning!\", he said."}', '{removeQuotes:0}')

Results

name	value
remove quote test	"Good Morning!", he said.
" remove quote test "	"\"Good Morning!\", he said."

idStart and idStep Settings

In this example the start value for the id column is set to 10 and each following id value increments by 3.

SELECT id, value FROM dbo.fnSplitJson2('[Apples,Oranges,Apricots,Grapes]', '{idStart:10,idStep:3}')

Results

id	value
10	Apples
13	Oranges
16	Apricots
19	Grapes

Dynamic setting of T-SQL variables

For this example I will use the options setting code used within the fnSplitJson2 function. As you can see from the options available for fnSplitJson2, if I had to write it with a parameter for each option the code would look something like:

CREATE FUNCTION dbo.fnFoo

(@InputString nvarchar(max)

 , @dateStyle int = 121

, @idStart int = 1

, @idStep int = 1

, @debug bit = 0

,@removeQuotes bit = 1

, @splitChar nchar(1) = NULL

, @verbose int = 1

)

...

-- set default values if null passed in

SELECT @dateStyle = ISNULL(@dateStyle, 121)

, @idStart = ISNULL(@idStart, 1)

, @idStep = ISNULL(@idStep, 1)

, @debug = ISNULL(@debug, 0)

,@removeQuotes = ISNULL(@removeQuotes, 1)

, @splitChar = ISNULL(@splitChar, ',')

, @verbose = ISNULL(@verbose, 1)

...

where parameters 2 -8 are the options. In the function body we handle validation and set the default if a parameter has a NULL value.

To call this function we would have to specify all 7 option parameters, which makes the function difficult to call (in my opinion).

DECLARE @dateStyle int = 121

, @idStart int = 1

, @idStep int = 1

, @debug bit = 0

,@removeQuotes bit = 1

, @splitCharnchar(1) = NULL

, @verbose int = NULL

SELECT * FROM dbo.fnFoo('[hi]', @dateStyle, @idStart, @idStep, @debug, @removeQuotes, @splitChar, @verbose)

Using the fnSplitJson2 function allows us to have one parameter for the options instead of seven. In the body if the code we declare the variables and set the default values. fnSplitJson2 then calls itself with the @Options as the @InputString. It then validates and sets the variables using the returned table. For example:

CREATE FUNCTION dbo.fnSplitJson2

(@InputString nvarchar(max)

, @Options nvarchar(1023)= NULL

)

...

DECLARE @dateStyle int = 121

, @idStart int = 1

, @idStep int = 1

, @debug bit = 0

,@removeQuotes bit = 1

, @verbose int = 1

...

IF(@Options IS NOT NULL)

BEGIN

SELECT @verbose = CASE WHEN name = 'verbose' ANDISNUMERIC(value)= 1 THEN value ELSE @verbose END

, @RemoveQuotes = CASE WHEN name = 'removeQuotes' AND ISNUMERIC(value)= 1 THEN value ELSE @RemoveQuotes END

, @idStart = CASE WHEN name = 'idStart' ANDISNUMERIC(value)= 1 THEN value ELSE @idStart END

, @idStep = CASE WHEN name = 'idStep' ANDISNUMERIC(value)= 1 THEN value ELSE @idStep END

, @dateStyle = CASE WHEN name = 'dateStyle' ANDISNUMERIC(value)= 1 THEN value ELSE @dateStyle END

, @Debug = CASE WHEN name = 'debug' ANDISNUMERIC(value)= 1 THEN value ELSE @Debug END

, @splitChar = CASE WHEN name = 'SplitChar' THEN value ELSE @splitChar END

FROM dbo.fnSplitJson2(@Options,NULL);

END

...

The function call is now simpler, with only two required parameters. To use the default settings the call is:

SELECT * FROM dbo.fnSplitJson2('[hi]',NULL)

To set the options dateStyle and idStart the call is:

SELECT * FROM dbo.fnFoo('[hi]','{dateStyle:100,idStart:0}')

Notice that we only need to just pass in the options we want to set (2 in the example above), not all of them.

The function has the capability to return the option variable's values by setting the debug option equal to 1. The next example displays the debug and change the dateStyle to 100 (from the default 121):

Script

-- return option settings

SELECT id, name, value, msg FROM dbo.fnSplitJson2(NULL, '{debug:1,dateStyle:100}')

Results:

id	name	value	msg
-9	version	SS2008 V1.0 Aug 2009	Option Debug
-8	author	Ric Vander Ark	Option Debug
-7	license	Microsoft Public License (Ms-PL)	Option Debug
-6	verbose	1	Option Debug
-5	splitChar	,	Option Debug
-4	removeQuotes	1	Option Debug
-3	idStep	1	Option Debug
-2	idStart	1	Option Debug
-1	debug	1	Option Debug
0	dateStyle	100	Option Debug

Handling Dates

There is no date literal in JSON, however one popular way to include a date value in is to encode the date in the format of "\/DATE(1227578400000)\/", where the number is the number of milliseconds since 1970-01-01 UTC. The function converts this format into a T-SQL date style format. The default style is 121. The dateStyle option is used to change the default style.

Script:

SELECT name, value FROM dbo.fnSplitJson2('{default:\/DATE(1227578400000)\/}', NULL)

UNION ALL

SELECT name, value FROM dbo.fnSplitJson2('{100:\/DATE(1227578400000)\/}', '{dateStyle:100}')

UNION ALL

SELECT name, value FROM dbo.fnSplitJson2('{101:\/DATE(1227578400000)\/}', '{dateStyle:101}')

UNION ALL

SELECT name, value FROM dbo.fnSplitJson2('{102:\/DATE(1227578400000)\/}', '{dateStyle:102}')

UNION ALL

SELECT name, value FROM dbo.fnSplitJson2('{103:\/DATE(1227578400000)\/}', '{dateStyle:103}')

UNION ALL

SELECT name, value FROM dbo.fnSplitJson2('{109:\/DATE(1227578400000)\/}', '{dateStyle:109}')

UNION ALL

SELECT name, value FROM dbo.fnSplitJson2('{127:\/DATE(1227578400000)\/}', '{dateStyle:127}')

Results:

name	value
default	2008-11-25 02:00:00.000
100	Nov 25 2008 2:00AM
101	11/25/2008
102	2008.11.25
103	25/11/2008
109	Nov 25 2008 2:00:00.000AM
127	2008-11-25T02:00:00

Reference:

"SQL Server Books Online", "CAST and CONVERT", "Date and Time Styles" for date style information.

http://msdn.microsoft.com/en-us/library/bb299886.aspx

http://weblogs.asp.net/bleroy/archive/2008/01/18/dates-and-json.aspx

Date Conversion Error example:

Due to the limitation with the rounding of milliseconds before SQL Server 2008, the S2K version of fnSplitJson2 will give rounding errors. I ran the following script for each version.

Script:

DECLARE @json varchar(8000)

SELECT @json =

+ '{000:\/DATE(1227578400000)\/'

+ ',001:\/DATE(1227578400001)\/'

+ ',002:\/DATE(1227578400002)\/'

+ ',003:\/DATE(1227578400003)\/'

+ ',004:\/DATE(1227578400004)\/'

+ ',005:\/DATE(1227578400005)\/'

+ ',006:\/DATE(1227578400006)\/'

+ ',007:\/DATE(1227578400007)\/'

+ ',008:\/DATE(1227578400008)\/'

+ ',009:\/DATE(1227578400009)\/'

+ ',010:\/DATE(1227578400010)\/'

+ ',011:\/DATE(1227578400011)\/'

+ ',012:\/DATE(1227578400012)\/'

+ ',013:\/DATE(1227578400013)\/'

+ ',014:\/DATE(1227578400014)\/'

+ ',015:\/DATE(1227578400015)\/'

+ ',016:\/DATE(1227578400016)\/'

+ ',017:\/DATE(1227578400017)\/'

+ ',018:\/DATE(1227578400018)\/'

+ ',019:\/DATE(1227578400019)\/'

+ '}'

SELECT * FROM dbo.fnSplitJson2(@json, NULL)

Results

The columns from both versions are combined. Notice that the values in the SQL 2000 value column have rounded the milliseconds.

name	SQL 2000 value (S2K)	SQL 2008 value (S2K8)
0	2008-11-25 02:00:00.000	2008-11-25 02:00:00.000
1	2008-11-25 02:00:00.000	2008-11-25 02:00:00.001
2	2008-11-25 02:00:00.003	2008-11-25 02:00:00.002
3	2008-11-25 02:00:00.003	2008-11-25 02:00:00.003
4	2008-11-25 02:00:00.003	2008-11-25 02:00:00.004
5	2008-11-25 02:00:00.007	2008-11-25 02:00:00.005
6	2008-11-25 02:00:00.007	2008-11-25 02:00:00.006
7	2008-11-25 02:00:00.007	2008-11-25 02:00:00.007
8	2008-11-25 02:00:00.007	2008-11-25 02:00:00.008
9	2008-11-25 02:00:00.010	2008-11-25 02:00:00.009
10	2008-11-25 02:00:00.010	2008-11-25 02:00:00.010
11	2008-11-25 02:00:00.010	2008-11-25 02:00:00.011
12	2008-11-25 02:00:00.013	2008-11-25 02:00:00.012
13	2008-11-25 02:00:00.013	2008-11-25 02:00:00.013
14	2008-11-25 02:00:00.013	2008-11-25 02:00:00.014
15	2008-11-25 02:00:00.017	2008-11-25 02:00:00.015
16	2008-11-25 02:00:00.017	2008-11-25 02:00:00.016
17	2008-11-25 02:00:00.017	2008-11-25 02:00:00.017
18	2008-11-25 02:00:00.017	2008-11-25 02:00:00.018
19	2008-11-25 02:00:00.020	2008-11-25 02:00:00.019

Saturday, September 19, 2009

How can you find out if your version of Vista is 32 or 64 bits?

When you install windows 7 on your old computer, you need to find out your current OS running on 32 or 64 bits. Here is what you need to do to find out:

Hold down the left Windows key and press Pause/Break. It is on the screen
under the System section and says whether you are running a 32 bit or 64 bit
operating system.

Friday, September 18, 2009

How to Get a Free Room in Las Vegas

A friend of mine told me how to get a free room in Las Vegas. Here are the Steps:

Step1: Before Go to Las Vegas

Learn basic strategy or how to play break-even blackjack.
Play only blackjack, and play in a casino with good blackjack rules.

Step 2: In the Casino

Go to the promotions booth.
Fill out and turn in an application for a player's card.
Pick up the card.

Step 3: Play

Present your card to the pit boss for rating every time you sit down at a blackjack table.
Play as long as possible.
Wager as little as possible. Play perfect, basic strategy at all times.
Take as many restroom breaks as possible. Spill your drink on the table at least once, and take a long time deciding how to play each hand.
Bet at least $50 whenever the pit boss is watching, even if your average bet is only $5.
Request to meet a casino host after 3 or 4 hours of play. Be friendly when the host comes to the table.
Request a free room.

Tuesday, September 15, 2009

School Bus in China vs School Bus in India

School Bus in China...

School Bus in India...

Sunday, September 13, 2009

"best practices" checklist for Stored Procedure

When developing stored procedures, there seems to be a lot of emphasis on "get it done fast." Which means type all lower case, pay little attention to formatting, and sometimes throw best practices out the window. Personally, I would rather front-load my development time; I think that the costs I pay in initial development far outweigh what I might have paid in maintenance down the road. Making readable and maintainable code that also performs well and is delivered in a timely manner is something that a lot of us strive for, but we don't always have the luxury. But I have found that it is very easy to fall into the good kind of development habits.

A popular adage is, "you can have it fast, cheap, or good. Pick two." I contend that if you develop habits like these and use them in all of your database programming, the time difference between following those methods and doing it the "lazy" way will be negligible at most; and so, fast and good go hand in hand, rather than trade off for one another.

Once in a while this "disorder" slows me down. I come across code that someone else wrote (almost exclusively it is someone I no longer work with), and I can't even bear to look at it without first re-writing it. Here is a fake but realistic example of the kinds of procedures I see:

create proc foo(@i int,@bar int=null,@hr int output,@xd datetime) as
declare @c varchar
declare @s nchar(2)
declare @x int
set @grok='Beverly'
set @korg='MA'
set @x=5
select customers.customerid,firstname,lastname,orderdate from customers join orders on
customers.customerid=orders.customerid where status=@i or status<=@bar and orderdate<=@xd
set @hr = @@rowcount
select customers.customerid,count(*) from customers left join orders on 
customers.customerid=orders.customerid where customers.city=@c and customers.state=@s 
group by customers.customerid having count(*)>=@x
return (@@rowcount)

This kind of feels like the 5th grade all over again, but when I get handed code like this, I start immediately visualizing one of those "find all of the things wrong with this picture" exercises, and feel compelled to fix them all. So, what is wrong with the above sample, you may ask? Well, let me go through my own personal (and quite subjective) subconscious checklist of best practices when I write my own stored procedures. I have never tried to list these all at once, so I may be all over the place, but hopefully I will justify why I choose to have these items on my checklist in the first place.

======================

Upper casing T-SQL keywords and built-in functions

I always use CREATE PROCEDURE and not create procedure or Create Procedure. Same goes for all of the code throughout my objects... you will always see SELECT, FROM, WHERE and not select, from, where. I just find if much more readable when all of the keywords are capitalized. It's not that hard for me to hold down the shift key while typing these words, and there are even IDEs that will do this kind of replacement for you (for example, Apex SQLEdit has a handy "mis-spelled keyword replacement" feature that I think could be used for this purpose also). This is probably one of the few areas where Celko and I actually agree. :-)

======================

Using a proper and consistent naming scheme

Obviously "foo" is a horribly ridiculous name for a procedure, but I have come across many that were equally nondescript. I like to name my objects using {target}_{verb}. So for example, if I have a Customers table, I would have procedures such as:

dbo.Customer_Create
dbo.Customer_Update
dbo.Customer_Delete
dbo.Customer_GetList
dbo.Customer_GetDetails

This allows them to sort nicely in Object Explorer / Object Explorer Details, and also narrows down my search quickly in an IntelliSense (or SQLPrompt) auto- complete list. If I have a stored procedures named in the style dbo.GetCustomerList, they get mixed up in the list with dbo.GetClientList and dbo.GetCreditList. You could argue that maybe these should be organized by schema, but in spite of all the buzz, I have not developed a need or desire to use schemas in this way. For most of the applications I develop, ownership/schema is pretty simple and doesn't need to be made more complex.

Of course I NEVER name stored procedures using the sp_ prefix. See Brian Moran's article in SQL Server Magazine back in 2001. Or just ask anybody. :-) I also avoid other identifying object prefixes (like usp_). I don't know that I've ever been in a situation where I couldn't tell that some object was a procedure, or a function, or a table, and where the name really would have helped me all that much. This is especially true for the silly (but common) "tbl" prefix on tables. I don't want to get into that here, but I've always scratched my head at that one. Views may be the only place where I think this is justified, but then it should be a v or View_ prefix on the views only; no need to also identify tables... if it doesn't have a v or View_ prefix, it's a table!

More important than coming up with a proper naming scheme (because that is mostly subjective), it is much more important that you apply your naming scheme consistently. Nobody wants to see procedures named dbo.Customer_Create, dbo.Update_Customer and dbo.GetCustomerDetails.

======================

Using the schema prefix

I always specify the schema prefix when creating stored procedures. This way I know that it will be dbo.procedure_name no matter who I am logged in as when I create it. Similarly, my code always has the schema prefix on all object references. This prevents the database engine from checking for an object under my schema first.

======================

Using parentheses around parameter list

I am not a big fan of using parentheses around the parameter list. I can't really explain it, as I am a proponent of consistency, and this is the syntax required when creating user-defined functions. But I wanted to mention it because you will not see any of my stored procedures using this syntax. I'm open to change if you can suggest a good enough reason for me to do so.

======================

Lining up parameter names, data types, and default values

I find this much easier to read:

CREATE PROCEDURE dbo.User_Update
   @CustomerID     INT,
   @FirstName      VARCHAR(32)     = NULL,
   @LastName       VARCHAR(32)     = NULL,
   @Password       VARCHAR(16)     = NULL,
   @EmailAddress   VARCHAR(320)    = NULL,
   @Active         BIT             = 1,
   @LastLogin      SMALLDATETIME   = NULL
AS
BEGIN
...

...than this:

CREATE PROCEDURE dbo.User_Update
@CustomerID INT,
@FirstName VARCHAR(32) = NULL,
@LastName VARCHAR(32) = NULL,
@Password VARCHAR(16) = NULL,
@EmailAddress VARCHAR(320) = NULL,
@Active BIT = 1,
@LastLogin SMALLDATETIME = NULL
AS
BEGIN
...

======================

Using spaces and line breaks liberally

This is a simple one, but in all comparison operators I like to see spaces between column/variable and operator. So instead of @foo int=null or where @foo>1 I would rather see @foo INT = NULL or WHERE @foo > 1.

I also tend to place at least a carriage return between individual statements, especially in stored procedures where many statements spill over multiple lines.

Both of these are just about readability, nothing more. While in some interpreted languages like JavaScript, size is king, and compressing / obfuscating code to make it as small as possible does provide some benefit, in T- SQL you would be hard-pressed to find a case where this comes into play. So, I lean to the side of readability.

======================

Avoiding data type / function prefixes on column / parameter names

I often see prefixes like @iCustomerID, @prmInputParameter, @varLocalVariable, @strStringVariable. I realize why people do it, I just think it muddies things up. It also makes it much harder to change the data type of a column when not only do you have to change all the variable/parameter declarations but you also have to change @iVarName to @bigintVarName, etc. Otherwise the purpose of the prefixed variable name loses most of its benefit. So, just name the variable for what it is. If you have a column EmailAddress VARCHAR(320), then make your variable/parameter declaration @EmailAddress VARCHAR(320). No need to use @strEmailAddress ... if you need to find out the data type, just go to the declaration line!

======================

Using lengths on parameters, even when optional

I occasionally see people define parameters and local variables as char or varchar, without specifying a length. This is very dangerous, as in many situations you will get silent truncation at 30 characters, and in a few obscure ones, you will get silent truncation at 1 character. This can mean data loss, which is not very good at all. I have asked that this silent truncation at least become consistent throughout the product (see Connect #267605), but nothing has happened yet. Fellow MVP Erland Sommarskog has gone so far as to ask for the length declaration to become mandatory (see Connect #244395) and, failing that, feels that this should be something that raises a warning when using his proposed SET STRICT_CHECKS ON setting (see http://www.sommarskog.se/strict_checks.html#nodefaultlength).

======================

Listing output parameters last

My habit is to list OUTPUT parameters last. I am not sure why that is exactly, except that it is the order that I conceptually think about the parameters... in then out, not the other way around.

======================

Using BEGIN / END liberally

I have seen many people write stuff like this:

CREATE PROCEDURE dbo.ProcedureA
AS
   SELECT * FROM foo;
   GO
   SELECT * FROM bar;
GO

They create the procedure, maybe don't notice the extra resultset from bar (or shrug it off), and then wonder why they only get results from foo when they run the procedure. If they had done this:

CREATE PROCEDURE dbo.ProcedureA
AS
BEGIN
   SELECT * FROM foo;
   GO
   SELECT * FROM bar;
END
GO

Because GO is not a T-SQL keyword but rather a batch separator for tools like Query Analyzer and SSMS, they would have received these error messages, one from each batch:

Msg 102, Level 15, State 1, Procedure ProcedureA, Line 4
Incorrect syntax near ';'.
Msg 102, Level 15, State 1, Line 2
Incorrect syntax near 'END'.

Yes, errors are bad, and all that, but I would rather have this brought to my face when I try to compile the procedure, then later on when the first user tries to call it.

======================

Using statement terminators

I have quickly adapted to the habit of ending all statements with proper statement terminators (;). This was always a habit in languages like JavaScript (where it is optional) and C# (where it is not). But as T-SQL gets more and more extensions (e.g. CTEs) that require it, I see it becoming a requirement eventually. Maybe I won't even be working with SQL Server by the time that happens, but if I am, I'll be ready. It's one extra keystroke and guarantees that my code will be forward-compatible.

======================

Using SET NOCOUNT ON

I always add SET NOCOUNT ON; as the very first line of the procedure (after BEGIN of course). This prevents DONE_IN_PROC messages from needlessly being sent back to the client after every row-affecting statement, which increases network traffic and in many cases can fool applications into believing there is an additional recordset available for consumption.

NOTE

wrote about it here

======================

Using local variables

When possible, I always use a single DECLARE statement to initialize all of my local variables. Similarly, I try to use a single SELECT to apply values to those variables that are being used like local constants. I see code like this:

declare @foo int
declare @bar int
declare @x int
set @foo = 5
set @bar = 6
set @x = -1

And then some more declare and set statements later on in the code. I find it much harder to track down variables in longer and more complex procedures when the declaration and/or assignments can happen anywhere... I would much rather have as much of this as possible occurring in the beginning of the code. So for the above I would rather see:

DECLARE 
   @foo    INT,
   @bar    INT,
   @x      INT;

SELECT
   @foo    = 5,
   @bar    = 6,
   @x      = -1;

As a bonus, in SQL Server 2008, the syntax now supports changing the above into a single statement:

DECLARE
   @foo    INT = 5,
   @bar    INT = 6,
   @x      INT = -1;

So much nicer. However, it still leaves a lot to be desired: I also always use meaningful variables names, rather than @i, @x, etc.

Also, some people like listing the commas at the beginning of each new line, e.g.:

DECLARE
   @foo    INT = 5
   ,@bar   INT = 6
   ,@x     INT = -1;

Not just in variable declarations, but also in parameter lists, columns lists, etc. While I will agree that this makes it easier to comment out individual lines in single steps, I find the readability suffers greatly.

======================

Using table aliases

I use aliases a lot. Nobody wants to read (never mind type) this, even though I have seen *many* examples of it posted to the public SQL Server newsgroups:

SELECT 
   dbo.table_X_with_long_name.column1,
   dbo.table_X_with_long_name.column2,
   dbo.table_X_with_long_name.column3,
   dbo.table_X_with_long_name.column4,
   dbo.table_X_with_long_name.column5,
   dbo.table_H_with_long_name.column1,
   dbo.table_H_with_long_name.column2,
   dbo.table_H_with_long_name.column3,
   dbo.table_H_with_long_name.column4
FROM
   dbo.table_X_with_long_name
INNER JOIN
   dbo.table_H_with_long_name
ON
   dbo.table_X_with_long_name.column1 = dbo.table_H_with_long_name.column1
   OR dbo.table_X_with_long_name.column1 = dbo.table_H_with_long_name.column1
   OR dbo.table_X_with_long_name.column1 = dbo.table_H_with_long_name.column1
WHERE
   dbo.table_X_with_long_name.column1 >= 5
   AND dbo.table_X_with_long_name.column1 < 10;

But as long as you alias sensibly, you can make this a much more readable query:

SELECT 
   X.column1,
   X.column2,
   X.column3,
   X.column4,
   X.column5,
   H.column1,
   H.column2,
   H.column3,
   H.column4
FROM
   dbo.table_X_with_long_name AS X
INNER JOIN
   dbo.table_H_with_long_name AS H
ON
   X.column1 = H.column1
   OR X.column2 = H.column2
   OR X.column3 = H.column3
WHERE
   X.column1 >= 5
   AND X.column1 < 10;

The "AS" when aliasing tables is optional; I have been trying very hard to make myself use it (only because the standard defines it that way). When writing multi-table queries, I don't give tables meaningless shorthand like a, b, c or t1, t2, t3. This might fly for simple queries, but if the query becomes more complex, you will regret it when you have to go back and edit it.

======================

Using column aliases

I buck against the trend here. A lot of people prefer to alias expressions / columns using this syntax:

SELECT [column expression] AS alias

I much prefer:

SELECT alias = [column expression]

The reason is that all of my column names are listed down the left hand side of the column list, instead of being at the end. It is much easier to scan column names when they are vertically aligned.

In addition, I always use column aliases for expressions, even if right now I don't need to reference the column by an alias. This prevents me from having to deal with multiple errors should I ever need to move the query into a subquery, or cte, or derived table, etc.

======================

Using consistent formatting

I am very fussy (some co-workers use a different word) about formatting. I like my queries to be consistently readable and laid out in a predictable way. So for a join that includes a CTE and a subquery, this is how it would look:

   WITH cte AS 
   (
       SELECT 
           t.col1,
           t.col2,
           t.col3
       FROM
           dbo.sometable AS t
   )
   SELECT
       cte.col1,
       cte.col2,
       cte.col3,
       c.col4
   FROM
       cte
   INNER JOIN
       dbo.Customers AS c
       ON c.CustomerID = cte.col1
   WHERE EXISTS
   (
       SELECT 1
           FROM dbo.Orders o
           WHERE o.CustomerID = o.CustomerID
   )
   AND c.Status = 'LIVE';

Keeping all of the columns in a nice vertical line, and visually separating each table in the join and each where clause. Inside a subquery or derived table, I am less strict about the visual separation, though I still put each fundamental portion on its own line. And I always use SELECT 1 in this type of EXISTS() clause, instead of SELECT * or SELECT COUNT(*), to make it immediately clear to others that the query inside does NOT retrieve data.

======================

Matching case of underlying objects / columns

I always try to match the case of the underlying object, as I can never be too certain that my application will always be on a case-sensitive collation. Going back and correcting the case throughout all of my modules will be a royal pain, at best. This is much easier if you are using SQL Server 2008 Management Studio against a SQL Server 2008 instance, or have invested in Red-Gate's SQL Prompt, as you will automatically get the correct case when selecting from the auto-complete list.

======================

Qualifying column names with table/alias prefix

I always qualify column names when there is more than one table in the query. Heck, sometimes I even use aliases when there is only one table in the query, to ease my maintenance later should the query become more complex. I won't harp on this too much, as fellow MVP Alex Kuznetsov treated this subject a few days ago.

======================

Using RETURN and OUTPUT appropriately

I never use RETURN to provide any data back to the client (e.g. the SCOPE_IDENTITY() value or @@ROWCOUNT). This should be used exclusively for returning stored procedure status, such as ERROR_NUMBER() / @@ERROR. If you need to return data to the caller, use a resultset or an OUTPUT parameter.

======================

Avoiding keyword shorthands

I always use full keywords as opposed to their shorthand equivalents. "BEGIN TRAN" and "CREATE PROC" might save me a few keystrokes, and I'm sure the shorthand equivalents are here to stay, but something just doesn't feel right abou tit. Same with the parameters for built-in functions like DATEDIFF(), DATEADD() and DATEPART(). Why use WK or DW when you can use WEEK or WEEKDAY? (I also never understood why WEEKDAY become DW in shorthand, instead of WD, which is not supported. DW likely means DAYOFWEEK but that is an ODBC function and not supported directly in T-SQL at all. That in and of itself convinced me that it is better to take the expensive hit of typing five extra characters to be explicit and clear.) Finally, I always explicitly say "INNER JOIN or "LEFT OUTER JOIN"... never just "join" or "left join." Again, no real good reason behind that, just habit.

======================

Using parentheses liberally around AND / OR blocks

I always group my clauses when mixing AND and OR. Leaving it up to the optimizer to determine what "x=5 AND y = 4 OR b = 3" really means is not my cup of tea. I wrote a very short article about this a few years ago.

======================

So, after all of that, given the procedure I listed at the start of the article, what would I end up with? Assuming I am using SQL Server 2008, and that I can update the calling application to use the right procedure name, to use sensible input parameter names, and to stop using return values instead of output parameters:

CREATE PROCEDURE dbo.Customer_GetOlderOrders
   @OrderStatus        INT,
   @MaxOrderStatus     INT = NULL,
   @OrderDate          SMALLDATETIME,
   @RC1                INT OUTPUT,
   @RC2                INT OUTPUT
AS
BEGIN
   SET NOCOUNT ON;

   DECLARE
       @City           VARCHAR(32) = 'Beverly',
       @State          CHAR (2)    = 'MA',
       @MinOrderCount  INT         = 5;

   SELECT
       c.CustomerID,
       c.FirstName,
       c.LastName,
       c.OrderDate
   FROM
       dbo.Customers c
   INNER JOIN
       dbo.Orders o
       ON c.CustomerID = o.CustomerID
   WHERE   
       (
           o.OrderStatus       = @OrderStatus
           OR o.OrderStatus    <= @MaxOrderStatus
       )
       AND o.OrderDate         <= @MaxOrderDate;

   SET @RC1 = @@ROWCOUNT;

   SELECT
       c.CustomerID,
       OrderCount = COUNT(*)
   FROM
       dbo.Customers c
   LEFT OUTER JOIN
       dbo.Orders o
       ON c.CustomerID = o.CustomerID
   WHERE
       c.City = @City
       AND c.State = @State
   GROUP BY
       c.CustomerID
   HAVING
       COUNT(*) >= @MinOrderCount;

   SET @RC2 = @@ROWCOUNT;

   RETURN;
END
GO

Okay, so it LOOKS like a lot more code, because the layout is more vertical. But you tell me. Copy both procedures to SSMS or Query Analyzer, and which one is easier to read / understand? And is it worth the three minutes it took me to convert the original query? It took me a few hours to convert this list from my subconscious to you, so hopefully I have helped you pick up at least one good habit. And if you think any of these are BAD habits, please drop a line and let me know why!