Software and web application security

April 23, 2007

I18N input validation whitelist filter with System.Globalization and GetUnicodeCategory

Filed under: software security — chrisweber @ 10:48 pm

Maybe you’re building internationalized code and wondering how to build a whitelist filter that will support all the different character sets your planning to support. If you support more than ten, especially some of the larger east Asian sets, this might seem like an unwieldy or tricky process.
Well luckily it’s easier than most people would think. Building a good input validation filter can be simplified with .Net’s GetUnicodeCategory. But use the method from the System.Globalization namespace as the other one in System.Char looks like it may become the subordinate.

With GetUnicodeCategory you can simply build a whitelist supporting the character categories you want to allow. So get away from thinking you have to write a regEx filter and list out all the character ranges you want to allow in each character set, it’s much simpler than that!

The Unicode standard assigns ever character to one of about 31 categories. They make sense too, for example Other Control charactes (Cc) , Lowercase Letter (Ll), Uppercase Letter (Lu), Math Symbol (Sm). So for example you might want to only allow letters, numbers, and punctuation in your whitelist. This could be achieved with the following snippet:

char cUntrustedInput; // the untrusted user-input
UnicodeCategory cInputTest = CharUnicodeInfo.GetUnicodeCategory(cUntrustedInput);
if (cTestCategory == UnicodeCategory.LowercaseLetter ||
cTestCategory == UnicodeCategory.UppercaseLetter ||
cTestCategory == UnicodeCategory.DecimalDigitNumber ||
cTestCategory == UnicodeCategory.TitlecaseLetter ||
cTestCategory == UnicodeCategory.OtherLetter ||
cTestCategory == UnicodeCategory.NonSpacingMark ||
cTestCategory == UnicodeCategory.DashPunctuation ||
cTestCategory == UnicodeCategory.ConnectorPunctuation)
// character looks safe, continue
// character is not allowed, fail

Not too bad eh.



  1. Brilliant, nice to see how easy this can be. Great work man you rule!

    Comment by Brian — April 24, 2007 @ 9:53 am

  2. […] chrisweber Blogged about a good topic today on chrisweber.wordpress.comHere’s a summary…. […]

    Pingback by I18N input validation whitelist filter with System.Globalization and GetUnicodeCategory — October 12, 2007 @ 6:39 pm

  3. I would like to see a continuation of the topic

    Comment by Maximus — December 20, 2007 @ 1:47 am

RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s

Blog at

%d bloggers like this: