Disclaimer

Any opinions expressed here are my own and not necessarily those of my employer (I'm self-employed).

Jan 11, 2012

How not to hash passwords in .NET

In connection with a bug in TransformTool, I've been looking into how text encoding is handled in the .NET framework. It turns out there are some caveats that can affect the correctness of a program, and in code such as password validation they can become severe security issues.

This post assumes you are somewhat familiar with how character encodings work. You might want to check out my Introduction to character encoding if you're not. I wrote it mainly because I didn't want to explain the basics of encodings in this post.

The encoding issues/features I discuss here are all well documented in the article Character Encoding in the .NET Framework, but I believe that the issues aren't that well known. Stack Overflow, blogs, and discussion forums are riddled with insecure code samples. Do a Google search for "ASCII.GetBytes" password, and you'll get a lot of results. I even found insecure code examples in a textbook, the C# 2008 Programmer's Reference (page 344). So I definitely believe we need to raise awareness of these issues in the .NET community.

Encoding subtleties
In the MSDN article on character encoding you'll find that the first suggestion on how to use the encoding objects in .NET is: 
Use the static properties of the Encoding class, which return objects that represent the standard character encodings available in the .NET Framework (ASCII, UTF-7, UTF-8, UTF-16, and UTF-32). For example, the Encoding.Unicode property returns a UnicodeEncoding object. Each object uses replacement fallback to handle strings that it cannot encode and bytes that it cannot decode.
And people love to use the static properties! But if you don't read this carefully and pause at "replacement fallback", you might get into trouble. "Replacement fallback" means that every character that cannot be encoded to bytes is silently replaced with the "?" character. But what does that mean in practice? Time for a demo using the ASCII encoding:




Oh my. What just happened here?
  1. First you see different strings I've chosen to demonstrate how "default" encoding works. Some strings contain characters that do not exist in the ASCII encoding.
  2. Next you see the output from System.Text.Encoding.ASCII.GetBytes(...), the strings have become bytes shown as hex. Hint: look at how the last three strings' bytes suddenly are identical after running them through GetBytes()!
  3. Then come the hash values, just to underscore that hashing does not help here (identical bytes give identical hash values). Out of our five different input strings, three have the same hash value!
  4. Finally you see the output from System.Text.Encoding.ASCII.GetString(...), which turns our byte arrays back into String objects. If you compare the input strings and output strings, only two of them (the all-ASCII strings 0 and 2) are unchanged.
If you read the documentation for e.g. the static ASCII encoding property, the results aren't very surprising: 
... might not have the appropriate behavior for your application. It uses replacement fallback to replace each string that it cannot encode and each byte that it cannot decode with a question mark ("?") character.
Still, this is often how people recommend you hash passwords. Read the docs, people!
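To make the difference between the two fallbacks concrete, here is a minimal sketch (not the demo program from this post). The class and method names are made up for illustration; the encoding calls are the ones discussed above. It shows that Encoding.ASCII maps every non-ASCII character to the byte 0x3F ("?"), while an ASCII encoding created with exception fallbacks refuses to encode such input.

```csharp
using System;
using System.Text;

// Hypothetical helper class, for illustration only.
public static class FallbackDemo
{
    // Lenient: Encoding.ASCII uses replacement fallback, so every
    // character it cannot encode silently becomes '?' (0x3F).
    public static string LenientHex(string s) =>
        BitConverter.ToString(Encoding.ASCII.GetBytes(s));

    // Strict: the same encoding, but with exception fallbacks, throws
    // EncoderFallbackException instead of replacing characters.
    public static bool StrictEncodes(string s)
    {
        Encoding strictAscii = Encoding.GetEncoding(
            "us-ascii",
            new EncoderExceptionFallback(),
            new DecoderExceptionFallback());
        try
        {
            strictAscii.GetBytes(s);
            return true;
        }
        catch (EncoderFallbackException)
        {
            return false;
        }
    }

    public static void Main()
    {
        string accents = new string('\u00E9', 5);          // "ééééé"
        Console.WriteLine(LenientHex(accents));            // 3F-3F-3F-3F-3F
        Console.WriteLine(LenientHex("?????"));            // 3F-3F-3F-3F-3F
        Console.WriteLine(StrictEncodes("abcde"));         // True
        Console.WriteLine(StrictEncodes("abcd\u00E9"));    // False
    }
}
```

The non-ASCII characters are written as \u escapes so the result does not depend on how the source file itself is encoded.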

The MSDN article on character encodings is really good, so go read it. I've included the source code for the demo at the end of the post in case you want to try it yourself. I've also added a commented out Exception fallback example, try that out too.

This was bad. Now what?
Well, first I'll point out the implications here. If all your users were to use ASCII characters only, this wouldn't be a problem. Ironically, as you add non-ASCII characters to passwords (supposedly making them "more secure"), you make them less secure, since each non-ASCII character becomes a "wildcard" that matches any other non-ASCII character as well as a literal "?". All you guys who are native English speakers have to keep in mind that there are more of us who aren't, so use UTF-8 in your code samples instead of ASCII. Then we can still use our classic Norwegian trick to make passwords "uncrackable" by adding a Norwegian letter to them: æ/ø/å.
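As a small illustration of why UTF-8 avoids the wildcard effect, the sketch below prints the UTF-8 bytes of the three Norwegian letters. The class name is hypothetical; the point is only that each letter gets its own two-byte sequence, so the strings stay distinct before hashing.

```csharp
using System;
using System.Text;

// Hypothetical helper class, for illustration only.
public static class Utf8Demo
{
    // Returns the UTF-8 bytes of a string as hex.
    public static string Hex(string s) =>
        BitConverter.ToString(Encoding.UTF8.GetBytes(s));

    public static void Main()
    {
        Console.WriteLine(Hex("\u00E6")); // æ -> C3-A6
        Console.WriteLine(Hex("\u00F8")); // ø -> C3-B8
        Console.WriteLine(Hex("\u00E5")); // å -> C3-A5
    }
}
```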

So, while I've pointed out a potentially serious security issue, it's probably not the end of the world. But you should move away from the ASCII encoding if that's what you're using for your password hashing.

Building an authentication system based on passwords is not as straightforward as many might think. Here are a few important challenges:
  • You need to avoid the problem I've outlined in this post, so you don't mess up the password before you've even started (it can be solved by using a Unicode encoding with the Exception Fallback).
  • You need to salt the passwords to avoid the problem of rainbow tables.
  • You need to decide how you want to compute the value you store in the database. Should you use a plain SHA-256 transform, the PBKDF2 algorithm, bcrypt, or maybe scrypt? That decision directly affects the effectiveness of brute-force and dictionary attacks.
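To show how the three points above fit together, here is a rough sketch that combines a strict UTF-8 encoding, a random per-password salt, and PBKDF2. It is an illustration under stated assumptions, not a vetted implementation: it assumes .NET 6 or later (for Rfc2898DeriveBytes.Pbkdf2 and RandomNumberGenerator.GetBytes), the class name and method names are made up, and the iteration count is a placeholder rather than a recommendation.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical class, for illustration only. Assumes .NET 6 or later.
public static class PasswordHashSketch
{
    private const int SaltSize = 16;         // 128-bit salt
    private const int HashSize = 32;         // 256-bit derived value
    private const int Iterations = 100_000;  // placeholder, tune for your hardware

    // UTF-8 that throws on invalid input instead of silently replacing it.
    private static readonly UTF8Encoding StrictUtf8 =
        new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);

    // Creates a fresh random salt and derives the value to store next to it.
    public static (byte[] Salt, byte[] Hash) Create(string password)
    {
        byte[] salt = RandomNumberGenerator.GetBytes(SaltSize);
        return (salt, Derive(password, salt));
    }

    // Re-derives the value from the candidate password and compares it
    // with the stored one in constant time.
    public static bool Verify(string password, byte[] salt, byte[] expectedHash) =>
        CryptographicOperations.FixedTimeEquals(Derive(password, salt), expectedHash);

    private static byte[] Derive(string password, byte[] salt)
    {
        // Throws EncoderFallbackException if the string is not valid UTF-16.
        byte[] passwordBytes = StrictUtf8.GetBytes(password);
        return Rfc2898DeriveBytes.Pbkdf2(
            passwordBytes, salt, Iterations, HashAlgorithmName.SHA256, HashSize);
    }
}
```

A caller would store both the salt and the hash returned by Create, and later pass them to Verify together with the password the user typed.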
Fortunately, you don't have to solve these problems yourself; you can rely on others. For example, there's a .NET membership provider that takes care of all of this, the SqlMembershipProvider. It uses the Unicode encoding to handle the passwords, and it also uses a unique 128-bit salt for each password. It used SHA-1 up to .NET 3.5, but SHA-256 was made the default hash function in .NET 4. If you're not familiar with any of the challenges I've listed above, you should go with the membership provider. If you're not a .NET developer, see if you can find a renowned library for your platform instead of implementing this yourself.

You'll find a detailed article by Troy Hunt on the issues with password hashing in his OWASP Top 10 for .NET developers. And here's a definite takeaway:
..when it comes to security, the more stuff you can pull straight out of the .NET framework and avoid rolling yourself, the better. There’s just too much scope for error and unless you’re really confident with what you’re doing and have strong reasons why the membership provider can’t do the job, stick with it.
Amen.

Anything else?
Well, yes. You might not always be the one deciding how encoding errors should be handled, so you need to keep an eye out for how others deal with these issues. As I've been writing this post there's been a release of the AntiXSS library. One of the changes is that "Invalid Unicode no longer throws an exception". Here are the details from the release notes:
Invalid Unicode characters are now replaced with the Unicode replacement character, U+FFFD (�). Previously, when encoding strings through HtmlEncode, HtmlAttributeEncode, XmlEncode, XmlAttributeEncode or CssEncode invalid Unicode characters would be detected and an exception thrown.
I'm not sure what the change is in lines of code, but I would guess that they emit the � explicitly after catching an error. The documentation is quite clear for the Unicode encodings:
To enable error detection and to make the class instance more secure, the application should use the UnicodeEncoding constructor that takes a throwOnInvalidBytes parameter, and set that parameter to true. With error detection, a method that detects an invalid sequence of characters or bytes throws a ArgumentException. Without error detection, no exception is thrown, and the invalid sequence is generally ignored.
I'll try to ping @blowdart and see if he'll write something about this on his blog. What it definitely does mean is that if you output invalid Unicode for some reason, it's probably only your users, and not you, who'll notice. To detect errors you will have to search the output from the AntiXSS library for the �.
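The two options mentioned above, searching output for U+FFFD and using a UnicodeEncoding constructed with throwOnInvalidBytes set to true, could look roughly like the sketch below. It does not call the AntiXSS library; the class and method names are made up, and the round trip through Encoding.Unicode only stands in for any component that replaces invalid input instead of throwing.

```csharp
using System;
using System.Text;

// Hypothetical helper class, for illustration only.
public static class InvalidUnicodeSketch
{
    // After the fact: look for the replacement character in the output.
    // Note this also matches a U+FFFD that was already in the input.
    public static bool ContainsReplacementChar(string text) =>
        text.IndexOf('\uFFFD') >= 0;

    // Up front: a strict UTF-16 encoding throws on invalid sequences
    // such as a lone surrogate.
    public static bool IsWellFormed(string text)
    {
        var strict = new UnicodeEncoding(
            bigEndian: false, byteOrderMark: false, throwOnInvalidBytes: true);
        try
        {
            strict.GetBytes(text);
            return true;
        }
        catch (EncoderFallbackException)
        {
            return false;
        }
    }

    public static void Main()
    {
        string broken = "a\uD800b"; // lone high surrogate, not valid UTF-16
        Console.WriteLine(IsWellFormed(broken));             // False

        // The default Encoding.Unicode replaces the bad code unit instead.
        string roundTripped =
            Encoding.Unicode.GetString(Encoding.Unicode.GetBytes(broken));
        Console.WriteLine(ContainsReplacementChar(roundTripped)); // True
    }
}
```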

What the TransformTool bug looked like
Here are two screenshots from TransformTool, the first showing that non-ASCII characters are replaced with question marks.


And here's after my bugfix, using the Exception Fallback, where a System.Text.EncoderFallbackException is thrown:


The code
Here's the code for the console application. I've commented out the safe way to obtain an ASCII encoding. You can give the code a try to see the behaviour for yourself, switching between the safe and unsafe way of instantiating an ASCII encoding.

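// Namespaces used by this snippet (using directives go at the top of the file):
using System;
using System.Security.Cryptography;
using System.Text;

var exampleStrings = new String[] {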
"abcde",
"abcdé",
"?????",
"ééééé",
"üüüüü"};

byte[][] ASCIIBytes = new byte[exampleStrings.Length][];

var ASCIIEncoding = System.Text.Encoding.ASCII;
//var ASCIIEncoding = System.Text.Encoding.GetEncoding("ASCII",
//    new EncoderExceptionFallback(),
//    new DecoderExceptionFallback());

Console.WriteLine("Strings to encode:");
Console.WriteLine("0: " + exampleStrings[0] + "      -> all ASCII chars");
Console.WriteLine("1: " + exampleStrings[1] + "      -> é is not a valid ASCII char");
Console.WriteLine("2: " + exampleStrings[2] + "      -> all questionmarks (valid ASCII)");
Console.WriteLine("3: " + exampleStrings[3] + "      -> all chars invalid ASCII");
Console.WriteLine("4: " + exampleStrings[4] + "      -> all chars invalid ASCII");
Console.WriteLine();

Console.WriteLine("Get bytes (ASCII encoding):");
int i = 0;
foreach (var s in exampleStrings)
{
    ASCIIBytes[i] = ASCIIEncoding.GetBytes(s);
    Console.WriteLine(i + ": " + BitConverter.ToString(ASCIIBytes[i]));
    i++;
}

Console.WriteLine();

Console.WriteLine("Let's pretend they are passwords and hash them with SHA-1!");
i = 0;
using (var sha = SHA1.Create())
{
    foreach (byte[] bytes in ASCIIBytes)
    {
        // Hash the encoded bytes; identical byte arrays give identical hashes.
        Console.WriteLine(i + ": " + BitConverter.ToString(sha.ComputeHash(bytes)));
        i++;
    }
}

Console.WriteLine();
Console.WriteLine("Uhm... Why are 2,3,4 identical?");
Console.WriteLine();

Console.WriteLine("Back to ASCII strings: ");
i = 0;
foreach (byte[] bytes in ASCIIBytes)
{
    Console.WriteLine(i++ + ": " + ASCIIEncoding.GetString(bytes));
}
Console.WriteLine();

Console.ReadLine();

1 comment:

  1. And here is an API for .NET to do it properly :)

    https://sourceforge.net/projects/pwdtknet/


Copyright notice

© André N. Klingsheim and www.dotnetnoob.com, 2009-2015. Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and/or owner is strictly prohibited. Excerpts and links may be used, provided that full and clear credit is given to André N. Klingsheim and www.dotnetnoob.com with appropriate and specific direction to the original content.
