Any opinions expressed here are my own and not necessarily those of my employer (I'm self-employed).

Jan 11, 2012

How not to hash passwords in .NET

In connection with a bug in TransformTool, I've been looking into how text encoding is handled in the .NET framework. Turns out there are some caveats that can affect the correctness of a program, and when used in e.g. password validation they might turn out to be severe security issues.

This post assumes you are somewhat familiar with how character encodings work. You might want to check out my Introduction to character encoding if you're not. I wrote it mainly because I didn't want to explain the basics of encodings in this post.

The encoding issues/features I discuss here are all well documented in the article Character Encoding in the .NET Framework, but I believe that the issues aren't that well known. Stack overflow, blogs, and discussion forums are riddled with insecure code samples. Do a Google search for "ASCII.GetBytes" password, and you'll get a lot of results. I even found insecure code examples in a text book, the C# 2008 Programmer's Reference (page 344). So I definitely believe we need to raise awareness of these issues in the .NET community.

Encoding subtleties
In the MSDN article on character encoding you'll find that the first suggestion on how to use the encoding objects in .NET is: 
Use the static properties of the Encoding class, which return objects that represent the standard character encodings available in the .NET Framework (ASCII, UTF-7, UTF-8, UTF-16, and UTF-32). For example, the Encoding.Unicode property returns a UnicodeEncoding object. Each object uses replacement fallback to handle strings that it cannot encode and bytes that it cannot decode.
And people love to use the static properties! But if you don't read this carefully and pause with the "replacement fallback", you might get into trouble. "Replacement fallback" means that every character that cannot be encoded to bytes will be replaced with the "?" character silently. But what does that mean? Time for a demo using the ASCII encoding:

Jan 8, 2012

Introduction to character encoding

Text encoding is a persistent source of pain and problems, especially when you need to communicate textual information across different systems. Every time you read or create an xml-file, a text file, a web page, or an e-mail, the text is encoded in some way. If the encoding is messed up along the way, the receiver will be looking at strange characters instead of the ori�inal t□xt. (ba-da-bing :)

I've been fighting with characters sets on several occasions throughout the years. Just recently, I had a bug in TransformTool related to character encoding and how errors are handled in the .NET framework. While writing about the bug I needed a reference to a basic introduction to character encoding — only to discover that most are very technically focused and dive right into the characters' hex codes. Here, I'll try to fill that gap and explain only the basics. I'll include pointers to more detailed resources in case you decide to dig deeper into the dark world of character encodings.

How encodings work
The Unicode Consortium has a great explanation of how it really works:
Fundamentally, computers just deal with numbers. They store letters and other characters by assigning a number for each one.

Copyright notice

© André N. Klingsheim and www.dotnetnoob.com, 2009-2015. Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and/or owner is strictly prohibited. Excerpts and links may be used, provided that full and clear credit is given to André N. Klingsheim and www.dotnetnoob.com with appropriate and specific direction to the original content.

Read other popular posts