Dealing with Passwords
Tags: Cryptography, Programming, Security
After the recent leaks of password hashes from LinkedIn and others, I thought it would be a good idea to write down some 'best practices' for properly dealing with user passwords and sensitive data. This entry is by no means complete, nor is it the be-all and end-all of the topic. What it does try to do is give a decent starting point for eliminating basic mistakes that could lead to embarrassment later on. If you're developing a new website, bringing an existing one up to date, or otherwise working with users and passwords, these tips might help. Let's start...
First and foremost: do not put artificial limitations on your passwords. If a user wants a password 20 characters long, made up of alphanumeric characters, punctuation marks and alien characters, let them. Make sure your site supports Unicode so that non-ASCII characters from other languages can be used; characters from scripts such as Arabic, Chinese or Cyrillic enlarge the set of possible characters and can increase password strength. The limits you should consider are the exact opposite: require a minimum password length and a mixture of upper- and lowercase letters and numbers. These measures increase password entropy and make brute-force attacks much harder.
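A minimal sketch of such a policy check in Python follows. The `check_password_policy()` helper and the minimum length of 12 are hypothetical (the text does not name a number), and the Unicode NFKC normalization step is an assumption of this sketch, added so that the same password typed on different systems becomes the same string. Note that the upper/lowercase rule cannot be met by scripts without letter case, such as Chinese, so a real policy would need an alternative rule for those.

```python
import unicodedata
from typing import List

MIN_LENGTH = 12  # hypothetical minimum; pick your own policy


def check_password_policy(password: str) -> List[str]:
    """Return a list of policy violations; an empty list means acceptable."""
    # Assumption: normalize first (and apply the same normalization before
    # hashing) so equivalent Unicode input always yields the same password.
    password = unicodedata.normalize("NFKC", password)
    problems = []
    if len(password) < MIN_LENGTH:
        problems.append(f"must be at least {MIN_LENGTH} characters")
    if not any(c.islower() for c in password):
        problems.append("must contain a lowercase letter")
    if not any(c.isupper() for c in password):
        problems.append("must contain an uppercase letter")
    if not any(c.isdigit() for c in password):
        problems.append("must contain a digit")
    # Deliberately no maximum length and no restricted character set.
    return problems
```

For example, `check_password_policy("ПарольДлинный123")` accepts a Cyrillic password, because `str.islower()` and `str.isupper()` understand letter case outside ASCII.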
Whenever users need a password reset and the new password is sent to them by email, put a time limit on its validity, allow it to be used only once, and force the user to choose a new password as part of the reset procedure. Emails usually travel across the Internet in plain text and are easy to intercept. Moreover, anyone who gains access to the user's mailbox suddenly has access to potentially a whole range of passwords.
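The three rules above (expiry, single use, forced change) can be sketched in Python as follows. This version emails a random one-time token instead of a password, which is a variation on the text's approach; the helper names, the 15-minute window and the in-memory store are all hypothetical stand-ins for your own database and mail code.

```python
import hashlib
import secrets
import time
from typing import Dict, Optional, Tuple

RESET_TTL_SECONDS = 15 * 60  # hypothetical validity window: 15 minutes

# Hypothetical in-memory store: token digest -> (username, expiry timestamp).
# A real site would keep this in its database.
_pending_resets: Dict[str, Tuple[str, float]] = {}


def _digest(token: str) -> str:
    # The token is long and random, so a fast hash is fine here; storing only
    # the digest means a database leak does not expose usable tokens.
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


def issue_reset_token(username: str, now: Optional[float] = None) -> str:
    """Create a single-use, time-limited token to email to the user."""
    now = time.time() if now is None else now
    token = secrets.token_urlsafe(32)  # cryptographically secure randomness
    _pending_resets[_digest(token)] = (username, now + RESET_TTL_SECONDS)
    return token


def redeem_reset_token(token: str, now: Optional[float] = None) -> Optional[str]:
    """Return the username if the token is valid; a known token is consumed even if expired."""
    now = time.time() if now is None else now
    # pop() removes the entry, so a token can never be used twice.
    entry = _pending_resets.pop(_digest(token), None)
    if entry is None:
        return None
    username, expires_at = entry
    if now > expires_at:
        return None  # expired
    # The caller should now force the user to choose a new password.
    return username
```

Once `redeem_reset_token()` returns a username, the site would go straight to a "choose a new password" form rather than logging the user in with the emailed value.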
The following almost goes without saying, but I'll say it anyway: never, ever store passwords in plain text; always hash them. Most web programming languages support hash functions out of the box, but be careful, since not every algorithm is suitable. MD5, for example, was long the default choice for hashing passwords, but it should no longer be used for this task, and the same goes for SHA-1. So which algorithm should we use?
For hashing passwords, you want an algorithm that is slow. This might seem counterintuitive, but the idea is that any attack on the password hashes should take as much time as possible: the attacker is slowed down in their attempt to brute-force the hashes and needs far more computing power to make the attack feasible. Ideally, you would use the scrypt() key derivation function, which you can read more about here: http://www.tarsnap.com/scrypt.html. bcrypt() is also a good choice.
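As a minimal sketch, Python's standard library exposes scrypt as `hashlib.scrypt()` (available when Python is built against OpenSSL 1.1 or later). The cost parameters below (N=2**14, r=8, p=1) are common illustrative values rather than a recommendation from this post, and the `$`-separated storage format and the `hash_password()`/`verify_password()` names are hypothetical.

```python
import hashlib
import hmac
import os

# scrypt cost parameters: N (CPU/memory cost), r (block size), p (parallelism).
# Illustrative values only; tune them so one hash takes as long as you can afford.
SCRYPT_N, SCRYPT_R, SCRYPT_P = 2**14, 8, 1


def hash_password(password: str) -> str:
    salt = os.urandom(16)  # random and unique per user
    key = hashlib.scrypt(password.encode("utf-8"), salt=salt,
                         n=SCRYPT_N, r=SCRYPT_R, p=SCRYPT_P, dklen=32)
    # Hypothetical storage format: scheme$N$r$p$salt$key (hex-encoded).
    return f"scrypt${SCRYPT_N}${SCRYPT_R}${SCRYPT_P}${salt.hex()}${key.hex()}"


def verify_password(password: str, stored: str) -> bool:
    _scheme, n, r, p, salt_hex, key_hex = stored.split("$")
    expected = bytes.fromhex(key_hex)
    key = hashlib.scrypt(password.encode("utf-8"), salt=bytes.fromhex(salt_hex),
                         n=int(n), r=int(r), p=int(p), dklen=len(expected))
    # Constant-time comparison, so timing does not reveal how many bytes matched.
    return hmac.compare_digest(key, expected)
```

Storing the parameters next to the hash lets you raise the cost later without breaking existing accounts. bcrypt is available in Python through the third-party `bcrypt` package rather than the standard library.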
Alternatively, common web programming languages provide functions that produce decent hashes, such as crypt() in PHP. Make sure, though, that you read the manual for the function in question carefully, as the default behaviour is not necessarily the best! Taking the PHP function as an example: depending on the salt you pass, it may fall back to a legacy DES-based hash, which should no longer be used. Instead, opt at least for the SHA-256 option, or better still Blowfish or SHA-512; for the SHA-based options, use at least 5000 rounds. Here, 'rounds' determines how many times the hashing loop is executed, so more rounds slow the algorithm down.
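PHP's crypt() is not shown here. As a stand-in, this Python sketch uses PBKDF2-HMAC-SHA512 via `hashlib.pbkdf2_hmac()`, a different construction whose `iterations` argument plays the same role as 'rounds'. It times a few iteration counts to show that the work per hash, and therefore per attacker guess, grows roughly in proportion to the count. The `pbkdf2_sha512()` wrapper name is hypothetical, and the timings will vary by machine.

```python
import hashlib
import os
import time


def pbkdf2_sha512(password: str, salt: bytes, rounds: int) -> bytes:
    # Each iteration is one more HMAC-SHA512 computation, so the cost of a
    # single hash (and of every attacker guess) grows with `rounds`.
    return hashlib.pbkdf2_hmac("sha512", password.encode("utf-8"), salt, rounds)


salt = os.urandom(16)
for rounds in (5_000, 50_000, 500_000):
    start = time.perf_counter()
    pbkdf2_sha512("12345", salt, rounds)
    print(f"{rounds:>7} rounds: {time.perf_counter() - start:.4f} s")
```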
One of the main problems with the recently leaked password hashes is that they were not salted. Salting means that the hash generated and stored for user authentication is computed not from the password alone, but from the password combined with 'something else'. This 'something else' should be random and unique per user. It can include values such as the registration date, but those are predictable, so it should also contain a genuinely random part: a randomly generated number or, for example, a random string of characters produced by a random password generator. This way, even if the user chooses a password as simple as "12345", the hash will be that of, e.g., "username+12345+randomstringhere".
Here, both the username and a random string are used to salt the password, and both can be stored in plain text in the database. The goal of salting is to make dictionary attacks based on precomputed lookup tables (such as so-called rainbow tables) impractical: an attacker would need a separate table for every possible salt, which would take far too long to generate and far too much space to store. Keep in mind that a dictionary attack is still possible, but it is slowed down tremendously, since pre-computed hashes cannot be used and every guess has to be hashed again for each user's salt.
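To make the effect of salting concrete, here is a small Python demonstration. It uses plain SHA-256 purely to keep it short (a real system should feed the salt into a slow function such as scrypt or bcrypt), and the tiny `lookup_table` stands in for an attacker's precomputed table; all names are hypothetical.

```python
import hashlib
import secrets


def unsalted_hash(password: str) -> str:
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def salted_hash(username: str, password: str, salt: str) -> str:
    # Mirrors the "username+12345+randomstringhere" construction from the text.
    # SHA-256 only keeps the demo short; in practice pass the salt to a slow
    # function such as scrypt or bcrypt.
    return hashlib.sha256(f"{username}+{password}+{salt}".encode("utf-8")).hexdigest()


# Stand-in for an attacker's precomputed table: hash -> common password.
lookup_table = {unsalted_hash(pw): pw for pw in ("12345", "password", "letmein")}

# Per-user random salts, stored in plain text next to the hash.
salts = {"alice": secrets.token_hex(16), "bob": secrets.token_hex(16)}

for user, salt in salts.items():
    plain = unsalted_hash("12345")             # identical for every user
    salted = salted_hash(user, "12345", salt)  # unique per user
    print(f"{user}: unsalted -> {lookup_table.get(plain)}, "
          f"salted -> {lookup_table.get(salted)}")
```

Both users' unsalted hashes of "12345" are identical and are found in the table, while each salted hash is unique to its user and is not.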
Some final remarks. You can't really 'crack' a password hash in the sense of reversing it. There are two ways to find a password from its hash: either exploit a weakness in the algorithm itself to find an input that produces the same hash, or recreate the hash from a range of candidate inputs until one matches. The first approach is what security researchers pursue when analyzing hash algorithms for weaknesses; the collisions found this way are one of the reasons MD5 is no longer considered secure. The second approach requires lots of processing power, and can be made substantially harder by choosing good passwords, using salts, and using slow algorithms (as mentioned before, see scrypt()).
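The second approach is simply exhaustive search. This Python sketch, built around a hypothetical `brute_force()` helper, recreates an unsalted SHA-256 hash of "12345" by trying every digit string up to five characters long.

```python
import hashlib
import itertools
import string
from typing import Optional


def brute_force(target_hex: str, alphabet: str, max_len: int) -> Optional[str]:
    """Hash every candidate up to max_len characters until one matches the target."""
    for length in range(1, max_len + 1):
        for chars in itertools.product(alphabet, repeat=length):
            candidate = "".join(chars)
            if hashlib.sha256(candidate.encode("utf-8")).hexdigest() == target_hex:
                return candidate
    return None


leaked = hashlib.sha256(b"12345").hexdigest()  # unsalted, fast hash
print(brute_force(leaked, string.digits, 5))
```

It prints 12345 after roughly 23,000 guesses. The search space grows as (alphabet size)^length, which is why longer passwords with a richer character set help, and a slow, salted hash multiplies the cost of every single guess.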
It is impossible to predict every possible attack on and threat to your system, so it is up to you to make a best effort by doing the research. Keep in mind that if you are negligent in this area, you might be held liable for any damages, so you might as well act paranoid from the beginning. Too often, security is an afterthought, when it should be part of the design from the start. (You'd be surprised how often people fail to learn, and how many companies consider security nothing more than a cost, but I digress...)
Lastly, if you really don't know what you're doing and still want to get your latest killer idea out there, contact a professional cryptographer or an experienced security professional. They can give you the information you need and the material and pointers to read, and perhaps you can even contract them to do the implementation right.