Tags: Cryptography, Security
So LinkedIn had some security issues a couple of days ago: 6 million or so password hashes from their users were leaked on a Russian hacker site. There seems to be quite some confusion among people as to what the impact of this really is, with several websites claiming that the actual passwords were leaked, that the passwords can be 'decrypted' etc. Let's put some of these things straight, starting with some of the terminology.
A hash is basically a unique identifier for a particular file or text string, often of a fixed length. One can use a hash to verify the integrity of a downloaded file for instance:
- I have a big file I want to download, say 4GB or so
- The person on the other end hosting the file generates a hash from this file, resulting in something like 'f572d396fae9206628714fb2ce00f72e94f2258f'
- I download the file and generate the hash of the file I downloaded.
- If the hash I generate differs from the one the other side published, I know there was a transmission error and the file is corrupt.
Note that a hash function is a one way function. It is not possible to recreate the 4GB file with just the string 'f572d396fae9206628714fb2ce00f72e94f2258f'. If you could, you'd have just created the world's best compression algorithm and you could be very rich in no time!
Same thing with passwords: instead of storing actual passwords, we store hashes of the actual password (preferably salted, we'll come back to that). So instead of storing a password 'thisismysupersecret', we store '96f7ef476f3a68bdb4dd4b1a2ffe7537365aa603', the (in this case) sha1 hash of the password. To log users in, we generate a hash of the password field the user fills in, compare it to the hash in the database, and if they match we log the user in.
The problem now is that with weak passwords, we can attempt a brute force attack on the hash. As I said, the hash function is a one way function, so I cannot 'decrypt' it. What I can do however is generate hashes for the entire english dictionary, combinations thereof, common used words, etc. to try and find the actual password in this way. Even more, there are tables available of pre-generated hashes for these combinations (called rainbow tables) which can make the whole process easier.
To discourage this, we can add a salt to the stored hash. Instead of just taking the hash of the password, we use the hash of the password together with a unique identifier, for example the date the user created the account, or the user name, their date of birth, or a combination of multiple. This way, it becomes much harder to find the password in a brute force manner and since it cannot just match standard dictionary words, etc.
By salting, even if two users have the same password, say something short and silly like 'password', their hash would not be the same and thus potentially identifying weak passwords (e.g., due to duplicate hashes) cannot be done. Furthermore, rainbow tables become useless since the hash is not built up with potentially standard dictionary words.
Out of the 6 million hashes that were leaked, around 300.000 passwords have been extracted last time I checked. I've personally done a couple of brute force efforts on the hashes with some common dictionaries to get an idea of the overall password strength. For example, the entire English word list using John the Ripper got exactly 1 hash in common. Using other dictionaries and a few other methods, I managed to get 1000 or so in total, and I'm sure that if I would put more time in it I could get quite a lot more (perhaps something for a rainy Sunday afternoon). It seems however, that the password choices were pretty decent in general (especially when compared to the list of hashes form MilitarySingles.com a while ago).
On the other hand, the fact that all of the leaked hashes appear to be of relatively strong passwords might indicate that the initial amount of hashes was much larger, but that only those hashes of the difficult passwords were released in the open in the hope of getting assistance cracking them.
The biggest issue with the leak is that the LinkedIn passwords were not salted. This in itself means that in case of trivial passwords, the hashes can be found easily, especially with the help of rainbow tables. Since salting is relatively trivial to implement, I hope that LinkedIn and others will at least take this step in the near future.
Oh, and do change your LinkedIn password :-)
EDIT: I couldn't stop myself from giving it another try over the weekend. It seems that a combination of using incremental mode in John the Ripper, a dictionary of words in various languages, and using the discovered passwords as seed words for another round of attack, etc. we yield a much better result. I managed to get 2.5 million passwords after about a night of computing.