How secure hashes work

A hash function takes an arbitrary amount of data and reduces it to a chunk of data of a fixed length (the hash). The count of bytes in a file can be thought of as a very simple hash function; so can adding the number of a's in a file to twice the number of b's, 3 times the number of c's, and so on. Hashes are useful when you're not certain a file was transmitted correctly across a network: if a letter was dropped accidentally, the file won't match the hash.
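The two toy hashes described above can be sketched in a few lines of Python. This is only an illustration: the function names and the sample strings are made up for this example, and the letter weights simply follow the a=1, b=2, c=3 pattern from the text.

```python
import string

def length_hash(data: bytes) -> int:
    """Simplest possible hash: the number of bytes."""
    return len(data)

def weighted_letter_hash(text: str) -> int:
    """1 x (number of a's) + 2 x (number of b's) + ... + 26 x (number of z's)."""
    lowered = text.lower()
    total = 0
    for weight, letter in enumerate(string.ascii_lowercase, start=1):
        total += weight * lowered.count(letter)
    return total

sent = "hash functions"
received = "hash functons"  # the letter "i" was dropped in transit

# Both toy hashes change when a letter goes missing.
print(length_hash(sent.encode()), length_hash(received.encode()))
print(weighted_letter_hash(sent), weighted_letter_hash(received))
```

Both toy hashes notice the dropped letter, but neither notices two letters being swapped ("ab" and "ba" give the same values), which is one reason they are not secure hashes.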

Cryptographic hashes (sometimes called secure hashes or one-way hashes) are special functions that have 3 important properties:

  • it is essentially impossible to construct a file that has a given hash,
  • it is essentially impossible to modify a given file without changing its hash,
  • it is essentially impossible that two different files will have the same hash.

Benefits

There are 3 major benefits to the secure hashes that we use, such as SHA-2:

1. A hash tells you nothing about the file it came from.

Some hash functions leave clues about the length of the input file, but otherwise a hash is essentially random noise.

2. If two files are similar, the hashes have nothing in common.

For example, the SHA-2 hash for "This is a text." is

f0c9b9b5 8169c524 a45c7a9a fd7140ce 86a8634e 6e7bc2c7 2a401f97 4cbf5bf8

while the SHA-2 hash for "This is A text." is

8ac5adf0 296c09c8 ed82bb7f 67a645fe a694dd6f ce6fd3d7 97180846 62dbdb85

Similarly, if you could find two similar hashes, that wouldn't say anything about the source files.
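One way to see how little two such hashes have in common is to count the bits in which they differ. The sketch below assumes that "SHA-2" here means SHA-256 (the hashes shown are 256 bits long) and that the text is encoded as UTF-8 with no trailing newline; if the original tool encoded its input differently, the printed digests will not match the ones quoted above. The helper names are made up for this example.

```python
import hashlib

def sha256_hex(text: str) -> str:
    """SHA-256 digest of the UTF-8 encoded text, as 64 hex characters."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def differing_bits(hex_a: str, hex_b: str) -> int:
    """Number of bit positions in which two hex digests differ."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")

first = sha256_hex("This is a text.")
second = sha256_hex("This is A text.")

print(first)
print(second)
print(differing_bits(first, second), "of 256 bits differ")
```

For two unrelated 256-bit values you would expect roughly half of the bits to differ, and changing a single letter of the input typically produces a count in that range.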

3. If two files generate the same hash, they must be the same file.

Given a hash, there is no better way to build a file that will generate the same hash than trial and error. For the 256-bit SHA-2 hash, it would take an average of 57,896,044,618,658,097,711,785,492,504,343,953,926,634,992,332,820,282,019,728,792,003,956,564,819,968 tries (that is 2^255, half of the 2^256 possible hashes). That is to say, if every person on earth tried a million files a second, it would still take 262,093,447,154,870,433,703,503,188,873,629,081,464,828,682,402,716,927 years. This is the kind of thing we mean by essentially impossible.
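The arithmetic behind these figures can be checked with Python's arbitrary-precision integers. The population and year length below are assumptions made for this sketch, since the text does not state them, so the year count only agrees with the figure above in its leading digits.

```python
HASH_BITS = 256
# On average, half of the 2**256 possible hashes must be tried.
average_tries = 2 ** (HASH_BITS - 1)

PEOPLE = 7_000_000_000          # assumption: rough world population
TRIES_PER_SECOND = 1_000_000    # per person, as in the text
SECONDS_PER_YEAR = 31_557_600   # assumption: a 365.25-day year

years = average_tries // (PEOPLE * TRIES_PER_SECOND * SECONDS_PER_YEAR)

print(f"{average_tries:,}")
print(f"about {years:.2e} years")
```

Under these assumptions the result is on the order of 10^53 years, far longer than the age of the universe.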

Is it magic?

No. It's brilliant mathematics.

The algorithms we use weren't built by us, or even for us. SHA-2 was designed by the National Security Agency of the United States (it's described in US Patent 6,829,355) as part of the NSA's mission "to protect U.S. national security systems and to produce foreign signals intelligence information."

Try it out

The following script can generate SHA-2 hashes:
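A minimal Python sketch of such a script, using the standard hashlib module. It assumes that "SHA-2" here means SHA-256 and that the input is encoded as UTF-8; the function name and the sample input are made up for this example, and the digest is grouped into 8-character blocks to match the layout used on this page.

```python
import hashlib

def sha256_grouped(data: bytes) -> str:
    """SHA-256 digest of data as eight space-separated 8-character hex groups."""
    digest = hashlib.sha256(data).hexdigest()
    return " ".join(digest[i:i + 8] for i in range(0, len(digest), 8))

# Sample input; replace it with any text of your own.
source_data = "abc"
print(sha256_grouped(source_data.encode("utf-8")))
```

Changing `source_data` even slightly gives a completely different line of output.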

As an example, try to determine what data generated this hash:

dc2447ec b0544729 e796f4bd ba6513b9 7cc829ab 1e345602 40a961d8 23b2af47
