Cryptocurrencies are losing value worldwide. The incredible growth rates of last year have turned into their opposite. This development may make it easier and more necessary to look at the structures of Blockchain technology. You should know what it’s all about. Last month I described which problems could be solved with the blockchain technology (Blockchain – Decentralised Byzantine Generals). This blog post is about the technical basis of this technology.

Do not worry, it will not be too technical 🙂

Hash functions

The verb to hash in general means to chop something into small pieces. This can be food, such as meat and potatoes, it can also be used to describe an arrangement of something.

A simple application of hash functions can be found in every bookstore!
The books are on shelves and are often arranged in alphabetical order according to subject and surname of the author. The surname order works with a hash algorithm!
Take the first letter of the last name and sort the book on the shelf. If two authors have the same name, you will need to use the first name to make it clearer.

Author Letter (hash value)
Dan Brown B
Sandra Brown B
Giulia Enders E
Daniel Kehlmann K

Everyone probably knows the system and everyone could sort the books. So this is nothing special?
It’s worth taking a closer look.

  • The name of the author is hashed (chopped, arranged)
  • The result is clear.
  • The hashing goes in one direction.
    It is quite easy to get the letter from the name, but it is impossible to get the name from the letter.

A more complex example are checksums of files. To know if the file you want to download from a website is really the original file, a hash function is used to determine a checksum. Offered on the website are the file and the checksum. After the download, the checksum itself can be determined and compared with the offered checksum.



Depending on which algorithm is used, this image – 650.jpg (image/jpeg) – 63148 bytes has the following checksums:

  • SHA-1: 700945496cd591b23d05c3fce1b2532427e2ca98
  • SHA-256: 548f8d41a849b532e2597f26a3530dac56b390eb06c603a15d041053ee17426f
  • SHA-384: 326f815e620c530f5ee23d83f59e70029de4b76472338fa63a36abbb75d58f04aa152c9a1082e2c8770c93928b0a454e
  • SHA-512: fecb10c72a4ea7e012a2c72650b733fbac8c6125184f733da8365a6e93749e14ceb5400617c9fa00606d60a05b035e7369b8e21a2e1ee7333331db175c311b9a

Although this looks very complicated, it is the same process as the book example. The checksum is determined on the basis of rules and is unique. You can determine the checksum after downloading and compare it with the one in the blog entry. It must match! Also in this example it is (at least for a computer) very easy to get from a file to the checksum. The way back is not possible. The books were all about the right sorting. In this case, the “authenticity” of a file can be determined.

The checksums in the example are getting longer and longer. Under certain circumstances, fast computers can calculate the return journey after a certain threshold, hence the rule of thumb: the more complex the algorithm and the resulting checksum, the safer the “one-way street”.

Checksums can be found in the browser without having to upload the file in question, for example here:

Verifying the checksum ensures that you have exactly the “original” file. If you calculate another checksum, something is different in your file. This is very handy for ZIP packages of software downloads. An “original” software package always leads to the same checksum.
The picture is the same, but  if it is edited with Photoshop e.g., the checksum is different.

Hash table

In a hash table you can now display the original file or the name (key), the used hash function and the calculated checksums (hashes).

Hash table

Hash table

Hashes are therefore in principle easy to calculate and quite practical. Now the question arises, what these hashes have to do with the blockchain and what can be done with it? The crucial feature of hashes is the uniqueness in one direction and the impossibility to determine from the hash the key (in the other direction). If we think of the Byzantine generals, the solution was, on one hand, an unspecified 10-minute work to be performed by each general and, on the other hand, the gathering of all messages. With the knowledge of hash functions, the solution could work more elegantly, namely with a “hash chain” or hash list.

  • General 1 signs and generates from the text (key) a checksum (hash)
  • General 2 signs, the checksum of General 1 depends on it and generates from the text (Key) a checksum (hash)
  • General 3 signs, the checksum of General 2 depends on it and generates from the text (Key) a hash (hash)
  • and so on

It is important that parts of the already created messages are included in the hashing of the new message.

Now everything suddenly makes sense, because in one direction everything is traceable and transparent.
Fraud is technically impossible!

The trees of Mr Merkle

Namesake Ralph Merkle is one of the pioneers of cryptosystems. In 1974, he discovered the previously considered impossible key exchange protocol “Merkles Puzzles” and thus justified the public-key cryptography.

For a signature process he developed hash trees (Merkle trees) in 1979. In addition to the signatures, the hash trees can also protect any other type of data from change. The hash trees have become popular in peer to peer networks such as the BitTorrent protocol. It’s about distributed file sharing. A large file (movie) is in parts on many peers. The requesting peer must find out which parts of the file it can load from which other peer. It then has to be checked whether the data is unchanged and undamaged. By using this technique, the individual parts of the file can be downloaded in parallel and in total at maximum speed, even if the individual peers that make the parts available are not connected to the Internet so quickly. The BitTorrent protocol is completely legal and still the fastest way to distribute files. Many Linux distributions and computer games are distributed by BitTorrent. Legal problems only arise if the files are protected by copyright!

The root of such a hash tree is called top, root or master hash. If you get this hash from a trusted source, the rest of the tree can also be obtained from untrusted sources.

But before your brain explodes, a small drawing. To understand the example, you should know the story of the Byzantine generals.

Merkle Tree

Merkle Tree

The generals write a message and create their hash. To “glue together” two messages, hash 1 and hash 2, as well as hash 3 and hash 4, generates hash A and hash B. From hash A and hash B then the top hash is created.
Anyone who owns a “branch of the tree”, that is Message 1 -> Hash 1 -> Hash A -> Top Hash, can check whether the numbers are correct and “whether the message is true”. The more branches you know, the greater the certainty.

This is all quite logical and the brain calms down again after a certain time 🙂

But new questions arise:

  1. How do I know the top hash?
  2. How can I find the hash values ​​in between?
  3. What is being checked?

Take the third question first. What is being checked? I have to determine if a message is part of a tree. If it is, it is true. Every tree has a top hash. If all messages have this top hash, they are true. But who gives me the top hash?
Suppose I want to know if the message from general 3 is true. I submit the message from general 3 to my algorithm and ask for the top hash. I get the value. Now I want to know if message 4 is true. This time I could form a hash by myself from message 3 and 4, but I still lack the knowledge about the common hash of message 1 and 2. So I ask again my algorithm for the top hash for message 4. If the answer is the same as with message 3, message 4 is also within that tree, therefore true.

The answer to question two is easy. I can do it with a hash function if I have the data and a program that is able to calculate the hash value.

And who delivers to me is now clear. It’s an algorithm as part of the distributed system that knows the rules.

Now we are right in the middle of blockchain technology and consensus algorithms.
But for today it should be enough.