Introduction to Merkle Tree
Merkle tree is a fundamental data structure that is used to for synchronization and verification of data while sharing through any network or medium. It uses hierarchical hash verification technique to cross verify authenticity and consistency of the data. It is a tree like data structure with leaves storing hash values of the original data and non-leaf nodes contains hash values of their corresponding child node.we take hash node of child node until we reach at root level and get single hash value that is shared in header and verified later to maintain integrity and authenticity of the data. If a eavesdropper try to corrupt any data segment then hash value of corresponding nodes will change and thus corresponding hash of root node will also change. It synchronize the data in logarithmic time complexity.
Use of Hash in Merkle tree
Let us understand what is hash function and how hash function works.
A hash function is a normal function that takes a data and generates a message code digest that is unique to that data. It's like making fingerprint of the data that can easily identify a large block of data. you can use different login to generate mapping of data with a hash value.
Here is simple example of hash function in python.
import hashlib
def generate_hash(data):
# Create a hashlib object
hash_object = hashlib.sha256()
# Update the hash object with the data
hash_object.update(data.encode())
# Return the hexadecimal representation of the hash
return hash_object.hexdigest()
# Example usage:
data = "Hello, world!"
hashed_data = generate_hash(data)
print("Hashed data:", hashed_data)
we have data and when we will pass that data to the hash function it will generate a unique hash identifier to that data. Here 'generate_hash' function takes input as data and computes SHA-256 hash value and return hexadecimal representation of hash.
Output
Hashed data: b94d27b9934d3e08a52e52d7da7dabfac484efe37a5380ee9088f7ace2efcde9
Let us Implement a basic Binary Merkle tree to further understand the topic.
import hashlib
class MerkleTree:
def __init__(self, data):
self.data = data
self.tree = self._build_tree(data)
def _hash(self, data):
return hashlib.sha256(data.encode()).hexdigest()
def _build_tree(self, data):
if len(data) == 1:
return [self._hash(data[0])]
elif len(data) % 2 != 0:
data.append(data[-1]) # Duplicate the last element to make it even
tree = []
for i in range(0, len(data), 2):
left_hash = self._hash(data[i])
right_hash = self._hash(data[i+1])
tree.append(self._hash(left_hash + right_hash))
return self._build_tree(tree)
def root(self):
return self.tree[0]
def get_proof(self, index):
proof = []
if index < len(self.data):
current_index = index
for level in range(len(self.tree)):
sibling_index = current_index + 1 if current_index % 2 == 0 else current_index - 1
proof.append(self.tree[level][sibling_index])
current_index //= 2
return proof
def verify_proof(self, index, value, proof):
computed_hash = self._hash(value)
current_index = index
for sibling_hash in proof:
if current_index % 2 == 0:
computed_hash = self._hash(computed_hash + sibling_hash)
else:
computed_hash = self._hash(sibling_hash + computed_hash)
current_index //= 2
return computed_hash == self.root()
# Example usage:
data = ['Transaction1', 'Transaction2', 'Transaction3', 'Transaction4']
merkle_tree = MerkleTree(data)
print("Root hash:", merkle_tree.root())
index_to_prove = 2
proof = merkle_tree.get_proof(index_to_prove)
print("Proof for index", index_to_prove, ":", proof)
value_to_verify = data[index_to_prove]
print("Verification result:", merkle_tree.verify_proof(index_to_prove, value_to_verify, proof))
this python code illustrates the basic implementation of the Merkle tree that can be used to prove authenticity of the data.
Here is the Time complexity of each operation in the Merkle tree that enables it to used for various data verification techniques
Operations | Complexity |
---|---|
Space | O(n) |
Searching | O(logn) |
Insertion | O(logn) |
Deletion | O(logn) |
Traversal | O(n) |
Synchronization | O(logn) |
Let us see the uses of Merkle tree in various technologies.
Application of Merkle Tree
- Merkle Tree is used by distributed system where same copy of data is present in difference instances of the database.
- Cassandra uses Merkle tree for maintaining authenticity and inconsistency among difference copies of the data.
- Merkle tree extensively used in the Blockchain technologies.
- Merkle Trees are used in file synchronization
- It is used in Certificate transparency long in SSL/TLS certificates.
In distributes peer-to-peer network and systems Merkle tree can play a important role. one of the demerit of Merkle tree is that there is always a possibility of collision attack where two different data items have same hash value of formed Merkle tree root. Any intruder can exploit this and harm the dataset.