Understanding End-to-End Encryption in Messaging Apps like WhatsApp
In a world where our online conversations are an integral part of our lives, the need for privacy and security in messaging apps has become more critical than ever. End-to-end encryption is a fundamental technology that ensures our messages remain private and secure from prying eyes. The core protocol behind this technology is the Signal Protocol, and it involves various key components and processes that we'll explore in this blog post.
The Keys to Privacy
At the heart of end-to-end encryption are cryptographic keys. These keys play a crucial role in securing your communication and ensuring its authenticity:
Identity Key Pair
An Identity Key Pair consists of two keys: a public key and a private key. These keys are generated when you first install the app. The public key is shared with other users and is associated with your long-term identity within the messaging system. It's used for the persistent identification of a user.
Signed Pre Key
The Signed Pre Key is another key pair generated during installation, consisting of a public key and a private key. The Identity Key signs this key pair, guaranteeing its authenticity. Signed Pre Keys are medium-term in nature and periodically rotated to enhance security.
One-Time Pre Keys
One-Time Pre Keys are key pairs generated during installation and can be replenished as needed. These are used for one-time, short-duration communication sessions. Each key pair is typically used for a single session, providing forward secrecy, meaning compromising one session's key pair doesn't affect the security of other sessions.
Session Keys
Session keys are derived during secure communication sessions, and they include:
Root Key: A 32-byte value used to create Chain Keys. It serves as the initial secret from which the other session keys are derived.
Chain Key: A 32-byte value derived from the Root Key and used to create Message Keys. The Chain Key is advanced with every message sent, which provides forward secrecy.
Message Key: An 80-byte value used to protect message content: 32 bytes for a 256-bit AES encryption key, 32 bytes for an HMAC-SHA256 key for message integrity, and 16 bytes for the AES Initialization Vector.
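As a rough illustration of the Message Key layout, the sketch below slices an 80-byte value into its three parts. The field order (AES key, then HMAC key, then IV) and the names MessageKey and split_message_key are assumptions made for this example; the description above only gives the sizes.

```python
from typing import NamedTuple


class MessageKey(NamedTuple):
    aes_key: bytes  # 32 bytes: AES-256 encryption key
    mac_key: bytes  # 32 bytes: HMAC-SHA256 key
    iv: bytes       # 16 bytes: AES initialization vector


def split_message_key(material: bytes) -> MessageKey:
    """Slice 80 bytes of key material into (aes_key, mac_key, iv).

    The ordering of the three fields is an assumption for illustration.
    """
    if len(material) != 80:
        raise ValueError("expected exactly 80 bytes of key material")
    return MessageKey(material[:32], material[32:64], material[64:80])
```

Keeping the three parts in a named structure makes it harder to pass the encryption key where the authentication key is expected.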
Client Registration
When you register in a messaging app, your public Identity Key, public Signed Pre Key (signed by the Identity Key), and a batch of public One-Time Pre Keys are transmitted to the server and stored. These keys play a vital role in establishing secure communication and ensuring authenticity and security.
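To make the server's role more concrete, here is a minimal in-memory sketch of a key directory. Everything in it (KeyServer, RegisteredKeys, PreKeyBundle, and the method names) is hypothetical and invented for this example; a real service would persist the keys and authenticate the uploader. It also shows one plausible way to hand out each One-Time Pre Key only once.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class RegisteredKeys:
    identity_key: bytes              # public Identity Key
    signed_pre_key: bytes            # public Signed Pre Key
    signed_pre_key_signature: bytes  # signature made with the Identity Key
    one_time_pre_keys: List[bytes] = field(default_factory=list)


@dataclass
class PreKeyBundle:
    identity_key: bytes
    signed_pre_key: bytes
    signed_pre_key_signature: bytes
    one_time_pre_key: Optional[bytes]  # None when the batch is exhausted


class KeyServer:
    """Hypothetical key directory holding only public keys."""

    def __init__(self) -> None:
        self._users: Dict[str, RegisteredKeys] = {}

    def register(self, user_id: str, keys: RegisteredKeys) -> None:
        self._users[user_id] = keys

    def fetch_bundle(self, user_id: str) -> PreKeyBundle:
        keys = self._users[user_id]
        # Remove the One-Time Pre Key from storage as soon as it is handed out.
        one_time = keys.one_time_pre_keys.pop(0) if keys.one_time_pre_keys else None
        return PreKeyBundle(
            keys.identity_key,
            keys.signed_pre_key,
            keys.signed_pre_key_signature,
            one_time,
        )
```

Note that only public keys ever reach the server in this sketch; the private halves stay on the device.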
Initiation of Session
Initiating an encrypted session involves several key steps:
The initiating client requests the recipient's public keys, including the Identity Key, Signed Pre Key, and a single One-Time Pre Key (if available).
The server returns the requested public keys. One-Time Pre Keys are used only once and are removed from the server storage after being requested.
The master secret is calculated through multiple ECDH key exchanges to ensure the security of the session.
The initiator receives the recipient's public keys from the server: Identity Key (I recipient), Signed Pre Key (S recipient), and One-Time Pre Key (O recipient) if available.
The initiator generates an ephemeral key pair (E initiator). This key pair is temporary and is used specifically for this session.
The initiator loads their own Identity Key (I initiator). This is the long-term key pair that identifies the initiator; its private half takes part in the key agreement.
The initiator calculates a master secret, which will be used to derive session keys for encryption and decryption. The master secret is created through multiple ECDH (Elliptic Curve Diffie-Hellman) key exchanges:
ECDH(I initiator, S recipient): This operation combines the initiator's Identity Key with the recipient's Signed Pre Key.
ECDH(E initiator, I recipient): This combines the initiator's ephemeral key with the recipient's Identity Key.
ECDH(E initiator, S recipient): This combines the ephemeral key with the recipient's Signed Pre Key.
If there is a One-Time Pre Key (O recipient), a fourth operation is added: ECDH(E initiator, O recipient). If none is available, this final ECDH is omitted.
These ECDH operations generate shared secrets that are concatenated to create the master_secret. This shared secret is unique to this session.
Using the master_secret, the initiator applies a key derivation function, HKDF (HMAC-based Key Derivation Function), to generate a Root Key and Chain Keys. The Root Key is the starting point of the chain: every later Chain Key is derived from it.
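The initiator's side of this calculation can be sketched with nothing but the Python standard library. Several things here are assumptions for illustration: the curve (X25519 as specified in RFC 7748, implemented in slow, non-constant-time Python), the pairing of the One-Time Pre Key with the ephemeral key, the all-zero HKDF salt, the info string, and the choice to derive a single Chain Key. It is not production code.

```python
import hashlib
import hmac
import os

P = 2**255 - 19  # Curve25519 field prime
A24 = 121665     # Montgomery ladder constant from RFC 7748
BASE_POINT = (9).to_bytes(32, "little")


def x25519(private: bytes, public: bytes) -> bytes:
    """Pure-Python X25519 (RFC 7748). Not constant-time: illustration only."""
    k = bytearray(private)
    k[0] &= 248
    k[31] &= 127
    k[31] |= 64
    scalar = int.from_bytes(k, "little")
    u = bytearray(public)
    u[31] &= 127
    x1 = int.from_bytes(u, "little")
    x2, z2, x3, z3, swap = 1, 0, x1, 1, 0
    for t in reversed(range(255)):
        bit = (scalar >> t) & 1
        if swap ^ bit:
            x2, x3, z2, z3 = x3, x2, z3, z2
        swap = bit
        a, b = (x2 + z2) % P, (x2 - z2) % P
        c, d = (x3 + z3) % P, (x3 - z3) % P
        aa, bb = a * a % P, b * b % P
        e = (aa - bb) % P
        da, cb = d * a % P, c * b % P
        x3 = pow(da + cb, 2, P)
        z3 = x1 * pow(da - cb, 2, P) % P
        x2 = aa * bb % P
        z2 = e * (aa + A24 * e) % P
    if swap:
        x2, z2 = x3, z3
    return (x2 * pow(z2, P - 2, P) % P).to_bytes(32, "little")


def generate_key_pair():
    private = os.urandom(32)
    return private, x25519(private, BASE_POINT)


def hkdf(ikm: bytes, salt: bytes, info: bytes, length: int) -> bytes:
    """HKDF with SHA-256 (RFC 5869): extract, then expand."""
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def initiator_master_secret(i_init_priv, e_init_priv, i_rec_pub, s_rec_pub, o_rec_pub=None):
    parts = [
        x25519(i_init_priv, s_rec_pub),  # ECDH(I initiator, S recipient)
        x25519(e_init_priv, i_rec_pub),  # ECDH(E initiator, I recipient)
        x25519(e_init_priv, s_rec_pub),  # ECDH(E initiator, S recipient)
    ]
    if o_rec_pub is not None:
        # Assumed pairing: ECDH(E initiator, O recipient)
        parts.append(x25519(e_init_priv, o_rec_pub))
    return b"".join(parts)


def derive_root_and_chain(master_secret: bytes):
    # Salt and info are placeholder choices for this sketch.
    okm = hkdf(master_secret, salt=bytes(32), info=b"example-session-setup", length=64)
    return okm[:32], okm[32:]


# Recipient's keys; only the public halves would reach the initiator.
i_rec_priv, i_rec_pub = generate_key_pair()
s_rec_priv, s_rec_pub = generate_key_pair()
o_rec_priv, o_rec_pub = generate_key_pair()

# Initiator's long-term Identity Key and fresh ephemeral key.
i_init_priv, i_init_pub = generate_key_pair()
e_init_priv, e_init_pub = generate_key_pair()

master_secret = initiator_master_secret(i_init_priv, e_init_priv, i_rec_pub, s_rec_pub, o_rec_pub)
root_key, chain_key = derive_root_and_chain(master_secret)
```

Because each ECDH result is the same whichever side computes it, the recipient can rebuild the identical master_secret from its own private keys and the initiator's two public keys.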
Receiving Session
Once the recipient receives a message that includes session setup information, they can calculate the corresponding master secret and derive the Root Key and Chain Keys using their private keys and the public keys from the incoming message's header.
Until the recipient responds, the initiator includes in the header of every message it sends the information the recipient needs to build a corresponding session: the initiator's public ephemeral key (E initiator) and public Identity Key (I initiator).
When the recipient receives a message that includes session setup information:
The recipient calculates the corresponding master_secret using its private keys and the public keys advertised in the header of the incoming message.
The recipient deletes the One-Time Pre Key used by the initiator.
The recipient uses HKDF to derive a corresponding Root Key and Chain Keys from the master_secret.
Exchanging Messages
In a secure session, each message is encrypted with a Message Key using AES-256 in CBC mode and authenticated with HMAC-SHA256. What's unique is that the Message Key changes for every message: it is ephemeral and cannot be reconstructed from the session state once the message has been sent or received. The Message Key is derived from the sender's Chain Key, which advances with each sent message. In addition, a new ECDH agreement occurs with each message round trip, creating a fresh Chain Key. This combination of an immediate "hash ratchet" and a round-trip "DH ratchet" provides forward secrecy.
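The authentication half of this can be sketched with Python's standard library. The AES-CBC step is left out because the standard library has no AES, so the ciphertext is treated as opaque bytes. Computing the tag over the ciphertext (encrypt-then-MAC) and the function names are assumptions made for this example.

```python
import hashlib
import hmac


def tag_ciphertext(mac_key: bytes, ciphertext: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag over the (already encrypted) message."""
    return hmac.new(mac_key, ciphertext, hashlib.sha256).digest()


def verify_ciphertext(mac_key: bytes, ciphertext: bytes, tag: bytes) -> bool:
    """Check the tag in constant time before attempting any decryption."""
    return hmac.compare_digest(tag_ciphertext(mac_key, ciphertext), tag)
```

A receiver following this sketch would call verify_ciphertext first and discard the message if it returns False.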
A. When a user is sending messages in succession:
Calculate the Message Key = HMAC-SHA256(Chain Key, 0x01).
Update the Chain Key = HMAC-SHA256(Chain Key, 0x02).
This process effectively "ratchets" the Chain Key forward, ensuring that a previously stored Message Key cannot be employed to determine current or past Chain Key values.
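The two steps above map directly onto Python's hmac module. This is a minimal sketch: the function name ratchet_step is made up for the example, and the 32-byte output is used as-is, since how it is expanded to the full 80-byte Message Key is not covered here.

```python
import hashlib
import hmac
from typing import Tuple


def ratchet_step(chain_key: bytes) -> Tuple[bytes, bytes]:
    """Return (message_key, next_chain_key) for one sent message."""
    message_key = hmac.new(chain_key, b"\x01", hashlib.sha256).digest()
    next_chain_key = hmac.new(chain_key, b"\x02", hashlib.sha256).digest()
    return message_key, next_chain_key


# Sending three messages in succession from a starting Chain Key.
chain_key = bytes(32)  # placeholder value for the example
message_keys = []
for _ in range(3):
    message_key, chain_key = ratchet_step(chain_key)
    message_keys.append(message_key)
```

The caller is expected to overwrite the old Chain Key with the returned one, as the loop does, so that earlier values are no longer held in the session state.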
B. Two-way messaging
Whenever a message is sent, an ephemeral public key is shared alongside it. Upon receiving a response, a new Chain Key and Root Key are computed through the following steps:
Calculate ephemeral_secret = ECDH(Ephemeral sender, Ephemeral recipient).
Update both keys with HKDF: Chain Key, Root Key = HKDF(Root Key, ephemeral_secret).
This recalculation is needed because the hash ratchet only prevents going backward in the chain: if any Chain Key is exposed, it can still be used to generate all subsequent keys in that chain. Mixing in a fresh ECDH secret on each round trip cuts that forward derivation off.
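The HKDF update can be sketched as follows, taking the ephemeral_secret as an already computed 32-byte input. Using the Root Key as the HKDF salt, the info string, the split of the 64 output bytes, and the name dh_ratchet are all assumptions made for this example.

```python
import hashlib
import hmac
from typing import Tuple


def hkdf(ikm: bytes, salt: bytes, info: bytes, length: int) -> bytes:
    """HKDF with SHA-256 (RFC 5869): extract, then expand."""
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def dh_ratchet(root_key: bytes, ephemeral_secret: bytes) -> Tuple[bytes, bytes]:
    """Return (new_chain_key, new_root_key) after a message round trip."""
    okm = hkdf(ephemeral_secret, salt=root_key, info=b"example-ratchet", length=64)
    # Which half becomes which key is an arbitrary choice in this sketch.
    new_root_key, new_chain_key = okm[:32], okm[32:]
    return new_chain_key, new_root_key
```

Since the new keys depend on a secret that did not exist when the old Chain Key was in use, knowing the old Chain Key alone is not enough to compute them.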
Conclusion
WhatsApp and Signal are more complex in practice than described here. This article only covers one person talking to another, each with a single device. Real applications also support group chats and multiple devices, which add further steps but exchange keys in a broadly similar fashion. If you wish to learn more about this topic, you can refer to this White Paper by WhatsApp.