Does WhatsApp have its own database?

WhatsApp is one of the most popular messaging apps in the world, with over 2 billion monthly active users as of 2022. Given its massive user base, WhatsApp handles an enormous volume of messages, media, and data every day. This raises an important question: does WhatsApp have its own database to store all this information, or does it rely on third-party databases?

The Short Answer

Yes, WhatsApp maintains its own databases to store user data and message history. It uses a customized Erlang-based data store alongside other databases such as SQLite to manage user profiles, contacts, chat history, media attachments and more. Message contents are encrypted end-to-end for security.

WhatsApp’s Architecture and Database Stack

When WhatsApp first launched in 2009, it did not have its own database system. Instead, it relied on third-party components such as ejabberd (an open-source XMPP server written in Erlang) and Redis to manage contacts and chat sessions. However, as WhatsApp scaled beyond 200 million active users, managing user data through third-party software became challenging.

In early 2014, WhatsApp decided to transition to its own in-house stack and architecture. The company built WhatsApp on top of Erlang, a programming language known for high availability and low latency. For its database, WhatsApp customized and extended Mnesia, the distributed database that ships with Erlang/OTP. This custom Erlang-based data store was optimized for massive volumes of small data records and real-time data synchronization across devices.

Key Components of WhatsApp’s Tech Stack

  • Erlang VM – For concurrent networking, distribution and fault tolerance
  • Custom Erlang DB – Built on Mnesia, manages user accounts and chat sessions
  • SQLite – For storing contact lists, media attachments
  • Riak – For storing media attachments like photos, videos
  • Audio & Video Codecs – Opus, H.264, for compression and streaming
  • Encryption – Signal Protocol, for end-to-end encryption

By mid-2014, WhatsApp had fully transitioned its hundreds of millions of users and their message history to this new in-house data platform. This gave WhatsApp complete control over the scalability, availability and performance of its database systems. WhatsApp has continued optimizing and evolving this stack as it has grown.

How WhatsApp Stores User Data

WhatsApp’s databases are structured and partitioned to efficiently store the different types of data generated by users. Some key examples:

User Profiles and Contacts

User profile information such as names, profile photos and statuses is stored in the custom Erlang database. Each user's contact list is stored in a local SQLite database on the device. The Erlang DB stores references recording which users are contacts of whom, which makes it possible to sync contact lists across devices.
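
As a rough illustration of the on-device side, the sketch below keeps a contact list in SQLite using Python's built-in sqlite3 module. The table name, column names and helper functions are assumptions made up for this example; they are not WhatsApp's actual schema.

```python
import sqlite3


def open_contact_store(path=":memory:"):
    """Open (or create) a local contact database. Schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS contacts (
            phone        TEXT PRIMARY KEY,
            display_name TEXT NOT NULL,
            is_app_user  INTEGER NOT NULL DEFAULT 0
        )
        """
    )
    return conn


def upsert_contact(conn, phone, display_name, is_app_user=False):
    """Insert a contact, or overwrite the row if the phone number already exists."""
    conn.execute(
        "INSERT OR REPLACE INTO contacts (phone, display_name, is_app_user) "
        "VALUES (?, ?, ?)",
        (phone, display_name, int(is_app_user)),
    )
    conn.commit()


def list_app_contacts(conn):
    """Return (phone, display_name) for contacts flagged as app users."""
    rows = conn.execute(
        "SELECT phone, display_name FROM contacts "
        "WHERE is_app_user = 1 ORDER BY display_name"
    )
    return rows.fetchall()
```

Keying the table on the phone number means a repeated sync simply overwrites the existing row instead of creating duplicates.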

Message History and Media

The entire chat history, including messages, call records and references to media attachments, is stored in the Erlang DB. Each chat is stored as an ordered collection of messages, indexed by timestamp. The media files themselves (photos, videos and documents) are stored separately in Riak for scale and performance, and the Erlang DB keeps a reference to the media for each message.
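
The "ordered collection indexed by timestamp, with media held elsewhere" idea can be sketched in a few lines of Python. This is an in-memory stand-in rather than a database, and the `Message` and `Chat` classes and the `media_ref` field are hypothetical names chosen for the example.

```python
import bisect
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Message:
    timestamp: int                    # e.g. milliseconds since the epoch
    sender: str
    body: str
    media_ref: Optional[str] = None   # hypothetical key into a separate media store


@dataclass
class Chat:
    _timestamps: List[int] = field(default_factory=list)
    _messages: List[Message] = field(default_factory=list)

    def append(self, msg: Message) -> None:
        # bisect_right keeps messages with equal timestamps in arrival order.
        i = bisect.bisect_right(self._timestamps, msg.timestamp)
        self._timestamps.insert(i, msg.timestamp)
        self._messages.insert(i, msg)

    def since(self, timestamp: int) -> List[Message]:
        """Return messages strictly newer than `timestamp`, oldest first."""
        i = bisect.bisect_right(self._timestamps, timestamp)
        return self._messages[i:]
```

Because the timestamps stay sorted, fetching everything newer than a given time is a binary search followed by a slice, and the message row carries only a small reference to the media instead of the file itself.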

Notifications and Delivery Receipts

Pending notifications, delivery receipts and read receipts are also tracked in their own tables in the Erlang DB. This drives notification behavior and the tick marks that show whether a message has been delivered and read.
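
One way to picture receipt tracking is as a per-message status that only ever moves forward: sent, then delivered, then read. The sketch below assumes that model; the `Receipt` and `ReceiptTracker` names and the tick labels are invented for illustration.

```python
from enum import IntEnum


class Receipt(IntEnum):
    SENT = 1
    DELIVERED = 2
    READ = 3


class ReceiptTracker:
    """Tracks the highest receipt status seen for each message ID."""

    _TICKS = {
        Receipt.SENT: "one tick",
        Receipt.DELIVERED: "two ticks",
        Receipt.READ: "two ticks (read)",
    }

    def __init__(self):
        self._state = {}

    def record(self, message_id, status):
        # Receipts can arrive late or out of order, so never move backwards.
        current = self._state.get(message_id, Receipt.SENT)
        self._state[message_id] = max(current, status)
        return self._state[message_id]

    def ticks(self, message_id):
        return self._TICKS[self._state[message_id]]
```

Taking the maximum of the old and new status means a delayed "delivered" receipt cannot downgrade a message that has already been marked as read.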

Indexing by timestamp and careful database partitioning enable WhatsApp to fetch and sync message history efficiently, even for years-long chats with thousands of messages.
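
To make the partitioning idea concrete, here is a minimal sketch of hash-based sharding by chat ID. The shard count, the `shard_for` function and the `ShardedStore` class are assumptions for this example, and the sketch says nothing about how WhatsApp actually places data.

```python
import hashlib
from collections import defaultdict

NUM_SHARDS = 8  # assumed shard count, for illustration only


def shard_for(chat_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a chat ID to a shard using a stable hash (not Python's hash())."""
    digest = hashlib.sha256(chat_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedStore:
    """Keeps every message of a chat on one shard, chosen by chat ID."""

    def __init__(self, num_shards: int = NUM_SHARDS):
        self.num_shards = num_shards
        self.shards = [defaultdict(list) for _ in range(num_shards)]

    def append(self, chat_id, timestamp, body):
        shard = self.shards[shard_for(chat_id, self.num_shards)]
        shard[chat_id].append((timestamp, body))

    def history(self, chat_id, since=0):
        """Messages newer than `since`, sorted by timestamp."""
        shard = self.shards[shard_for(chat_id, self.num_shards)]
        return sorted(m for m in shard.get(chat_id, []) if m[0] > since)
```

A history lookup touches exactly one shard, so it stays cheap however many chats exist. Plain modulo sharding does relocate most keys when the shard count changes, which is why large systems often use consistent hashing instead.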

WhatsApp’s Database Scale

As of 2022, WhatsApp reportedly processes over 100 billion messages per day. That works out to roughly 3 trillion messages every month handled by WhatsApp’s databases. Some estimates of WhatsApp’s database scale:

  • >1 million QPS (queries per second) at peak
  • Manages connectivity for >2 billion devices
  • Handles >65 billion contacts and >100 billion messages per day
  • Daily storage of >1 billion photos and >100 million videos
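
The daily message figure can be sanity-checked with simple arithmetic. The snippet below takes the reported 100 billion messages per day and assumes a 30-day month to derive the average per-second rate and the monthly total.

```python
MESSAGES_PER_DAY = 100_000_000_000   # reported figure: 100 billion per day
SECONDS_PER_DAY = 24 * 60 * 60       # 86,400
DAYS_PER_MONTH = 30                  # assumption for a round monthly figure

# Average rate if messages were spread evenly over the day.
messages_per_second = MESSAGES_PER_DAY / SECONDS_PER_DAY

# Total over an assumed 30-day month.
messages_per_month = MESSAGES_PER_DAY * DAYS_PER_MONTH

print(f"average rate: {messages_per_second:,.0f} messages/second")
print(f"monthly total: {messages_per_month:,} messages")
```

The average comes to about 1.16 million messages per second, in line with the peak estimate of more than 1 million queries per second, and the monthly total is 3 trillion messages.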

To put things in context, WhatsApp’s monthly message volume is more than 3 times the entire daily internet traffic in the year 2000. Handling this data explosion requires immense database scalability and complex data partitioning. WhatsApp leverages distributed Erlang clusters and advanced load balancing to scale massively across global datacenters.

Key Benefits of WhatsApp’s Database Approach

Some of the key benefits WhatsApp derives from its homegrown messaging platform and databases:

Speed and Performance

The Erlang VM enables low latency and fast message routing. Small data record design ensures blazing fast writes and reads even at massive scale.

High Availability

Erlang’s always-on model and real-time replication make the WhatsApp service highly available and resilient to failures and network partitions.

Scalability

WhatsApp can horizontally scale across servers and clusters to handle growth in users, messages and media volumes.

Control

Complete control over performance, scaling, reliability and new feature development.

Cost Savings

No licensing costs for proprietary databases. Optimization of infrastructure costs.

Security and Encryption

While WhatsApp manages its own databases, client-side end-to-end encryption means message contents cannot be decrypted by WhatsApp itself or by attackers who gain access to its servers. The Signal Protocol secures text messages, media, voice messages and calls. Each media attachment is encrypted with a random key, which is sent to the recipient inside the encrypted chat session. Only the sender and recipient hold this key, so not even WhatsApp can decrypt the attachment.
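
The media flow described here (encrypt the file with a fresh random key, store only the ciphertext, pass the key through the encrypted chat) can be sketched with Python's standard library. Note the big assumption: the cipher below is a toy HMAC-SHA256 keystream used only so the example runs without third-party packages. It is not the cipher WhatsApp uses and is not suitable for production, and the function names are hypothetical.

```python
import hashlib
import hmac
import secrets


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: HMAC-SHA256 over nonce + counter. Illustration only."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out.extend(hmac.new(key, nonce + counter.to_bytes(8, "big"),
                            hashlib.sha256).digest())
        counter += 1
    return bytes(out[:length])


def encrypt_media(plaintext: bytes):
    """Return (blob for storage, media key to send inside the encrypted chat)."""
    enc_key = secrets.token_bytes(32)   # fresh random key per attachment
    mac_key = secrets.token_bytes(32)
    nonce = secrets.token_bytes(16)
    stream = _keystream(enc_key, nonce, len(plaintext))
    ciphertext = bytes(p ^ s for p, s in zip(plaintext, stream))
    tag = hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()
    return nonce + ciphertext + tag, enc_key + mac_key


def decrypt_media(blob: bytes, media_key: bytes) -> bytes:
    """Verify the tag, then decrypt. Raises ValueError if the blob was altered."""
    enc_key, mac_key = media_key[:32], media_key[32:]
    nonce, ciphertext, tag = blob[:16], blob[16:-32], blob[-32:]
    expected = hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("media blob failed authentication")
    stream = _keystream(enc_key, nonce, len(ciphertext))
    return bytes(c ^ s for c, s in zip(ciphertext, stream))
```

The storage server only ever sees the blob. The media key travels inside the end-to-end encrypted message, so only the two chat participants can call `decrypt_media` successfully.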

Moderation and Monitoring

While message contents are hidden from WhatsApp itself, some user-level metadata has to be accessible for moderation. For banned users and groups spreading harmful content, WhatsApp maintains additional encrypted databases recording message timestamps, sender-recipient pairs and other signals for investigation. This metadata can help identify high-risk accounts without exposing message contents.

Conclusion

In summary, WhatsApp maintains complete in-house database systems built on Erlang, SQLite and other technologies to manage its billions of users and trillions of messages. Custom optimization enables blazing speed, high availability and massive scalability as WhatsApp continues its growth. Client-side end-to-end encryption ensures private contents are inaccessible even to WhatsApp. Combined with minimal metadata access for moderation, WhatsApp’s homegrown data platform powers its service while providing industry-leading privacy.