Does WhatsApp use a database?

Yes, WhatsApp does use a database. WhatsApp is a popular messaging app owned by Meta (formerly Facebook) that allows users to send messages, photos, videos, documents, voice messages and make calls over the internet. To store all this user data and metadata, WhatsApp relies on databases.

What is a database?

A database is an organized collection of data stored electronically on a computer system. Databases allow large amounts of data to be stored, manipulated and retrieved efficiently. Relational databases are composed of tables that contain rows (records) and columns (attributes), while non-relational (NoSQL) databases use other models such as key-value pairs or documents. Database Management Systems (DBMS) are used to manage and interact with databases.
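To make the table, row and column vocabulary concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `users` table and its columns are invented for illustration and are not WhatsApp's actual schema.

```python
import sqlite3

# In-memory database; table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (phone TEXT PRIMARY KEY, name TEXT, status TEXT)")

# Each tuple becomes one row (record); each position maps to a column (attribute).
conn.executemany(
    "INSERT INTO users (phone, name, status) VALUES (?, ?, ?)",
    [
        ("+15550001", "Alice", "Available"),
        ("+15550002", "Bob", "At work"),
    ],
)

# Retrieval: ask the DBMS for one attribute of one record.
row = conn.execute(
    "SELECT name FROM users WHERE phone = ?", ("+15550002",)
).fetchone()
print(row[0])
```

The DBMS (here SQLite) handles storage and lookup, so the application only describes what it wants through queries.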

Why does WhatsApp need a database?

WhatsApp needs to use databases for several reasons:

  • User accounts – WhatsApp needs to store details about each user like their phone number, profile name, profile picture, status etc. This is stored in a database table with each row representing a user.
  • Contacts – WhatsApp needs to store each user’s contacts which are synced from their phonebook. The app uses a database table to store contact details.
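  • Messages – Chats, media and files pass through WhatsApp’s servers, which hold each message until it is delivered. This includes text messages, photos, videos, documents, voice messages and more. According to WhatsApp, messages are deleted from its servers once they are delivered, so long-term chat history lives in a database on each user’s device. Each message has attributes like sender, receiver, timestamp, media type etc.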
  • Groups – Information about the various groups created on WhatsApp, their members, media shared in groups etc. is stored using a database model.
  • Status updates – When users post status updates, these need to be stored in a database table as well.
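As a rough illustration of the message attributes mentioned in the list above, a single message record could be modelled like this. The field names are assumptions chosen for the example, not WhatsApp's real data model.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass(frozen=True)
class Message:
    """One message record; every field name here is hypothetical."""
    sender: str                 # phone number of the sending user
    receiver: str               # phone number or group identifier
    timestamp: int              # seconds since the Unix epoch
    media_type: str = "text"    # e.g. "text", "photo", "video", "voice"
    body: Optional[str] = None  # text content, if any


msg = Message(
    sender="+15550001",
    receiver="+15550002",
    timestamp=1700000000,
    body="hello",
)

# A record like this maps naturally onto one database row or one stored value.
record = asdict(msg)
print(record)
```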

Without a proper database, it would be impossible for WhatsApp to manage data from more than two billion users efficiently. The app needs robust data storage and retrieval capabilities to operate.

What database does WhatsApp use?

WhatsApp uses a NoSQL database for most of its backend data storage needs. NoSQL databases are non-relational, distributed and able to handle large volumes of unstructured or semi-structured data.

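Specifically, WhatsApp’s backend is reported to rely on Erlang Term Storage (ETS), an in-memory key-value store built into the runtime of Erlang, the programming language in which the WhatsApp server was originally developed. ETS tables live in the memory of a single Erlang node. Mnesia, the distributed database that ships with Erlang and builds on ETS, adds persistence and replication, and WhatsApp engineers have described using it in public talks.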
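ETS itself is only available from Erlang, but its basic key-value behaviour can be sketched in Python as an analogy. The `TermTable` class below is a hypothetical stand-in that mimics an ETS table of type "set": each record is a tuple, one element of the tuple is the key, and inserting a record with an existing key overwrites the old one. It is not the Erlang API.

```python
class TermTable:
    """Toy in-memory table with 'set' semantics: at most one tuple per key."""

    def __init__(self, keypos: int = 0):
        # Position of the key inside each tuple (0-based here;
        # Erlang counts tuple positions from 1).
        self._keypos = keypos
        self._rows = {}

    def insert(self, record: tuple) -> None:
        # A record with an existing key replaces the previous record.
        self._rows[record[self._keypos]] = record

    def lookup(self, key) -> list:
        # Return a list of matching records, empty if the key is absent.
        row = self._rows.get(key)
        return [row] if row is not None else []

    def delete(self, key) -> None:
        self._rows.pop(key, None)


presence = TermTable()
presence.insert(("+15550001", "online", 1700000000))
presence.insert(("+15550001", "offline", 1700000060))  # overwrites the first
print(presence.lookup("+15550001"))
```

Because everything stays in memory and lookups are by key, reads and writes avoid disk access and query parsing, which is where the low latency of this kind of store comes from.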

Some key advantages of ETS databases for WhatsApp:

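  • High scalability – ETS tables can be read and written concurrently by many Erlang processes, which helps a single node serve a very large number of connected users.
  • Data distribution – Data can be spread across multiple nodes and machines. ETS tables themselves are local to one node, so a layer such as Mnesia handles distribution and replication between nodes.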
  • Low latency – ETS offers microsecond latency for read/write operations which is critical for messaging.
  • Real-time performance – Ideal for systems like WhatsApp that require real-time updates.

In 2012, two years before Facebook (now Meta) acquired WhatsApp, the app was reported to be processing over 2 billion messages per day using ETS databases scaled across multiple Erlang VM instances. Today, that number is in the tens of billions.

WhatsApp database architecture

The exact details are not public, so the following is a plausible outline of WhatsApp’s general database architecture rather than a confirmed description:

  • User data like profiles, contacts and groups are stored in ETS tables sharded by user phone number.
  • Message data is divided into separate ETS tables for each user-to-user or group chat.
  • Media files like photos and videos are stored on a distributed file system.
  • Signals like delivery receipts, read receipts and online status are stored in ETS tables.
  • Data is replicated across multiple data centers for redundancy.
  • Look-aside caching systems like memcached are used to cache data and reduce database load.
  • Master databases handle writes while read replicas serve read operations.
  • SQLite databases are used by the mobile clients for on-device storage.

This architecture allows WhatsApp to shard and replicate its data across clusters while efficiently routing messages and lookups. The mix of ETS, file storage, caching and SQL provides low latency and high throughput for hundreds of millions of concurrent chats.

How does WhatsApp efficiently manage databases at scale?

WhatsApp reported around 65 billion messages per day in 2018 and roughly 100 billion per day by 2020, sent by more than two billion users. This enormous scale presents significant database challenges. Here are some ways WhatsApp efficiently manages its databases:

Sharding

WhatsApp databases are sharded, or partitioned, across many servers. Each user’s data is assigned to a shard based on their phone number, so every ETS table holds only a slice of the total data. This divides the load across shards, allowing the database to scale.
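A minimal sketch of sharding by phone number, assuming a fixed number of shards and a hash-then-modulo rule. The shard count, the function names and the use of SHA-256 are illustrative choices, not details WhatsApp has published.

```python
import hashlib

NUM_SHARDS = 8  # hypothetical shard count


def shard_for(phone: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a phone number to a shard index in a stable, evenly spread way."""
    digest = hashlib.sha256(phone.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Each dict stands in for the table held by one shard server.
shards = [dict() for _ in range(NUM_SHARDS)]


def put_user(phone: str, profile: dict) -> None:
    shards[shard_for(phone)][phone] = profile


def get_user(phone: str):
    # Only the one shard that owns this phone number is consulted.
    return shards[shard_for(phone)].get(phone)


put_user("+15550001", {"name": "Alice"})
print(shard_for("+15550001"), get_user("+15550001"))
```

A plain modulo rule like this remaps most keys whenever the number of shards changes, which is the problem consistent hashing is designed to reduce.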

Read replicas

WhatsApp makes use of read replicas for its ETS databases. Read queries are handled by replica database nodes while writes go to the master nodes. This improves read performance and scalability.
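The read/write split can be sketched as follows. `ReplicatedStore` is a hypothetical class: writes go to one primary (master) copy and are copied to the replicas, while reads rotate across the replicas. For simplicity the copy happens synchronously here, whereas real systems often replicate asynchronously and must then tolerate slightly stale reads.

```python
import itertools


class ReplicatedStore:
    """Toy primary/replica store: one writer copy, several reader copies."""

    def __init__(self, num_replicas: int = 2):
        self.primary = {}
        self.replicas = [dict() for _ in range(num_replicas)]
        # Round-robin iterator over replica indexes for spreading reads.
        self._next_replica = itertools.cycle(range(num_replicas))

    def write(self, key, value) -> None:
        # All writes go through the primary...
        self.primary[key] = value
        # ...and are then propagated to every replica.
        for replica in self.replicas:
            replica[key] = value

    def read(self, key):
        # Reads never touch the primary, keeping it free for writes.
        idx = next(self._next_replica)
        return self.replicas[idx].get(key)


store = ReplicatedStore(num_replicas=3)
store.write("+15550001", {"name": "Alice"})
print([store.read("+15550001") for _ in range(3)])
```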

Caching

Frequently accessed data like user profiles and chat statuses is cached in memory using a cache such as memcached or Redis. This reduces load on the databases.
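The look-aside pattern behind this kind of caching can be sketched in a few lines. `LookAsideCache` is a hypothetical helper and a plain dict stands in for both the cache server and the database. The application checks the cache first, falls back to the database on a miss, and then stores the result for next time.

```python
class LookAsideCache:
    """Toy look-aside cache in front of a slower data source."""

    def __init__(self, load_from_db):
        self._load = load_from_db  # function called only on a cache miss
        self._cache = {}
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._cache:
            self.hits += 1
            return self._cache[key]
        # Miss: read from the database, then remember the answer.
        self.misses += 1
        value = self._load(key)
        self._cache[key] = value
        return value

    def invalidate(self, key) -> None:
        # Call after the underlying record changes to avoid stale reads.
        self._cache.pop(key, None)


db = {"+15550001": {"name": "Alice"}}
profiles = LookAsideCache(db.get)
profiles.get("+15550001")  # miss, loads from db
profiles.get("+15550001")  # hit, served from memory
print(profiles.hits, profiles.misses)
```

The hard part in practice is invalidation: every code path that updates a record also has to evict or refresh the cached copy.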

Asynchronous writes

Message sends involve an asynchronous write to the database. This allows WhatsApp to acknowledge right away that the server has accepted a message and to write it to the database afterwards.
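One common way to implement asynchronous writes is a queue drained by a background worker. The sketch below assumes that design. The names `send`, `writer` and `pending` are invented for the example, and a list stands in for the database.

```python
import queue
import threading

store = []               # stands in for the database
pending = queue.Queue()  # messages accepted but not yet written


def writer() -> None:
    """Background worker: drain the queue and persist each message."""
    while True:
        msg = pending.get()
        if msg is None:  # sentinel value tells the worker to stop
            pending.task_done()
            break
        store.append(msg)
        pending.task_done()


def send(msg: dict) -> str:
    # Enqueue and return immediately; the caller does not wait for the write.
    pending.put(msg)
    return "accepted"


worker = threading.Thread(target=writer, daemon=True)
worker.start()

ack = send({"to": "+15550002", "body": "hi"})
pending.join()     # wait here only so the example can inspect the result
pending.put(None)  # shut the worker down cleanly
worker.join()
print(ack, store)
```

The trade-off is durability: if the process crashes between the acknowledgement and the write, the message is lost unless the queue itself is persistent or replicated.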

Efficient routing

WhatsApp uses consistent hashing to route messages and chat data to the correct shard. This minimizes latency and improves throughput.
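Consistent hashing places both servers and keys on the same hash ring, and each key belongs to the first server point found clockwise from it. The sketch below is a generic textbook version with virtual nodes, not WhatsApp's implementation. The class name, node names and the choice of 100 virtual nodes per server are assumptions.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    """Toy consistent-hash ring with virtual nodes."""

    def __init__(self, nodes, vnodes: int = 100):
        self._vnodes = vnodes
        self._ring = []  # sorted list of (point_on_ring, node_name)
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        # Several points per node smooth out the key distribution.
        for i in range(self._vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove(self, node: str) -> None:
        self._ring = [(p, n) for p, n in self._ring if n != node]

    def node_for(self, key: str) -> str:
        if not self._ring:
            raise ValueError("ring is empty")
        # First point at or after the key's hash, wrapping around the ring.
        idx = bisect.bisect(self._ring, (_hash(key), ""))
        return self._ring[idx % len(self._ring)][1]


ring = HashRing(["shard-a", "shard-b", "shard-c"])
keys = [f"+1555{i:04d}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}

ring.remove("shard-c")
after = {k: ring.node_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(len(moved), "of", len(keys), "keys moved")
```

Only the keys that lived on the removed server change owner. Keys on the surviving servers stay where they were, which is what makes adding or removing shards cheap compared with a plain modulo rule.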

Load balancing

Load balancers evenly distribute read and write requests across available database servers. This prevents hot spots and improves resource utilization.
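There are several load-balancing policies. The sketch below shows one of them, least connections, where each new request goes to the server currently handling the fewest requests. The class and server names are hypothetical, and nothing here reflects which policy WhatsApp actually uses.

```python
class LeastConnectionsBalancer:
    """Toy balancer: send each request to the least busy server."""

    def __init__(self, servers):
        # Number of in-flight requests per server.
        self.active = {server: 0 for server in servers}

    def acquire(self) -> str:
        # Pick the server with the fewest in-flight requests
        # (ties go to the server listed first).
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server: str) -> None:
        # Call when the request finishes.
        self.active[server] -= 1


lb = LeastConnectionsBalancer(["db1", "db2", "db3"])
picks = [lb.acquire() for _ in range(6)]
print(picks, lb.active)
```

Compared with simple round-robin, this policy adapts when some requests take much longer than others, because a busy server stops receiving new work until it catches up.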

Geographic distribution

WhatsApp database nodes are distributed across different geographic regions to be closer to users. This reduces network latency for database operations.

Fault tolerance

Data is replicated across multiple clusters to provide high availability. WhatsApp can continue operating even if some database nodes go down.
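The idea of surviving node failures through replication can be sketched as follows. `Node`, `replicated_put` and `failover_get` are hypothetical names. The write goes to every reachable copy, and the read tries each copy in turn until one answers. The sketch deliberately ignores the harder problem of repairing a node that missed writes while it was down.

```python
class NodeDown(Exception):
    pass


class Node:
    """Toy database node that can be switched off to simulate a failure."""

    def __init__(self, name: str):
        self.name = name
        self.data = {}
        self.up = True

    def put(self, key, value) -> None:
        if not self.up:
            raise NodeDown(self.name)
        self.data[key] = value

    def get(self, key):
        if not self.up:
            raise NodeDown(self.name)
        return self.data[key]


def replicated_put(nodes, key, value) -> int:
    """Write to every reachable node; return how many accepted the write."""
    acks = 0
    for node in nodes:
        try:
            node.put(key, value)
            acks += 1
        except NodeDown:
            continue
    return acks


def failover_get(nodes, key):
    """Read from the first node that is still up."""
    for node in nodes:
        try:
            return node.get(key)
        except NodeDown:
            continue
    raise NodeDown("all replicas unavailable")


nodes = [Node("dc1"), Node("dc2"), Node("dc3")]
acks = replicated_put(nodes, "+15550001", {"name": "Alice"})
nodes[0].up = False  # simulate losing one data center
print(acks, failover_get(nodes, "+15550001"))
```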

WhatsApp’s usage of other databases

While ETS is its primary data store, WhatsApp does leverage other databases in its stack as well:

SQLite

SQLite, a self-contained SQL database engine, is used by the WhatsApp mobile apps on Android and iOS (and formerly Windows Phone) to store data such as chat history, contacts and profiles locally on the device.

Cassandra

Apache Cassandra, a distributed NoSQL database, is used by WhatsApp for certain data analytics and aggregations done on its platform. It provides high scalability for such workloads.

HBase

Apache HBase, a scalable NoSQL database built on Hadoop, is used by WhatsApp for managing large volumes of time-series data such as metrics and logging.

MySQL

For some auxiliary services and products, WhatsApp uses the traditional relational database MySQL. This includes managing account registrations on its website, billing data etc.

So in summary, while ETS forms the heart of WhatsApp’s real-time database needs, it does leverage other databases like SQLite, Cassandra, HBase and MySQL for specific use cases. The multi-database hybrid architecture provides flexibility and optimizes for different kinds of data workloads.

Conclusion

WhatsApp relies heavily on databases to manage user accounts, contacts, chats, media, groups and other data generated by over a billion daily active users on its platform. The app uses Erlang Term Storage (ETS), a high performance NoSQL database built into Erlang, as its primary data store.

ETS provides low latency, high concurrency and scalability to handle WhatsApp’s chat messaging load. Data is sharded and replicated across servers. Caching, load balancing and geographic distribution help WhatsApp scale ETS efficiently. Other databases like SQLite, Cassandra, HBase and MySQL are used for specific secondary needs. WhatsApp’s versatile multi-database backend architecture powers its real-time messaging capabilities used by over a quarter of the world’s population.