WhatsApp is one of the most popular messaging apps in the world, with over 2 billion active users. It allows users to send messages, photos, videos and documents easily to individuals or groups. WhatsApp uses end-to-end encryption for all communications, providing a high level of security and privacy.
Behind the simple interface of sending and receiving messages, WhatsApp relies on databases, both on users’ devices and on its servers. It uses them to keep track of user information, message history, references to media files and more. Understanding WhatsApp’s use of databases provides insight into how the app functions and protects user data.
What is a database?
A database is a structured set of data held in a computer, especially one that is accessible in various ways. Databases allow for efficient storage, organization and retrieval of large amounts of data. Relational databases, the most familiar kind, are composed of tables that store related information, with columns defining the types of data and rows containing the actual data values.
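To make the idea of tables, columns and rows concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name, column names and sample values are invented for illustration and do not come from WhatsApp.

```python
import sqlite3

# An in-memory database that disappears when the program exits.
conn = sqlite3.connect(":memory:")

# Columns define what kind of data each row may hold.
conn.execute("""
    CREATE TABLE contacts (
        id    INTEGER PRIMARY KEY,  -- unique identifier for each row
        name  TEXT NOT NULL,        -- column holding text values
        phone TEXT NOT NULL
    )
""")

# Each tuple below becomes one row of actual data values.
conn.executemany(
    "INSERT INTO contacts (name, phone) VALUES (?, ?)",
    [("Alice", "+15550100"), ("Bob", "+15550101")],
)

rows = conn.execute("SELECT id, name, phone FROM contacts ORDER BY id").fetchall()
for row in rows:
    print(row)
```

Each printed tuple is one row, and its positions correspond to the columns declared in the CREATE TABLE statement.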
Databases use special database management system (DBMS) software to create, access, manage and search for information within the database. There are several types of DBMSs, including relational, non-relational (NoSQL), object-oriented, graph and deductive databases. Popular DBMS software includes MySQL, Oracle, MongoDB, Cassandra and Microsoft Access.
Key advantages of using databases include:
- Organized data storage and structure
- Efficient data access and queries
- Data integrity and security
- Data relationships and modeling
- Concurrent access by multiple users
- Persistence of data after application ends
WhatsApp relies heavily on database systems to manage user accounts, conversations, media files and application data.
What databases does WhatsApp use?
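WhatsApp has not published a complete description of its storage stack, so the server-side entries below are best read as a plausible mix of relational and NoSQL systems for a messaging application of this kind, not as a confirmed inventory: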
MySQL
MySQL is a popular open source relational database management system used by many large web applications. WhatsApp uses MySQL to store user account information like names, profile photos, status messages and privacy settings.
SQLite
SQLite is a lightweight relational database contained in a single file on disk. On Android devices, WhatsApp keeps the local message history in a SQLite file in the app’s private storage, and the local backup copies it writes of that file are encrypted. iOS devices also use SQLite databases for local data storage.
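The "single file on disk" property is easy to see in a small sketch. The file name and table below are hypothetical and are not WhatsApp's actual file layout; the point is only that closing and reopening the same file brings the data back.

```python
import os
import sqlite3
import tempfile

# Hypothetical file name inside a fresh temporary directory.
db_path = os.path.join(tempfile.mkdtemp(), "example_store.db")

# First "app session": create the file and write one message.
conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, body TEXT)")
conn.execute("INSERT INTO messages (body) VALUES (?)", ("hello",))
conn.commit()
conn.close()

# Second "session": reopening the same file finds the data still there.
conn = sqlite3.connect(db_path)
bodies = [row[0] for row in conn.execute("SELECT body FROM messages")]
conn.close()

print(os.path.isfile(db_path), bodies)
```

Because the whole database lives in one ordinary file, backing it up or moving it to another device amounts to copying that file.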
Redis
Redis is an in-memory NoSQL database that is very fast and efficient. WhatsApp uses Redis to store frequently accessed data like active user sessions, chat lists and contact information.
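The kind of short-lived, frequently read data described here can be sketched with a tiny in-process key-value store that expires entries after a time limit. This is a stand-in written in plain Python to show the idea, not the Redis client API, and the key name and session fields are assumptions.

```python
import time


class TTLCache:
    """Tiny in-process key-value store with per-key expiry (a stand-in, not Redis)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}  # key -> (value, expiry time)

    def set(self, key, value, ttl_seconds):
        self._data[key] = (value, self._clock() + ttl_seconds)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # lazily evict expired entries on read
            return None
        return value


cache = TTLCache()
cache.set("session:user42", {"status": "online"}, ttl_seconds=30)
print(cache.get("session:user42"))
```

Reads are served straight from memory, which is why this pattern suits data such as active sessions that is looked up often and can be rebuilt if it is lost.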
Cassandra
Cassandra is a distributed NoSQL database designed to handle large amounts of structured data across servers. WhatsApp relies on Cassandra for storing messaging data across its servers.
Riak
Riak is a decentralized NoSQL database optimized for availability and fault tolerance at scale. WhatsApp uses Riak Key-Value data stores for storing media attachments like photos and videos.
By combining both SQL and NoSQL databases, WhatsApp is able to efficiently store all the varied data types produced by billions of users and deliver low latency access.
How WhatsApp uses databases
Assuming the database mix described above, the key functions would be served as follows:
User account information
User profile data like names, profile photos, status and settings are stored in MySQL databases. This allows WhatsApp to retrieve and display the latest user data each time you open the app.
Contact lists
Your phone’s contact list is synced with WhatsApp to allow messaging within your network. These contact details are stored in Redis caches for fast lookups to display contact names and avatars.
Message data
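The entire history of messages, calls and media sent and received is stored in local databases on each user’s device, using SQLite on mobile devices. WhatsApp states that messages are not kept on its servers once they have been delivered; optional chat backups are written to the user’s own Google Drive or iCloud account, not to a server-side MySQL database.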
Media attachments
All photos, videos and files exchanged by users are stored in Riak Key-Value data stores spread across WhatsApp’s content delivery network. This allows fast access to media when users request past items.
Metadata
User IP addresses, device information, usage metrics and interactions with businesses on WhatsApp are stored in Cassandra NoSQL databases for analytics and auditing purposes.
Chats and contacts
Lists of conversations and contacts are cached in Redis so they load almost instantly when the app opens. Keeping frequently accessed data in memory avoids a slower disk lookup on every request.
Push notifications
When new messages arrive, WhatsApp queries its databases to identify which devices should be notified and sends push notifications to their operating systems using this data.
Key database tables
While WhatsApp’s databases are not publicly accessible, based on what we know of its functions we can infer some key tables that likely exist:
| Table Name | Contents |
|---|---|
| Users | User IDs, names, profile info |
| Contacts | Contact IDs, phone numbers |
| Messages | Message IDs, sender, recipient, text content |
| Media | Media IDs, file attachments |
| Devices | Device IDs, push tokens |
| Sessions | Session IDs, user IDs, status |
These hypothetical tables illustrate the types of relational data that WhatsApp might store for core functionality.
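As a rough sketch, the hypothetical tables above could be declared as follows in SQLite. Every column name and relationship here is an assumption made for illustration; none of it reflects WhatsApp's real schema.

```python
import sqlite3

# Hypothetical schema mirroring the inferred tables; all names are illustrative.
SCHEMA = """
CREATE TABLE users (
    user_id      INTEGER PRIMARY KEY,
    name         TEXT NOT NULL,
    profile_info TEXT
);
CREATE TABLE contacts (
    contact_id   INTEGER PRIMARY KEY,
    owner_id     INTEGER NOT NULL REFERENCES users(user_id),
    phone_number TEXT NOT NULL
);
CREATE TABLE messages (
    message_id   INTEGER PRIMARY KEY,
    sender_id    INTEGER NOT NULL REFERENCES users(user_id),
    recipient_id INTEGER NOT NULL REFERENCES users(user_id),
    body         TEXT
);
CREATE TABLE media (
    media_id     INTEGER PRIMARY KEY,
    message_id   INTEGER REFERENCES messages(message_id),
    file_path    TEXT NOT NULL
);
CREATE TABLE devices (
    device_id    INTEGER PRIMARY KEY,
    user_id      INTEGER NOT NULL REFERENCES users(user_id),
    push_token   TEXT
);
CREATE TABLE sessions (
    session_id   INTEGER PRIMARY KEY,
    user_id      INTEGER NOT NULL REFERENCES users(user_id),
    status       TEXT NOT NULL
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# List the tables that were created.
table_names = sorted(
    row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
print(table_names)
```

The REFERENCES clauses express the relationships between tables, for example that every device and session belongs to a user.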
Key database queries
WhatsApp executes complex queries across its databases to assemble user data into the application interface. Some examples of key queries include:
- Lookup user profile data when app loads using User ID
- Retrieve list of recent chats using time range filters
- Load message history for selected chat using chat ID
- Fetch media files by Media ID when sending attachments
- Search for contacts by name or phone number substring
- Check Session table for active user session status
- Insert new message rows with sender, recipient, timestamps
These types of queries allow WhatsApp to populate the app interface dynamically from persisted database data.
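A few of the queries listed above can be sketched against a small, made-up schema. The table layout, function names and sample data are all assumptions for illustration; parameterized queries (the ? placeholders) are used so that user input is never spliced into the SQL text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE contacts (contact_id INTEGER PRIMARY KEY, name TEXT, phone_number TEXT);
CREATE TABLE messages (
    message_id INTEGER PRIMARY KEY,
    chat_id INTEGER, sender TEXT, recipient TEXT, body TEXT, sent_at INTEGER
);
INSERT INTO contacts (name, phone_number) VALUES
    ('Alice', '+15550100'), ('Bob', '+15550101'), ('Alina', '+15550102');
""")


def insert_message(chat_id, sender, recipient, body, sent_at):
    """Insert a new message row with sender, recipient and timestamp."""
    conn.execute(
        "INSERT INTO messages (chat_id, sender, recipient, body, sent_at) "
        "VALUES (?, ?, ?, ?, ?)",
        (chat_id, sender, recipient, body, sent_at),
    )


def load_history(chat_id):
    """Load the message history for one chat, oldest first."""
    return conn.execute(
        "SELECT sender, body FROM messages WHERE chat_id = ? ORDER BY sent_at",
        (chat_id,),
    ).fetchall()


def search_contacts(fragment):
    """Find contacts whose name or phone number contains the fragment."""
    pattern = f"%{fragment}%"
    return [
        row[0]
        for row in conn.execute(
            "SELECT name FROM contacts "
            "WHERE name LIKE ? OR phone_number LIKE ? ORDER BY name",
            (pattern, pattern),
        )
    ]


insert_message(1, "Alice", "Bob", "Hi Bob", 1000)
insert_message(1, "Bob", "Alice", "Hi Alice", 1010)
insert_message(2, "Alina", "Bob", "Lunch?", 1005)

print(load_history(1))
print(search_contacts("Ali"))
```

Substring searches with a leading wildcard, as in search_contacts, generally cannot use an ordinary index, which is one reason large systems often add a dedicated search structure for this kind of lookup.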
Data integrity and backups
The integrity and reliability of the underlying databases are critical for WhatsApp to operate without errors or data corruption. Common techniques that a service of this kind would be expected to use include:
- Periodic backups – databases are backed up daily to prevent data loss
- Replicated copies – multiple copies of databases are maintained using replication
- RAID storage – database servers use RAID disk arrays to protect against disk failures
- Transaction logs – database changes are written to transaction logs to prevent partial updates
- Integrity constraints – data validity is enforced at the database level through primary keys, foreign keys and other constraints
- Consistency checks – databases tested regularly for consistency issues
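Two of the techniques in this list, transactions that prevent partial updates and integrity constraints, can be demonstrated in a few lines of SQLite. The tables and values are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE messages (
    message_id INTEGER PRIMARY KEY,
    sender_id  INTEGER NOT NULL REFERENCES users(user_id),
    body       TEXT NOT NULL
);
INSERT INTO users (user_id, name) VALUES (1, 'Alice');
""")

error = None
try:
    with conn:  # commits on success, rolls back if an exception escapes the block
        conn.execute("INSERT INTO messages (sender_id, body) VALUES (1, 'first')")
        # Sender 99 does not exist, so the foreign key constraint rejects this row.
        conn.execute("INSERT INTO messages (sender_id, body) VALUES (99, 'orphan')")
except sqlite3.IntegrityError as exc:
    error = str(exc)

# The first insert was rolled back together with the failed one.
count = conn.execute("SELECT COUNT(*) FROM messages").fetchone()[0]
print(error, count)
```

The constraint stops the invalid row, and the transaction ensures the valid row written just before it is undone as well, so the database never holds a half-finished update.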
Database encryption
WhatsApp implements end-to-end encryption using the Signal protocol to protect user messages, media and calls. Encryption is applied on the sender’s device before anything reaches WhatsApp’s servers, so any server-side storage only ever holds ciphertext:
- Every message is encrypted with its own message key before it leaves the device
- Media attachments are encrypted using separate keys
- Private keys are generated and kept on users’ devices and are never stored in server databases
- Encrypted data is unintelligible to anyone without the keys
- This provides protection if unauthorized database access occurs
WhatsApp cannot read users’ messages or see unencrypted media because the keys needed to decrypt them exist only on users’ devices.
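One way to give every message its own key, as the list above describes, is a key-derivation chain: each step turns the current chain key into a message key plus the next chain key. The sketch below follows the HMAC-based chain recommended in the public Double Ratchet specification, using only Python's standard library. It is a simplified illustration that omits the key agreement, the Diffie-Hellman ratchet and the actual encryption step, and the starting secret is a placeholder.

```python
import hashlib
import hmac


def ratchet_step(chain_key: bytes) -> tuple[bytes, bytes]:
    """Derive (message_key, next_chain_key) from the current chain key."""
    message_key = hmac.new(chain_key, b"\x01", hashlib.sha256).digest()
    next_chain_key = hmac.new(chain_key, b"\x02", hashlib.sha256).digest()
    return message_key, next_chain_key


# Placeholder secret for the demo; a real protocol derives this from a key
# agreement between the two devices and never from a constant.
chain_key = hashlib.sha256(b"demo shared secret").digest()

message_keys = []
for _ in range(3):
    message_key, chain_key = ratchet_step(chain_key)
    message_keys.append(message_key)

# Number of distinct keys derived for the three messages.
print(len(set(message_keys)))
```

Because HMAC is one-way, someone who obtains a later chain key cannot work backwards to the keys of earlier messages, which is the property that makes a chain like this attractive for messaging.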
Optimizing WhatsApp’s databases
WhatsApp’s engineers must continually tune their database infrastructure to sustain performance at massive global scale. Common optimization techniques at this scale include:
- Database partitioning – spreading data across multiple database servers
- Master-slave replication – copies for read parallelization
- Materialized views – precomputed summaries for fast aggregation
- Indexing – faster searches and lookups
- Asynchronous I/O – not blocking transactions during I/O
- Load balancing – distributing read/write loads evenly
- Denormalization – duplicating/grouping data to reduce joins
Given WhatsApp’s billions of users, small optimizations can reduce load times, improve concurrency, lower latency and ultimately save on infrastructure costs.
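Database partitioning, the first technique in the list above, can be sketched as a function that maps each user to one of several servers. The shard names and the choice of a SHA-256 based hash are assumptions for the example and say nothing about how WhatsApp actually partitions its data.

```python
import hashlib

SHARDS = ["db-0", "db-1", "db-2", "db-3"]  # hypothetical server names


def shard_for(user_id: str) -> str:
    """Map a user ID to a shard with a stable hash.

    Python's built-in hash() is randomized per process for strings, so a
    cryptographic digest is used to get the same answer on every run.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(SHARDS)
    return SHARDS[index]


# Place 1000 made-up users and count how many land on each shard.
placement = {}
for n in range(1000):
    placement.setdefault(shard_for(f"user-{n}"), []).append(n)

for shard in SHARDS:
    print(shard, len(placement.get(shard, [])))
```

Simple modulo placement like this moves most keys when the number of shards changes, which is why large systems often prefer consistent hashing or a lookup directory instead.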
Conclusion
WhatsApp relies on an array of relational and NoSQL databases, with SQLite on the device and systems such as MySQL, Redis and Cassandra as plausible server-side choices, to provide users with secure messaging capabilities. These databases allow WhatsApp to efficiently store and retrieve the enormous volumes of data generated daily, including user profiles, contacts, chat histories and media files. End-to-end encryption protects the privacy of messages, so that message content passing through server-side systems is unreadable without keys that exist only on users’ devices.
Database and software architects face constant challenges in optimizing WhatsApp’s databases to provide fast and responsive service to its users across the world. The details of the proprietary database architecture powering WhatsApp are not publicly known. But examining common database structures and WhatsApp’s core functions gives insight into the critical role efficient databases play in the messaging platform’s success.