Wednesday, December 21, 2011

How Twitter Stores 250 Million Tweets a Day Using MySQL

Twitter's new tweet store:

When you tweet, the tweet is stored in an internal system called T-bird, which is built on top of Gizzard. Secondary indexes are stored in a separate system called T-flock, which is also Gizzard-based.

Unique IDs for each tweet are generated by Snowflake; these IDs can be sharded more evenly across a cluster. FlockDB, which also uses Gizzard, handles ID-to-ID mapping, storing the relationships between IDs.
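
To make the ID generation idea concrete, here is a minimal Python sketch of a Snowflake-style generator. It is not Twitter's code. The bit layout (41-bit millisecond timestamp, 10-bit worker ID, 12-bit sequence) follows the commonly described Snowflake design rather than anything stated in this post, and the epoch constant, class name, and `clock` parameter are assumptions made for illustration.

```python
import threading
import time

# Assumed layout: 41-bit timestamp | 10-bit worker ID | 12-bit sequence.
WORKER_BITS = 10
SEQUENCE_BITS = 12
SEQUENCE_MASK = (1 << SEQUENCE_BITS) - 1
CUSTOM_EPOCH_MS = 1288834974657  # assumed custom epoch, in milliseconds


class SnowflakeLikeGenerator:
    """Hypothetical generator of roughly time-ordered 64-bit IDs."""

    def __init__(self, worker_id, clock=None):
        if not 0 <= worker_id < (1 << WORKER_BITS):
            raise ValueError("worker_id out of range")
        self.worker_id = worker_id
        # The clock is injectable so the sketch can be tested deterministically.
        self._clock = clock or (lambda: int(time.time() * 1000))
        self._lock = threading.Lock()
        self._last_ms = -1
        self._sequence = 0

    def next_id(self):
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                # Same millisecond: bump the per-worker sequence counter.
                self._sequence = (self._sequence + 1) & SEQUENCE_MASK
                if self._sequence == 0:
                    # Sequence exhausted: spin until the next millisecond.
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                self._sequence = 0
            self._last_ms = now
            timestamp = now - CUSTOM_EPOCH_MS
            return (
                (timestamp << (WORKER_BITS + SEQUENCE_BITS))
                | (self.worker_id << SEQUENCE_BITS)
                | self._sequence
            )
```

Because the worker ID is baked into each ID, separate machines can generate IDs without coordinating with each other, and the leading timestamp keeps the IDs roughly sorted by creation time.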

Gizzard is Twitter's distributed data storage framework built on top of MySQL (InnoDB).

InnoDB was chosen because it doesn't corrupt data. Gizzard is just a datastore: data is fed in and you get it back out again.

To get higher performance on individual nodes, a lot of MySQL features, like binary logs and replication, are turned off. Gizzard itself handles sharding, replicating N copies of the data, and job scheduling.
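
The division of labor described above (plain storage nodes underneath, with sharding and replication handled a layer up) can be illustrated with a toy Python sketch. This is not Gizzard's API: the class names are hypothetical, in-memory dicts stand in for the MySQL nodes, and routing by sorted ID ranges is an assumption about how keys could be mapped to shards.

```python
import bisect


class ReplicatedShard:
    """Hypothetical shard that keeps N copies of every row."""

    def __init__(self, replica_count):
        # Each dict stands in for one backing MySQL node.
        self.replicas = [{} for _ in range(replica_count)]

    def put(self, key, value):
        # Writes are applied to every replica.
        for replica in self.replicas:
            replica[key] = value

    def get(self, key):
        # Reads return the first copy found, so one lost replica is tolerated.
        for replica in self.replicas:
            if key in replica:
                return replica[key]
        return None


class RangeRouter:
    """Hypothetical router mapping ID ranges to shards."""

    def __init__(self, lower_bounds, shards):
        if len(lower_bounds) != len(shards) or lower_bounds[0] != 0:
            raise ValueError("need one shard per range, starting at 0")
        if list(lower_bounds) != sorted(lower_bounds):
            raise ValueError("lower_bounds must be sorted")
        self.lower_bounds = list(lower_bounds)
        self.shards = list(shards)

    def shard_for(self, key):
        # Pick the last range whose lower bound is <= key.
        index = bisect.bisect_right(self.lower_bounds, key) - 1
        return self.shards[index]

    def put(self, key, value):
        self.shard_for(key).put(key, value)

    def get(self, key):
        return self.shard_for(key).get(key)
```

For example, `RangeRouter([0, 1000], [ReplicatedShard(2), ReplicatedShard(2)])` sends IDs below 1000 to the first shard and everything else to the second, with two copies of each row. A real system would also need the job scheduling the post mentions, for instance to retry writes to a replica that was temporarily unavailable; that is left out of this sketch.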

Gizzard is used as a building block for other storage systems at Twitter.

http://highscalability.com/blog/2011/12/19/how-twitter-stores-250-million-tweets-a-day-using-mysql.html
