Thursday, November 10, 2011

Database Sharding Strategies

Shard by Modulus:
For many applications, it is appropriate to shard based on a shard key such as a User ID. Using a modulus of a numeric ID, especially an auto increment primary key, ensures even distribution of data between shards.

Shard by Date/Time Range:
For time-based data such as feeds or blogs where data is accumulating over time, it may make sense to shard by date range. For example, each shard could contain data for a single month. New shards can be added each month and old shards can be dropped once historic data is no longer needed.

Master Lookup:
It is sometimes a requirement to control the sharding manually or in an application specific manner. One example would be a requirement to host key customer accounts on shards hosted on higher specification hardware. To support this requirement, a master shard can be created which contains lookup tables to map customer IDs to a specific shard number.

Session-based Sharding:
Some categories of application, particularly user-centric web applications, can choose a shard when a customer logs in and then direct all queries to the same shard for the duration of that user session.

Fixed Shard:
mapping tables to specific fixed shards.

Global Tables:
Global tables are tables which are hosted in all shards and data is automatically replicated across all shards. The benefit is that these tables can be used in joins with sharded tables in each shard. Global tables are typically fairly static tables or with low write volume, such as product codes, countries, and other reference data.

Custom Sharding:
allows developers to control all sharding behavior by providing code-level integration or through the use of shard hints, which are comments that can be placed within a SQL statement to instruct the driver to use a particular shard for a query.

(Reprint posts: from http://www.dbshards.com/dbshards/database-sharding-strategies/)

No comments:

Post a Comment