Database Sharding

Author: Al-mamun Sarkar Date: 2020-09-18 05:04:41

There are mainly two types of partitioning system in the database one is Horizontal partition and another one is Vertical partition. Sharding is the horizontal partition of data across multiple servers. 

What is sharding?

Sharding is splitting large data into smaller chunk and store those across multiple database servers. So that database query load can be distributed across those servers. It's a technique to break up a large database into many smaller chunks. 

Sharding Methods:

  • Horizontal Partitioning:  Horizontal partitioning means putting different rows into different databases. It can be range based sharding. Example - Storing data based on alphabetical names range. In some cases, it not so good because we may have more data on starting characters A-D and some other range may have less amount of data. So sometimes it can be imbalanced. 
  • Vertical Partitioning: Vertical partitioning means dividing tables based on features. Such as 1 shard is for the users table and another one shard is for the locations table and so on. It also can be imbalanced if one feature has mone data than others.  

Sharding Criteria:

  • Hash-based Partitioning: Using hashcode on any entity to divide the data across shards. The entity can be the primary key id or name or any others. Consistent Hashing can be used to implement this. This is a good algorithm for sharding.
  • List Partitioning: Based on the list we can store data across shared. Suppose we have a customer table so we can store Asian customers to shard, European customers to another shard, American customers to another shard, and so on. It can be imbalanced if one list has a huge number of customers than others.
  • Round Robin Partitioning: We also can use round-robin algorithms to store data across multiple shards. 
  • Composite Partitioning: Composite partitioning means combining more algorithms. Example - If we use list-based partitioning for customers table and one list has a huge number of customers than others. In that case, we can again shard that list based on hash-table. 

Sharding Challenges:

  • Difficult to maintenance ACID Compliance
  • Join inefficient.