Sharding Notes – When to shard?
I’m nearing the end of a database sharding project, and I want to note down what I’ve observed, both for myself, so I can refer back here when I do it again, and for anyone else who may be interested in the subject.
To Shard or not to Shard, That is the Question
It’s nice to dream, but do you really want to put in all the extra time and effort that sharding takes before you know you will reach the huge volume that forces you to chop your database into bits?
I’d say no, and the guys over at 37Signals agree: don’t do it until you have to.
Sharding is an evolutionary step that you should undertake only when you really need to.
Should you shard from the start?
Sitting in the Ivory Tower of design, it is easy to predict which entity you will partition on, but after running your system for a few months, or better still a few years, you are likely to find that your lofty theories don’t stand up to the harsh realities of production.
Example: In the system I’m working on, it was originally thought that the Users table would be the one to partition by, because users had Contacts, and the ContactActivities table would get huge and would be best chopped up by User. But as the business progressed and the true needs of the people using it became clear, it turned out that a totally different table was the bottleneck and a totally different entity had to be used for partitioning.
If that database had been designed from scratch to shard on the Users table, the work needed to “reshard” it would have been much greater than the work of sharding it on the correct entity after waiting a while to find out what that entity was.
So I’d say, don’t shard from the start. Shard when you need to because of volume and when you have the data in your database that you can analyze to tell you what to shard on.
The one exception to this is where you are building a system that is a copy of an existing system which has already gone through the process and the sharding entity is well known.
For example, if you are building the next Facebook, then you can shard by the same entity they do.
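As a rough illustration of what "analyzing the data you already have" might look like, here is a minimal sketch that checks how evenly the rows of a large table spread across each candidate shard key. The table and column names (contact_activities, user_id, account_id), the made-up data, and the helper key_distribution are all assumptions for the example, not the schema of the system described above. Evenness is only one signal; which queries are slow and which tables they join matters at least as much.

```python
import sqlite3


def key_distribution(conn, table, column):
    """Summarise how the rows of `table` spread across values of `column`.

    `table` and `column` are interpolated into the SQL, so they must come
    from trusted code, never from user input.
    """
    rows = conn.execute(
        f"SELECT COUNT(*) FROM {table} GROUP BY {column}"
    ).fetchall()
    counts = [r[0] for r in rows]
    total = sum(counts)
    return {
        # How many distinct values there are to distribute over shards.
        "distinct_keys": len(counts),
        # Fraction of all rows owned by the single busiest key.
        "largest_share": max(counts) / total if total else 0.0,
    }


def demo():
    # Hypothetical in-memory table standing in for a production one.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE contact_activities ("
        "id INTEGER PRIMARY KEY, user_id INTEGER, account_id INTEGER)"
    )
    # Invented data: 100 users with equal row counts, but two accounts,
    # one of which owns 90% of the rows.
    rows = [(i % 100, 1 if i % 10 else 2) for i in range(1000)]
    conn.executemany(
        "INSERT INTO contact_activities (user_id, account_id) VALUES (?, ?)",
        rows,
    )
    return {
        col: key_distribution(conn, "contact_activities", col)
        for col in ("user_id", "account_id")
    }


if __name__ == "__main__":
    for column, stats in demo().items():
        print(column, stats)
```

In this invented data set, a key with few distinct values and one value owning most of the rows would leave a single shard carrying nearly all the load, which is the kind of thing you can only see once real data exists.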