Sharding Notes – Splitting Tables
We hit some tables where some records applied to only one shard but others applied to multiple shards.
As an example, one of the tables was Users. The database is sharded on the Companies table. Many users belong to only one company (e.g. employees of a company) but there were some (such as in-house support people and super-admins) who belong to multiple companies.
The possible solutions we came up with were:
- Split the tables according to the different records (Table per Type or Table per Concrete Class). For example, keep the Users table for people who belong to only one company and create an AdminUsers table for those who belong to multiple companies.
- Add a discriminator column (e.g., UserType with values of ‘Regular’ or ‘Admin’) and handle the records according to the type (Table per Hierarchy).
- Move the split table that applies to multiple shards into a “master” database so it is not on the shards and query it there. E.g., we split Users into Users and AdminUsers and move AdminUsers onto the master database.
- When loading a user do a cross-shard query to get all the companies/shards the entity is on. Then do all further CRUD operations on the entity by hitting the specific shards you found in the first query. E.g., User #5 administers company #7 on shard 1, company #10 on shard 2 and company #15 on shard 3. He does not administer any companies on shards 4 or 5. Further CRUD for that user will hit shards 1, 2 and 3.
My preference is #1, but I think we’ll be going with #4 on all the “special” tables because the amount of extra work in implementing #1 right now is way more than we have the time or resources for. Perhaps once we have the sharded system up and running we may switch to #1, but that is in the future and a lot depends on the performance for #4. Luckily most of the “special” tables (other than User) are not high traffic, so we’re hoping it won’t be a problem.
Of course your own situation will probably be different so you should evaluate the pros and cons of these methods for your particular situation. And you may come up with other ways to do it. If you do, please leave a comment describing your solution.
Good Sharding Article
Here is a good intro article on sharding: Database Sharding.