Sharding Notes: First rule of database sharding
As far as I’m concerned, the first rule of database sharding is “You must have a data access layer“.
If you have architected your system in a “client-server” style were data access code is embedded in the client code then the effort required to shard is going to be huge, partially because you have so many places in your code to address and partially because of the vast amount of testing that is going to be required because you have likely touched every source file in your system.
I hit this problem some time ago. It wasn’t on a sharding project but something similar. The project was writing the next version of a system that had been written in VB6 by some inexperienced developers. Data access code and dynamically built SQL statements were all over the place. Their idea of a data access layer was a “Db” module that had methods for running queries and returning DAO result sets. E.g.,
Set rs = Db.Execute("Select * from Customers where " _ + strCol + " = " + strCustName)
And about 20% of the time they would directly call DAO and not user their own “DAL”. Yuck!
We began moving the functionality over to .NET piece by piece. I created a three-layer style architecture for the .NET pieces: UI Layer, Domain Layer (sometimes called Business Logic Layer) and Data Access Layer. We used COM Interop to call the .NET code from the VB6 exe. (I wish I could have created a .NET exe and converted the VB6 part to a DLL but … well that’s a subject for another post.)
After some successful releases the client decided they wanted to move over to using SQL Server instead of MS Access. We got a SQL Server version of the database created relatively quickly and then we moved on to converting the code to use the new database.
To convert the .NET code took one developer 3 days and that included extensive unit testing. This was because all data access code was in one place and the Domain and UI Layers knew nothing about the database: not its structure, not its constraints, not its table names, not its column names, nothing. It would have taken less time if it were not for the fact that we had to tweek some of the Access SQL so it would play nice with SQL Server.
To convert the VB6 code took three developers over two months to get about 75% done (without all the needed testing) and was then put on hold because there was a deadline to get out some high priority new features to the system users.
Why so long? It wasn’t the developers, these were smart, hardworking guys. No, it was simply the lack of a data access layer.
The same applies to sharding a database. If you don’t have your data access code isolated in one area then I’d suggest you do that first before you even think of sharding.
It is probably possible to shard without a data access layer, but then it’s probably possible to stab yourself with a machete and not bleed to death, but do you really want to risk it?