Design mechanism for horizontal database sharding
The factor we expect to limit database performance most in the future is write-sequencing at the primary. When writing large numbers of rows to multiple tables (e.g., a new order with 100 names creates 201 rows across 3 different tables in a single transaction), we can run into performance bottlenecks.
So we want to start thinking about horizontally sharding our database across multiple database clusters, each with a write primary and multiple read replicas.
We expect that the best path forward here will be sharding based on (a hash of) the RegID / account ID of the requesting account. This means that all of a single client's account metadata, orders, authorizations, and certs will be held in a single shard. We do not expect to run into "a single user is bigger than a single shard" problems, as we currently fit everyone into just one "shard".
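As a rough sketch of the hash-based assignment described above: hash the account's RegID and take it modulo the shard count. The hash function (FNV-1a) and the shard count here are placeholder assumptions, not settled design choices.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shardFor maps a registration/account ID to one of numShards shards.
// FNV-1a is an illustrative choice; any stable hash with good
// distribution would do. Note that a plain modulo scheme reshuffles
// most accounts if numShards ever changes, which is one reason the
// actual mechanism still needs design work.
func shardFor(regID int64, numShards uint32) uint32 {
	h := fnv.New32a()
	fmt.Fprintf(h, "%d", regID)
	return h.Sum32() % numShards
}

func main() {
	for _, id := range []int64{1, 42, 1000000} {
		fmt.Printf("regID %d -> shard %d\n", id, shardFor(id, 4))
	}
}
```

Because the shard is a pure function of the RegID, every service that knows the RegID can compute the same answer, which keeps all of one account's rows co-located.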
However, we have some decisions to make:
- How/when/where do we determine which shard to talk to?
- When making a request to an SA?
- When an SA makes a request to a DB?
- How to handle non-SA services that talk to the DB, like cert-checker and expiration-mailer?
- Should they switch to using the SA as an intermediary, with long streaming queries?
- Should we run multiple instances, one per shard?
- Should they learn how to navigate sharding directly themselves?
- Probably lots of other unknowns
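To make the "navigate sharding directly" option for non-SA services concrete, here is a minimal sketch of a tool like cert-checker fanning a job out across every shard. The function name, DSN strings, and callback shape are all hypothetical; the point is only that the tool would need a per-shard connection list rather than a single DSN.

```go
package main

import "fmt"

// scanAllShards runs checkShard against every shard's DSN in turn.
// This is a hypothetical helper illustrating the "one tool iterates
// all shards" option; a real implementation might instead run one
// instance per shard, or go through the SA.
func scanAllShards(dsns []string, checkShard func(dsn string) error) error {
	for _, dsn := range dsns {
		if err := checkShard(dsn); err != nil {
			return fmt.Errorf("shard %s: %w", dsn, err)
		}
	}
	return nil
}

func main() {
	// Placeholder DSNs for two hypothetical shard primaries.
	dsns := []string{"primary-0.db.internal", "primary-1.db.internal"}
	_ = scanAllShards(dsns, func(dsn string) error {
		fmt.Println("checking", dsn)
		return nil
	})
}
```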
This bug exists to track that design work and to serve as an umbrella for future pieces of implementation work.