boulder icon indicating copy to clipboard operation
boulder copied to clipboard

Design mechanism for horizontal database sharding

Open aarongable opened this issue 3 years ago • 0 comments

The thing we expect to affect database performance the most in the future is write-sequencing at the primary. When writing large numbers of rows to multiple tables (e.g. a new order with 100 names creates 201 rows across 3 different tables in a single transaction) we can run into performance bottlenecks.

So we want to start thinking about horizontally sharding our database across multiple database clusters, each with a write primary and multiple read replicas.

We expect that the best path forward here will be sharding based on (a hash of) the RegID / account ID of the requesting account. This means that all of a single client's account metadata, orders, authorizations, and certs will be held in a single shard. We do not expect to run into "a single user is bigger than a single shard" problems, as we currently fit everyone into just one "shard".

However, we have some decisions to make:

  • How/when/where do we determine which shard to talk to?
    • When making a request to an SA?
    • When an SA makes a request to a DB?
  • How to handle non-SA services that talk to the DB, like cert-checker and expiration-mailer?
    • Should they switch to using the SA as an intermediary, with long streaming queries?
    • Should we run multiple instances, one per shard?
    • Should they learn how to navigate sharding directly themselves?
  • Probably lots of other unknowns

This bug exists to track this design work and be an umbrella bug for future pieces of implementation work.

aarongable avatar Jan 24 '23 19:01 aarongable