A few possible corrections / tweaks / addendums

Open RobThree opened this issue 7 years ago • 0 comments

Trade-offs

Although this model offers you great performance and linear horizontal scalability, it comes with some possibly serious trade-offs:

  • Using the above distribution, you cannot:
    • have more than 1024 machines in your cluster
    • handle more than 4096 queries per millisecond per machine
    • given a cluster of N machines, guarantee the ordering of M IDs that were generated within a range of N milliseconds
  • The system relies on wall-clock time
    There is a lot of literature out there about the dangers of non-logical time in distributed systems (..even with a perfectly configured ntpd), so I won't go into details; check the Further reading section if you're curious about those things.

I'd like to set some things straight here:

The limits of 1024 machines and 4096 queries/ms only apply when you choose the 41/10/12 bit configuration. I'm not sure Snowflake allowed for different setups, but IdGen, a .NET library, does (full disclosure: author here), and AFAIK other similar systems do too, though I'm not aware of Go-specific implementations. Also, the number of queries per time interval depends on both the bits-per-part configuration and the length of that interval: since 2.0, IdGen has moved away from the hard-coded "ms" resolution and allows anything from (theoretical) micro- or even nanoseconds (hardware support permitting, of course) to, potentially, decades or longer, and anything in between. As long as the sequence part doesn't overflow within one such interval, you're fine.
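To make the arithmetic concrete, here is a minimal Go sketch of how the limits follow from the bit split. The `Layout` type and its methods are hypothetical names for illustration only (they are not IdGen's or Snowflake's API), and the sketch assumes a signed 64-bit ID whose sign bit is left unused, so the three parts share 63 bits.

```go
package main

import "fmt"

// Layout is a hypothetical description of how the 63 usable bits
// of an ID are divided between its three parts.
type Layout struct {
	TimestampBits uint
	GeneratorBits uint
	SequenceBits  uint
}

// Valid reports whether the parts add up to exactly 63 bits
// (assumption: the sign bit of the 64-bit ID stays unused).
func (l Layout) Valid() bool {
	return l.TimestampBits+l.GeneratorBits+l.SequenceBits == 63
}

// MaxGenerators is the number of distinct machines the layout can address.
func (l Layout) MaxGenerators() uint64 { return uint64(1) << l.GeneratorBits }

// MaxSequencePerTick is the number of IDs one machine can emit per tick.
func (l Layout) MaxSequencePerTick() uint64 { return uint64(1) << l.SequenceBits }

// MaxTicks is the number of ticks before the timestamp part wraps around.
func (l Layout) MaxTicks() uint64 { return uint64(1) << l.TimestampBits }

func main() {
	layouts := []Layout{
		{TimestampBits: 41, GeneratorBits: 10, SequenceBits: 12}, // the configuration quoted above
		{TimestampBits: 45, GeneratorBits: 2, SequenceBits: 16},  // fewer machines, more IDs per tick
	}
	for _, l := range layouts {
		fmt.Printf("%d/%d/%d valid=%v machines=%d ids/tick=%d ticks=%d\n",
			l.TimestampBits, l.GeneratorBits, l.SequenceBits,
			l.Valid(), l.MaxGenerators(), l.MaxSequencePerTick(), l.MaxTicks())
	}
}
```

The tick length is deliberately absent from `Layout`: the same bit split gives a different wrap-around horizon and a different per-second throughput depending on whether one tick is a millisecond, a second, or something else.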

Also, Snowflake didn't exactly rely on wall-clock time (IIRC); it relied on a strictly increasing, monotonic clock (as does IdGen). At startup that clock will usually be equal (or roughly equal) to the wall-clock time, but the two will diverge pretty soon, especially in DST situations.

RobThree, Aug 01 '16 15:08