A few possible corrections / tweaks / addendums
Trade-offs
Although this model offers you great performance and linear horizontal scalability, it comes with some possibly serious trade-offs:
- Using the above distribution, you cannot:
  - have more than 1024 machines in your cluster
  - handle more than 4096 queries per millisecond per machine
  - given a cluster of N machines, guarantee the ordering of M IDs that were generated within a range of N milliseconds
- The system relies on wall-clock time
There is a lot of literature out there about the dangers of non-logical time in distributed systems (..even with a perfectly configured ntpd), so I won't go into details; check the Further reading section if you're curious about those things.
I'd like to set some things straight here:
The 1024 machines and 4096 queries/ms limits only apply when you choose the 41/10/12 bits configuration. I'm not sure whether Snowflake allowed for different setups, but IdGen, a .NET library, does (full disclosure: author here), and AFAIK other similar systems do too, though I'm not aware of Go-specific implementations. Also, the number of queries per time interval depends on the bits-per-part configuration and on the tick duration: since 2.0, IdGen has moved away from the hard-coded "ms" resolution and allows anything from (theoretically) micro- or even nanoseconds (hardware support permitting, of course) to, potentially, decades or longer, and anything in between. As long as the sequence part doesn't overflow within one such period, you're fine.
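To make the arithmetic concrete, here is a minimal Python sketch of how the limits fall out of the bit split. It is not IdGen's or Snowflake's actual code: `IdLayout`, its field names, the 63-bit total and the most-significant-first ordering (timestamp, then generator, then sequence) are assumptions made for illustration.

```python
from dataclasses import dataclass

SECONDS_PER_YEAR = 365.25 * 24 * 3600


@dataclass(frozen=True)
class IdLayout:
    """Hypothetical bit layout for a Snowflake-style ID (assumed 63 bits total)."""
    timestamp_bits: int
    generator_bits: int
    sequence_bits: int
    tick_seconds: float  # duration of one timestamp tick

    def __post_init__(self):
        total = self.timestamp_bits + self.generator_bits + self.sequence_bits
        if total != 63:
            raise ValueError(f"parts must add up to 63 bits, got {total}")

    @property
    def max_generators(self) -> int:
        # Number of distinct machine/generator IDs.
        return 1 << self.generator_bits

    @property
    def max_ids_per_tick(self) -> int:
        # Number of IDs one generator can hand out within a single tick.
        return 1 << self.sequence_bits

    @property
    def lifetime_years(self) -> float:
        # How long until the timestamp part overflows.
        return (1 << self.timestamp_bits) * self.tick_seconds / SECONDS_PER_YEAR

    def pack(self, ticks: int, generator: int, sequence: int) -> int:
        # Reject values that do not fit in their part instead of silently wrapping.
        if not 0 <= ticks < (1 << self.timestamp_bits):
            raise OverflowError("timestamp part overflow")
        if not 0 <= generator < self.max_generators:
            raise OverflowError("generator part overflow")
        if not 0 <= sequence < self.max_ids_per_tick:
            raise OverflowError("sequence part overflow")
        shift = self.generator_bits + self.sequence_bits
        return (ticks << shift) | (generator << self.sequence_bits) | sequence


classic = IdLayout(41, 10, 12, tick_seconds=0.001)
print(classic.max_generators)            # 1024
print(classic.max_ids_per_tick)          # 4096
print(round(classic.lifetime_years, 1))  # 69.7

# Fewer machines, more IDs per tick: same 63 bits, different trade-off.
few_machines = IdLayout(45, 2, 16, tick_seconds=0.001)
print(few_machines.max_generators, few_machines.max_ids_per_tick)  # 4 65536
```

Changing either the split or `tick_seconds` moves the limits around; none of the numbers from the README are inherent to the approach.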
Also, Snowflake didn't exactly rely on wall-clock time (IIRC); it relied on a strictly increasing, monotonic clock (as does IdGen). When it starts off, the time will usually be equal (or roughly equal) to the wall-clock time, but the two will diverge pretty soon, especially in DST situations.
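A rough sketch of that idea in Python, assuming the wall clock is read exactly once at start-up and everything after that comes from a monotonic timer. `MonotonicTickSource` and its parameters are made up for this example; I'm not claiming this is how Snowflake or IdGen implement it.

```python
import time


class MonotonicTickSource:
    """Ticks since a custom epoch, advanced by a monotonic clock (illustrative only)."""

    def __init__(self, epoch_unix_ns: int, tick_ns: int = 1_000_000):
        self._tick_ns = tick_ns
        # Read the wall clock exactly once, at start-up.
        self._start_offset_ns = time.time_ns() - epoch_unix_ns
        self._start_mono_ns = time.monotonic_ns()

    def ticks(self) -> int:
        # Elapsed time comes from the monotonic clock, so later wall-clock
        # adjustments (NTP steps, manual changes) cannot move it backwards.
        elapsed_ns = time.monotonic_ns() - self._start_mono_ns
        return (self._start_offset_ns + elapsed_ns) // self._tick_ns


source = MonotonicTickSource(epoch_unix_ns=0)  # epoch 0 = Unix epoch, 1 ms ticks
first = source.ticks()
second = source.ticks()
print(second >= first)  # True: ticks never go backwards within one process
```

Right after construction `ticks()` is close to the wall-clock time; from then on it only tracks elapsed monotonic time, which is why the two can drift apart.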