ksuid icon indicating copy to clipboard operation
ksuid copied to clipboard

Comparison to ULID

Open nicpottier opened this issue 7 years ago • 11 comments

Might be useful to document the reasoning as to why this is better than ULID (or perhaps use cases where it is better and where not)

https://github.com/alizain/ulid/

nicpottier avatar Jun 07 '17 22:06 nicpottier

We've had a couple of people approach us about ULID after publishing. ULID is very similar in aim to KSUID, so it's an apt comparison. The biggest difference I see is the length of the timestamp and random payload. KSUID does not attempt to be length-compatible with standard UUIDs, where-as ULID does.

I think the timestamp bit is mostly self-explanatory (more runway, more precision), but I would be at least cautious about using only 80 bits of randomness. In terms of numbers, each bit roughly cuts the chance of a collision in half, so those extra 48 bits in KSUID matter quite a bit more in terms of collision-safety than the first 80 bits used in ULID.

The 122 random bits used in UUIDv4 has been very widely deployed and battle-tested in production systems for over a decade. Given this empirical evidence we chose 122 bits as our floor, and extended to 128 bits to simplify the implementation. In essence we decided to go with the safest approach that is most widely accepted at the sacrifice of 6 extra bytes per ID.

rbranson avatar Jun 07 '17 23:06 rbranson

However, the collision space is reduced to one millisecond. I'm not a math guru but it sounds like 80 bits is practically enough.

Here is some math.

Based on wiki, here's a script for calculating the collision probability.

>>> birthday_probability(10**9, 2**80)
4.135902207380582e-07

I believe 80 bit of randomness if enough if you are not generating more than a gig of IDs in one millisecond.

thinxer avatar Jun 08 '17 05:06 thinxer

Thanks Rick, that's a useful breakdown and I think would perhaps make a good addition to the README for future wanderings of people evaluating different solutions.

nicpottier avatar Jun 08 '17 15:06 nicpottier

@thinxer the 128-bits of random data is used to avoid dependency on the wall-clock for collision safety. If one is concerned about the extra 4 bytes of data or if millisecond precision is needed, then ULID is probably a better choice.

rbranson avatar Jun 12 '17 16:06 rbranson

I believe 80 bit of randomness if enough if you are not generating more than a gig of IDs in one millisecond.

In theory yes but if two computers are on the same millisecond tick, they are more likely to generate the same random sequence.

Of course this is implementation dependent, but I’m assuming this being used to generate ids on untrusted clients

victornpb avatar Mar 03 '19 12:03 victornpb

One point to add is that ksuid timestamps are not just lower precision, but also smaller future range:

  • ksuid: "The timestamp epoch is adjusted to March 5th, 2014, providing over 100 years of useful life" (specifically, 2^32sec ~ 136 years)
  • ULID: "Won't run out of space till the year 10889 AD".

cben avatar Apr 24 '19 10:04 cben

Another difference is encoding:

  • ULID specifies base32 encoding, friendlier to humans "reading on the phone" (avoids similar chars like 1IiJj, case insensitive)
  • ksuid uses denser base62, resulting in almost same string length.

cben avatar Apr 24 '19 10:04 cben

@thinxer the 128-bits of random data is used to avoid dependency on the wall-clock for collision safety. If one is concerned about the extra 4 bytes of data or if millisecond precision is needed, then ULID is probably a better choice.

Am I correct that this library will not generate a millisecond level of precision timestamp, though it uses one to generate its randomness? Hence, I would not be able to get the timestamp down to a millisecond when extracting timestamp data?

mlevkov avatar Jul 05 '19 03:07 mlevkov

That is correct, only 1 second of precision is offered by the timestamp embedded in KSUIDs.

achille-roussel avatar Jul 05 '19 06:07 achille-roussel

How to make sortable KSUID Base62

netxixi avatar Mar 10 '21 21:03 netxixi

I'm trying to summarize the differences regarding the variant bits:

The 122 random bits used in UUIDv4 has been very widely deployed and battle-tested in production systems for over a decade.

UUIDv4 has (all random) 122 variant bits. ULID has (48 timestamp till-10889AD-millisecond-precision + 80 random) 128 variant bits. KSUID has (32 timestamp till-2150AD-seconds-precision + 128 random) 160 variant bits. I might be debatable if only the random bits should count towards the "collision safe" variant part. rbranson already answered KSUID's point of view on that:

the 128-bits of random data is used to avoid dependency on the wall-clock for collision safety.

I understand this makes sense for systems where the entropy for the pseudo-random numbers generator depends on the system clock, since this dependency might yield redundancy between the timestamp bits and the random bits. Then, it makes sense to also choose to be free from the dependency or not between the system clock and the system pseudo-random numbers generator.

EDIT: The A brief history of the UUID article by @rbranson (a great read!) also mentions:

The timestamp provides 1-second resolution, which we found to be acceptable for a broad range of use cases. If a higher resolution timestamp is desired, payload bits can be traded for more timestamp bits. While high-resolution timestamp support is not included in our implementation, it is backwards compatible. Any implementation which uses 32-bit timestamps can safely work with KSUIDs that use higher resolution timestamps.

ericbn avatar May 31 '23 18:05 ericbn