The real structure, algorithms and advantages of UUIDv7

Open sergeyprokhorenko opened this issue 6 months ago • 2 comments

The ramsey/uuid User Guide gives an overly simple view of UUIDv7. The documentation is often very brief and lacks detail (compare). Because of this, many people on the Internet make guesses, which are often wrong or biased. It is therefore helpful to explain the details of UUIDv7's structure and algorithms, as well as the intentions of its developers.


UUIDv7 is a direct descendant of ULID. Like ULID, it is a 128-bit identifier, containing a timestamp on the left side and random data on the right side. In databases and distributed systems, a properly implemented UUIDv7 is always preferred over any other identifier type, including natural keys, autoincrement, UUIDv4, TypeID, ULID, KSUID, CUID, NanoID, and Snowflake ID.
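
To make the "timestamp on the left, random data on the right" structure concrete, here is a minimal Python sketch. It is not taken from ramsey/uuid or any other library, and the function name `uuid7_sketch` is hypothetical. The field widths (48-bit Unix millisecond timestamp, 4-bit version, 12 random bits, 2-bit variant, 62 random bits) follow the basic RFC 9562 layout. The sketch makes no attempt at monotonicity within a millisecond.

```python
import os
import time
import uuid


def uuid7_sketch(unix_ts_ms=None):
    """Build a UUIDv7-shaped value: 48-bit timestamp first, random bits after it."""
    if unix_ts_ms is None:
        unix_ts_ms = time.time_ns() // 1_000_000

    # 80 bits from the operating system's CSPRNG; 74 of them are used below.
    rand = int.from_bytes(os.urandom(10), "big")
    rand_a = rand >> 68               # top 12 bits
    rand_b = rand & ((1 << 62) - 1)   # low 62 bits

    value = (unix_ts_ms & ((1 << 48) - 1)) << 80  # bits 127..80: timestamp
    value |= 0x7 << 76                            # bits 79..76: version 7
    value |= rand_a << 64                         # bits 75..64: random
    value |= 0b10 << 62                           # bits 63..62: variant
    value |= rand_b                               # bits 61..0: random
    return uuid.UUID(int=value)
```

Because the timestamp occupies the most significant bits, values generated in different milliseconds sort by creation time whether they are compared as integers, as bytes, or as hex strings.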

There are notable distinctions of UUIDv7:

  • RFC 9562 establishes numerous reasonable requirements for UUIDv7 in its various sections, such as mandatory usage of a cryptographically secure pseudorandom number generator (CSPRNG). However, you need to ensure that the implementation you use satisfies these requirements in your target deployment environment
  • There are various field and bit layout options in different implementations, designed to provide monotonicity (increasing order of generation) within a millisecond, including when the system clock is rolled back or when the rate of generation in parallel processes is high. For example, UUIDv7 may contain a sub-millisecond timestamp segment; a randomly initialized counter (with its leftmost bit optionally set to zero); or a timestamp-based counter under high-load conditions. Minor violations of monotonicity do not affect performance but may complicate debugging, pagination, log searching, and the use of time series in databases. A DBMS may also use a mutex to ensure monotonicity. Server-side generation ensures better monotonicity than client-side generation
  • Extracting and using the timestamp from UUIDv7 is not recommended. Consider explicit timestamp columns
  • The option of shifting the timestamp value within any technically feasible range allows you to hide the true date of record creation, prevents lock contention during parallel generation of UUIDs in several processes, and also ensures monotonicity during generation on remote clients
  • UUIDv7 uses the widely compatible "hex-and-dash" string format, but RFC 9562 does not prohibit UUIDv7 from using the more readable, compact, and easily copied Crockford's Base32, as in ULID. UUIDv7 should be stored in 128-bit binary format in databases whenever possible. It reduces storage overhead by 30-40% compared to string representations, while maintaining index efficiency
  • Identifiers containing UUIDs used as keys may be right-extended beyond 128 bits with any metadata and checksum
  • UUIDv7 contains a 4-bit version (7) field (0b0111) and a 2-bit variant field (0b10)
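
One of the monotonicity options listed above, a randomly initialized counter with its leftmost bit zeroed and a mutex around generation, can be sketched as follows. This is an illustrative assumption, not how ramsey/uuid or any particular DBMS implements it. The class name `MonotonicUuid7`, the 12-bit counter width, and the injectable `clock_ms` parameter are all hypothetical. On a clock rollback the sketch keeps using the last timestamp it saw, and on counter overflow it moves the timestamp one millisecond ahead; other strategies are possible.

```python
import os
import threading
import time
import uuid


class MonotonicUuid7:
    """Hypothetical generator: 12-bit counter in the bits after the version field."""

    def __init__(self, clock_ms=None):
        # clock_ms is injectable so that a clock rollback can be simulated.
        self._clock_ms = clock_ms or (lambda: time.time_ns() // 1_000_000)
        self._lock = threading.Lock()
        self._last_ms = -1
        self._counter = 0

    @staticmethod
    def _fresh_counter():
        # Random start with the leftmost of 12 bits zeroed, leaving headroom.
        return int.from_bytes(os.urandom(2), "big") & 0x7FF

    def generate(self):
        with self._lock:  # the mutex serializes concurrent callers
            now = self._clock_ms()
            if now > self._last_ms:
                self._last_ms = now
                self._counter = self._fresh_counter()
            else:
                # Same millisecond, or the clock went backwards:
                # keep the last timestamp and bump the counter.
                self._counter += 1
                if self._counter > 0xFFF:
                    # Counter exhausted: run the timestamp ahead by 1 ms.
                    self._last_ms += 1
                    self._counter = self._fresh_counter()
            rand_b = int.from_bytes(os.urandom(8), "big") & ((1 << 62) - 1)
            value = (
                (self._last_ms << 80)
                | (0x7 << 76)
                | (self._counter << 64)
                | (0b10 << 62)
                | rand_b
            )
            return uuid.UUID(int=value)
```

With a simulated clock that returns 1000, 1000, 999, 1001, the four generated values still come out in strictly increasing order, because the rollback to 999 is absorbed by the counter.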

According to benchmarks, UUIDv7 and bigint show equivalent search and write performance. The rate of UUIDv7 generation is always sufficient and does not affect the performance of databases.

Although UUIDv7 is twice as long as a bigint, the difference in actual disk space used is much smaller. Besides, using UUID instead of bigint eliminates the need for intermediate tables and data layers.
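
The raw sizes behind this comparison can be checked with a short Python sketch. The helper names `to_crockford` and `representation_sizes` are hypothetical, and the Base32 encoder is a plain illustration of Crockford's alphabet applied to a 128-bit value (26 characters, as in ULID), not an API of any UUID library. The sketch measures only the identifier itself, not row or index overhead in a real database.

```python
import struct
import uuid

CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"  # no I, L, O, U


def to_crockford(u):
    """Encode a 128-bit UUID as 26 Crockford Base32 characters."""
    n = u.int
    chars = []
    for _ in range(26):  # 26 * 5 = 130 bits, so the top 2 bits are always zero
        chars.append(CROCKFORD[n & 31])
        n >>= 5
    return "".join(reversed(chars))


def representation_sizes(u):
    """Size in bytes (or ASCII characters) of each representation."""
    return {
        "bigint": struct.calcsize("<q"),   # 64-bit integer key, for comparison
        "binary": len(u.bytes),            # 128-bit binary form
        "base32": len(to_crockford(u)),    # Crockford Base32 text
        "hex-and-dash": len(str(u)),       # canonical text form
    }
```

For any UUID this reports 8, 16, 26, and 36 respectively.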

UUIDv7, with the same performance, eliminates the following shortcomings of auto-increment:

  • The need to generate new keys and synchronize them with keys from the data source when exporting and importing data, or when generating records in parallel by multiple processes
  • Difficulty preventing key collisions when merging data from different database tables
  • Possible errors due to key collisions when merging data
  • Disclosure of the total number of records in a database table
  • Ease of brute-forcing valid keys
  • Impossibility of global search

sergeyprokhorenko, Jun 10 '25 18:06

Without seeing a diff between what's currently in the docs and this, it's difficult to tell how you're suggesting this should fit into the docs. It's a lot of text to read, when most folks are primarily interested in finding out how to use the library quickly, so they can get on with their task and move on to the next.

I'm interested in ways to improve the documentation, but I want to keep the exposition short, so users can get straight to the details on what code they need to write.

Do you have any recommendations for how to improve the UUID version 7 docs and still keep it short? If possible, please make a pull request, so it's easier to see how you think the docs should change.

Thanks!

ramsey, Jun 14 '25 02:06

It's easier to completely replace the existing text in the documentation about UUIDv7 with my text.

There are already a lot of short, incorrect texts about UUIDv7 on the Internet. I participated in the development of RFC 9562 from the very beginning, and I'm shocked at the extent to which information about UUIDv7 is now being distorted.

That's why I don't think another short, contentless text is useful. If you compare the length of my text with the length of RFC 9562, which is devoted almost entirely to UUIDv7, you will see that my text is extremely concise.

It even lacks information that you can use UUIDv7 in the same column where other versions of UUID or ULID in binary format are already used.

sergeyprokhorenko, Jun 14 '25 07:06

> It's easier to completely replace the existing text in the documentation about UUIDv7 with my text.

Which text, specifically? Can you make a pull request with your recommendations, so that it's easier to discuss what you want to see changed?

ramsey, Jun 18 '25 15:06

I've dropped this issue because it's very general, while the guide is specific to one implementation.

sergeyprokhorenko, Jun 21 '25 10:06

I wasn't disagreeing with your suggestions, but it wasn't clear to me where this text was supposed to fit in with the existing documentation, which is why I was asking for a pull request. A pull request would be easier to review.

ramsey, Jun 22 '25 02:06