8334015: Add Support for UUID Version 7 (UUIDv7) defined in RFC 9562
With the recent approval of UUIDv7 (https://datatracker.ietf.org/doc/rfc9562/), this PR aims to add a new static method UUID.timestampUUID() which constructs and returns a UUID in support of the new time generated UUID version.
The specification requires embedding the current timestamp in milliseconds in bits 0–47, the version number in bits 48–51, and the variant in bits 64–65. Bits 52–63 are available for sub-millisecond precision or for pseudorandom data. The remaining bits 66–127 are free to use for more pseudorandom data or to employ a counter-based approach for increased time precision (https://www.rfc-editor.org/rfc/rfc9562.html#name-uuid-version-7).
The choice of implementation comes down to balancing performance against the ability to distinguish UUIDs created less than 1 ms apart. A test simulating a high-concurrency environment, with 4 threads generating 10000 UUIDv7 values in parallel, measured the collision rate of each implementation (the number of times the time-based portion of the UUID was not unique and entries could not be distinguished by time) and yielded the following results:
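As a rough illustration of the layout described above, the following sketch packs a timestamp and caller-supplied random bits into the two longs of a UUID. The class name `UuidV7Layout` and the method `fromParts` are hypothetical and are not the API proposed in this PR; the random bits are passed in so the bit positions are easy to inspect.

```java
import java.util.UUID;

// Illustrative only: names here are hypothetical, not the API proposed in this PR.
final class UuidV7Layout {

    /** Packs a 48-bit Unix-epoch millisecond timestamp and random bits into the UUIDv7 layout. */
    static UUID fromParts(long epochMillis, long randA, long randB) {
        // Rejects negative values and anything that needs more than 48 bits.
        if ((epochMillis >>> 48) != 0) {
            throw new IllegalArgumentException("timestamp does not fit in 48 bits");
        }
        long msb = (epochMillis << 16)             // bits 0-47: timestamp in milliseconds
                 | 0x7000L                         // bits 48-51: version 7
                 | (randA & 0x0FFFL);              // bits 52-63: 12 bits of random data
        long lsb = 0x8000000000000000L             // bits 64-65: variant '10'
                 | (randB & 0x3FFFFFFFFFFFFFFFL);  // bits 66-127: 62 bits of random data
        return new UUID(msb, lsb);
    }
}
```

With this packing, `UUID.version()` reports 7 and `UUID.variant()` reports 2 for every value produced, regardless of the random input.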
- random-byte-only - 99.8%
- higher-precision - 3.5%
- counter-based - 0%
Performance tests show a decrease in performance as expected with the counter based implementation due to the introduction of synchronization:
- random-byte-only 143.487 ± 10.932 ns/op
- higher-precision 149.651 ± 8.438 ns/op
- counter-based 245.036 ± 2.943 ns/op
The best balance here might be to employ a higher-precision implementation as the large increase in time sensitivity comes at a very slight performance cost.
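One way a higher-precision variant could look is sketched below, assuming it follows the approach in RFC 9562 section 6.2 of replacing the leftmost random bits with a sub-millisecond fraction scaled to 12 bits. This is an assumption about the technique, not the code benchmarked above; `HigherPrecisionV7` and `from` are hypothetical names, and the instant and random bits are parameters so the result is deterministic.

```java
import java.time.Instant;
import java.util.UUID;

// Illustrative only: a guess at a "higher-precision" layout, not the PR's implementation.
final class HigherPrecisionV7 {

    /** Builds a UUIDv7 whose bits 52-63 hold the sub-millisecond fraction of the given instant. */
    static UUID from(Instant now, long randB) {
        long millis = now.toEpochMilli();
        if (millis < 0 || (millis >>> 48) != 0) {
            throw new IllegalArgumentException("timestamp does not fit in 48 bits");
        }
        // Nanoseconds elapsed within the current millisecond: 0..999_999.
        long nanosOfMilli = now.getNano() % 1_000_000L;
        // Scale the fraction of a millisecond to 12 bits: 0..4095.
        long fraction = (nanosOfMilli * 4096L) / 1_000_000L;
        long msb = (millis << 16) | 0x7000L | fraction;
        long lsb = 0x8000000000000000L | (randB & 0x3FFFFFFFFFFFFFFFL);
        return new UUID(msb, lsb);
    }
}
```

Two values created within the same millisecond then differ in the time-based portion whenever their fractions differ, which is where the lower collision rate would come from. The actual resolution depends on the clock behind `Instant.now()`.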
Progress
- [ ] Change must be properly reviewed (1 review required, with at least 1 Reviewer)
- [x] Change must not contain extraneous whitespace
- [x] Commit message must refer to an issue
- [ ] Change requires CSR request JDK-8357251 to be approved
Issues
- JDK-8334015: Add Support for UUID Version 7 (UUIDv7) defined in RFC 9562 (Enhancement - P4)
- JDK-8357251: Add Support for UUID Version 7 (UUIDv7) defined in RFC 9562 (CSR)
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/25303/head:pull/25303
$ git checkout pull/25303
Update a local copy of the PR:
$ git checkout pull/25303
$ git pull https://git.openjdk.org/jdk.git pull/25303/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 25303
View PR using the GUI difftool:
$ git pr show -t 25303
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/25303.diff
Using Webrev
:wave: Welcome back kieran-farrell! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.
@kieran-farrell This change now passes all automated pre-integration checks.
ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.
After integration, the commit message for the final commit will be:
8334015: Add Support for UUID Version 7 (UUIDv7) defined in RFC 9562
Reviewed-by: rriggs, jpai, alanb
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.
At the time when this comment was updated there had been 1853 new commits pushed to the master branch:
- 50bb92a33b32778a96b1823ff995889892bef890: 8370871: [s390x] consistently update top_frame_sp
- 576f9694b092f2a11a6a4e5a82c2b0e12203bd9c: 8361106: [TEST] com/sun/net/httpserver/Test9.java fails with java.nio.file.FileSystemException
- dadbad0bef84f671c8194c84080c760453ecc423: 8371088: Build fails when trying hsdis option
- ... and 1850 more: https://git.openjdk.org/jdk/compare/984d7f9cdfb0d75ea906ce32df0b6c447f4d5954...master
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.
As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@RogerRiggs, @jaikiran, @AlanBateman) but any other Committer may sponsor as well.
➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).
@kieran-farrell The following label will be automatically applied to this pull request:
- core-libs
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.
Webrevs
- 22: Full - Incremental (27613619)
- 21: Full - Incremental (1be228fc)
- 20: Full - Incremental (b2239d8c)
- 19: Full - Incremental (5ebfa8cc)
- 18: Full - Incremental (8bf7cf21)
- 17: Full - Incremental (ae7e2f27)
- 16: Full - Incremental (409bbbfd)
- 15: Full - Incremental (fa1d2b26)
- 14: Full - Incremental (22207bb0)
- 13: Full - Incremental (1d8cd4d0)
- 12: Full - Incremental (be7dea7a)
- 11: Full - Incremental (a72f1834)
- 10: Full - Incremental (c06474fc)
- 09: Full - Incremental (1e730c24)
- 08: Full - Incremental (1011035f)
- 07: Full - Incremental (adb50724)
- 06: Full (6c9255f6)
- 05: Full - Incremental (aefc4a84)
- 04: Full - Incremental (804187fd)
- 03: Full - Incremental (33d8ccbf)
- 02: Full - Incremental (c7efd528)
- 01: Full - Incremental (922869b3)
- 00: Full (b0b4f7a8)
Hi @BsoBird, thanks for making a comment in an OpenJDK project!
All comments and discussions in the OpenJDK Community must be made available under the OpenJDK Terms of Use. If you already are an OpenJDK Author, Committer or Reviewer, please click here to open a new issue so that we can record that fact. Please Use "Add GitHub user BsoBird" for the summary.
If you are not an OpenJDK Author, Committer or Reviewer, simply check the box below to accept the OpenJDK Terms of Use for your comments.
- [ ] I agree to the OpenJDK Terms of Use for all comments I make in a project in the OpenJDK GitHub organization.
Your comment will be automatically restored once you have accepted the OpenJDK Terms of Use.
@BsoBird
The RFC states that 'Implementations SHOULD utilize a cryptographically secure pseudorandom number generator (CSPRNG) to provide values that are both difficult to predict ("unguessable") and have a low likelihood of collision ("unique")."
That implementation uses ThreadLocalRandom, which does not generate cryptographically secure randomness. For improved security and uniqueness of UUIDs, it might be better to use SecureRandom, which would also align with the behavior of the randomUUID() method.
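A minimal sketch of that suggestion follows: fill 16 bytes from a SecureRandom, then overwrite the timestamp, version and variant fields. The class `SecureUuidV7` and its `create` method are hypothetical names for illustration and are not taken from the PR.

```java
import java.nio.ByteBuffer;
import java.security.SecureRandom;
import java.util.UUID;

// Illustrative only: names are hypothetical, not the PR's implementation.
final class SecureUuidV7 {
    private static final SecureRandom RNG = new SecureRandom();

    /** Builds a UUIDv7 for the given epoch milliseconds with random bits from a CSPRNG. */
    static UUID create(long epochMillis) {
        byte[] b = new byte[16];
        RNG.nextBytes(b);                              // start from 128 secure random bits
        for (int i = 0; i < 6; i++) {
            b[i] = (byte) (epochMillis >>> (40 - 8 * i)); // bytes 0-5: low 48 bits of the timestamp
        }
        b[6] = (byte) ((b[6] & 0x0F) | 0x70);          // high nibble of byte 6: version 7
        b[8] = (byte) ((b[8] & 0x3F) | 0x80);          // top two bits of byte 8: variant '10'
        ByteBuffer buf = ByteBuffer.wrap(b);           // big-endian by default
        return new UUID(buf.getLong(), buf.getLong());
    }
}
```

The trade-off is the usual one: SecureRandom is typically slower than ThreadLocalRandom, so the benchmark figures quoted earlier would not carry over to this variant.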
@kieran-farrell Please do not rebase or force-push to an active PR as it invalidates existing review comments. Note for future reference, the bots always squash all changes into a single commit automatically as part of the integration. See OpenJDK Developers’ Guide for more information.
One minor update made to include URL with spec tag as required
+ * @spec https://www.rfc-editor.org/rfc/rfc9562.html
+ * RFC 9562 Universally Unique IDentifiers (UUIDs)
CSR body and fix version updated also.
Many thanks @RogerRiggs.
Set the CSR state to Proposed to begin the CSR review.
@kieran-farrell This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply issue a /touch or /keepalive command to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!
/touch
@kieran-farrell The pull request is being re-evaluated and the inactivity timeout has been reset.
Hi All, Would it be possible to progress review with this?
Hello Kieran,
Hi All, Would it be possible to progress review with this?
I haven't been able to check and respond to your updates, sorry about that. I'll need at least a few more days to come back to this.
@jaikiran @rgiulietti Can you give this another look? It's been lingering. Tnx.
I plan to look at this API proposal, just haven't had time yet.
Sorry Roger and Kieran, I had to keep this aside for a while. I am reviewing this afresh now.
An initial remark about the APIs being proposed in this PR. Reading through the motivation section of RFC-9562 https://www.rfc-editor.org/rfc/rfc9562.html#name-update-motivation, I think a few important things that we should consider for the API we are proposing are:
Many things have changed in the time since UUIDs were originally created. Modern applications have a need to create and utilize UUIDs as the primary identifier for a variety of different items ... In such cases, "auto-increment" schemes that are often used by databases do not work well: the effort required to coordinate sequential numeric identifiers across a network can easily become a burden. The fact that UUIDs can be used to create unique, reasonably short values in distributed systems without requiring coordination makes them a good alternative, but UUID versions 1-5, which were originally defined by [RFC4122], lack certain other desirable characteristics...
... Due to the aforementioned issues, many widely distributed database applications and large application vendors have sought to solve the problem of creating a better time-based, sortable unique identifier for use as a database key. This has led to numerous implementations over the past 10+ years solving the same problem in slightly different ways ...
Then later in section 6.1 and 6.2 https://www.rfc-editor.org/rfc/rfc9562.html#section-6.1 it's further stated that:
UUID timestamp source, precision, and length were topics of great debate while creating UUIDv7 for this specification. Choosing the right timestamp for your application is very important. ... Monotonicity (each subsequent value being greater than the last) is the backbone of time-based sortable UUIDs.
Given all this, I think the API we provide must try and achieve these primary motivations. That would then mean, not allowing arbitrary values to be passed by applications for generating a UUIDv7 UUID instance. So I think we shouldn't introduce the:
public static UUID epochMillis(long timestamp)
being proposed in this PR. The implementation of this method will have no control (unless we add some logic of keeping track of each call) over what "timestamp" gets passed for subsequent calls and thus cannot guarantee the generated UUIDv7 value to be monotonic. Of course, we could expect the applications to make sure they pass the right timestamp(s) for each call, but then that brings us back to what the RFC motivation stated - that several libraries do it differently. So I think having libraries/applications do the work of passing the right timestamp may not be a useful way to expose the UUIDv7 generation.
I think the other API being proposed in this PR:
public static UUID epochMillis()
is the only one we should introduce. I'm still reviewing the monotonicity implementation and discussion of this epochMillis() method in this PR and will reply separately on that.
Given all this, I think the API we provide must try and achieve these primary motivations. That would then mean, not allowing arbitrary values to be passed by applications for generating a UUIDv7 UUID instance.
To phrase that differently - should we introduce this/these API(s) in Java SE to allow UUID instances that are just structurally of UUIDv7 type, or should we also enforce other semantics (like monotonicity) expected of the UUIDv7 type? If it's the latter, then I think we shouldn't let applications pass the timestamp value when constructing these UUID instances. My opinion is that providing a Java SE API to construct a UUID instance which is merely structurally UUIDv7 isn't too useful.
The full scope of monotonicity, including across application and runtime restarts, is a much larger requirement. The current API addition was introduced only to support UUID as a holder of a version 7 UUID. The API simplifies the construction of a V7 UUID but does not make the more comprehensive guarantees. I think that could/should be considered a separate initiative. It would be beneficial to layer that Java support for a fully supported V7 on top of support by the operating system function. I don't think it can be delivered without OS support.
The view that the application needs to take overall responsibility for the needed semantics could be reinforced by removing the convenience method epochMilliUUID() leaving it to the application to supply the value to be encoded as the milliseconds in the UUID.
Hello Roger,
The view that the application needs to take overall responsibility for the needed semantics could be reinforced by removing the convenience method epochMilliUUID() leaving it to the application to supply the value to be encoded as the milliseconds in the UUID.
Do you mean getting rid of the epochMillis() (the no-arg one) method from this PR?
My initial thought was that just providing an API which constructs a UUIDv7 instance from a user provided value isn't too useful, but then reading your note about:
The API simplifies the construction of a V7 UUID but does not make the more comprehensive guarantees. I think that could/should be considered a separate initiative. It would be beneficial to layer that Java support for a fully supported V7 on top of support by the operating system function. I don't think it can be delivered without OS support.
I agree with what you are saying. The level of monotonicity I had in mind was per JVM lifetime, but like you note, it could be bigger:
The full scope of monotonicity, including across application and runtime restarts, is a much larger requirement.
If we drop the epochMillis() (the no-arg one) method from this PR (which I think is what you meant), then I think it's reasonable. With that, the timestamp computation that's being introduced in this PR will be gone completely. What we will end up with is just one new API which allows applications to pass a long value (representing a timestamp) which we use to construct a UUID instance that is structurally UUIDv7 version. With an appropriate javadoc for the method and maybe an @apiNote, I think we can convey that the implementation of this method isn't responsible for the additional semantics expected of a UUIDv7 instance.
By dropping the no-arg epochMillis() we avoid the concerns about how the epoch millis maintains the requirements for monotonicity.
We can leave the computation of the epoch millis to the application. An obvious convenient value is from System.currentTimeMillis(). There's a risk that they will ignore the lack of a guarantee of monotonicity of that source and only occasionally suffer from it.
The current API note on public static UUID epochMillis(long timestamp) links to the RFC to cover the requirements of its argument.
It might be useful to add a cautionary sentence mentioning that System.currentTimeMillis() does not meet all of the requirements of the RFC; but that could turn into a longer paragraph.
So Yes, we can reduce the functionality to be a carrier of a V7 UUID (in this PR) and separately consider the higher level semantics.
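To make the application-side responsibility concrete, here is a minimal sketch of a wrapper that keeps a millisecond source from going backwards within a single JVM. `MonotonicMillis` is a hypothetical helper, not part of this PR, and it does nothing for monotonicity across restarts or across processes.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

// Illustrative only: a hypothetical application-side helper, not part of this PR.
final class MonotonicMillis {
    private final AtomicLong last = new AtomicLong(Long.MIN_VALUE);
    private final LongSupplier clock;

    /** The clock is injected; an application would typically pass System::currentTimeMillis. */
    MonotonicMillis(LongSupplier clock) {
        this.clock = clock;
    }

    /** Returns the clock reading, or the largest value returned so far if the clock stepped back. */
    long next() {
        long now = clock.getAsLong();
        // Atomically keep the maximum seen so far; results never decrease but may repeat.
        return last.accumulateAndGet(now, Math::max);
    }
}
```

An application could pass the result of `next()` as the timestamp argument of the method proposed here. Values may still repeat within a millisecond, so ordering among UUIDs that share a timestamp is left to the remaining bits.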
That sounds reasonable to me. Kieran @kieran-farrell does that look fine to you?
Hi @jaikiran, yes I'm happy with the above suggestions. I've updated the code accordingly. I can also add a note outlining the possible inaccuracies with using System.currentTimeMillis() if we feel it is required.
Adding support for UUID v7 also includes sorting correctly, IMO.
This has always been incorrect in the JDK as I see it, but back in the days of UUIDv1 to v4 nobody really cared that much how a UUID would sort. Enter UUID v7 and sorting is now important to get right.
So what is the problem? The existing UUID.compareTo() method compares the two longs (nothing wrong with that), but those longs are SIGNED and what you need would be UNSIGNED comparison.
The problem was recognized years ago in JDK-7025832, but the change was rejected due to concerns over backward compatibility.
The problem - when UUID v7 is introduced - is that it becomes apparent that the JDK does not sort the UUID in the same way as the database does or indeed any other language. Previously, this was less of a concern because there was less reason to sort UUIDs.
To be specific, what you expect - and what both the old RFC-4122 spec and the newer RFC-9562 state in their own words - is that UUIDs should be lexicographically sorted, i.e. as if by comparing two arrays of bytes (len=16) where each byte is a value 0-255 (as opposed to a value -128 to 127). An implementation could be:
public int compareToLexi(UUID val) {
    int mostSigBits = Long.compareUnsigned(this.mostSigBits, val.mostSigBits);
    return mostSigBits != 0 ? mostSigBits : Long.compareUnsigned(this.leastSigBits, val.leastSigBits);
}
This would be exactly equal to a method which compares byte arrays as described above.
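The same comparison can be written outside the UUID class using only its public accessors. The sketch below is one way to do that; `UuidOrdering` is a hypothetical helper name.

```java
import java.util.Comparator;
import java.util.UUID;

// Illustrative only: a hypothetical helper built on the public UUID accessors.
final class UuidOrdering {

    /** Orders UUIDs as if their 16 bytes were compared as unsigned values, most significant first. */
    static final Comparator<UUID> UNSIGNED = UuidOrdering::compareUnsigned;

    static int compareUnsigned(UUID a, UUID b) {
        int hi = Long.compareUnsigned(a.getMostSignificantBits(), b.getMostSignificantBits());
        return hi != 0
                ? hi
                : Long.compareUnsigned(a.getLeastSignificantBits(), b.getLeastSignificantBits());
    }
}
```

The two orderings disagree only when the compared longs differ in their top bit. For example, with `new UUID(0x8000000000000000L, 0L)` and `new UUID(1L, 0L)`, `compareTo` places the first before the second, while `UuidOrdering.compareUnsigned` places it after.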
I do not suggest to change the existing compareTo() logic. But at the very least this legacy problem should be highlighted somewhere in the Javadoc. Addressing this, at least with a comment, would be part of a proper UUIDv7 implementation.
My 2c.
Should the existing UUID.timestamp() be updated to also support v7 uuid? If not, should there be another, similar method to extract the timestamp from a v7 UUID?
Should the existing UUID.timestamp() be updated to also support v7 uuid? If not, should there be another, similar method to extract the timestamp from a v7 UUID?
The timestamp for a version 1 UUID is in 100ns units from the start of the Gregorian calendar. Version 7 is millis since the Unix epoch. There was some back and forth on this when the PR was initially created. So while it may be confusing to use epochMillis and have timestamp throw UOE, it would be a hazard to do otherwise.
while it may be confusing to use epochMillis and have timestamp throw UOE, it would be a hazard to do otherwise.
+1
Should there be another method added to extract the epoch millis from a v7 uuid?
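Until such a method exists, an application can read the value back with the public accessors. The sketch below assumes the RFC 9562 layout, with the timestamp in the top 48 bits; `UuidV7Timestamps` and `epochMillisOf` are hypothetical names and not a proposed API.

```java
import java.util.UUID;

// Illustrative only: a hypothetical helper, not an API proposed in this PR.
final class UuidV7Timestamps {

    /** Returns the Unix-epoch milliseconds stored in the top 48 bits of a version 7 UUID. */
    static long epochMillisOf(UUID uuid) {
        if (uuid.variant() != 2 || uuid.version() != 7) {
            throw new UnsupportedOperationException("Not a version 7 UUID");
        }
        // Unsigned shift drops the version and the 12 bits that follow it.
        return uuid.getMostSignificantBits() >>> 16;
    }
}
```

Throwing UnsupportedOperationException for other versions mirrors how the existing `timestamp()` method behaves for UUIDs that are not version 1.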