uuid
uuid copied to clipboard
UUIDv6–v8 Support
Is your feature request related to a problem? Please describe.
There is a new draft RFC for UUID formats that are more database friendly, it would be nice if these were supported in the uuid crate, since the new formats share a lot of the same internals.
Describe the solution you'd like An implementation of UUIDv6, v7, and v8 from the IETF RFC.
Is it blocking? No, it would just be cool to have.
Looks like I've got some reading to do :eyes: These are basically ULIDs are they?
I hadn't seen ULID before, but looking at the project it seems very similar, UUID has more variety in what kind of timestamp you can use.
Today I ran across a project that collects all UUIDv6 prototypes. It lacks Rust implementation :) If you add UUIDv6 support to your library, it would be great
https://github.com/uuid6/prototypes/
GitHub
Draft Prototypes and Tests for UUIDv6 and beyond. Contribute to uuid6/prototypes development by creating an account on GitHub.
I hadn't seen ULID before, but looking at the project it seems very similar, UUID has more variety in what kind of timestamp you can use.
Looks like I've got some reading to do 👀 These are basically ULIDs are they?
As I understand, UUIDv6 can be sorted and the sorted list would be in the generated time order that make UUIDv6 much more desirable for using as primary key or alike in databases
Thanks @XAMPPRocky and @VladimirMarkelov. I'll spend some time unpicking the spec to see if it would make sense to share our v1 timestamp APIs with v6 or whether there are changes we should make to unify them.
Ok, based on a really quick reading it looks like our Timestamp and ClockSequence should already be suitable at least for v6 and v7 UUIDs. It already can convert between the original Gregorian format and a standard Unix epoch. I haven't looked at v8 just yet.
Do we have any experience so far from similar implementations on what a good ratio of subsecond precision / clock sequence / random data we should target for v7 UUIDs? We already have access to 32bits (possibly low precision though) of subsecond data and 16bits for a clock sequence. If we used all that I think it would leave us with 38bits for random data, which seems a little low.
This will be great to have!
Do we have any experience so far from similar implementations on what a good ratio of subsecond precision / clock sequence / random data we should target for v7 UUIDs?
Would it be possible to make it user-configurable?
Hey @coreh :wave: Are you currently looking at using these new UUID variants? If so do you have any constraints or experiences to share?
We could look at making this configuration for sure, but whether or not we do that unconditionally (in something like Uuid::new_v7) or as an advanced API (in the Builder) would depend on how important they are to tweak. It might also be possible to set a precision on the clock sequence too so it tells the UUID how many bits it wants.
Yeah, I'm currently looking at using them for a hobby project (very early, in concept stage). My use case is to have a bunch of (untrusted and sandboxed) WASM applications running in a mesh network of hosts. Resources are identified by UUIDs, "owned" by the different applications, and are shared and accessible (via message passing) throughout the entire network. Applications can also be "migrated" between hosts.
My idea is to use the the highest amount of bits possible from the subsec_seq_node portion to uniquely identify applications for message routing (so that nodes can query other nodes for who's running that application) and to avoid giving the ability for applications to "bruteforce" or "scan" for the presence of other applications in the mesh network. I suspect most apps will create very few resources (say < 100,000) so this arrangement will probably work fine. However, I also wanted to keep the door open for some resource-heavy apps (that might need to create millions or billions of resources, quickly) to request a different allocation scheme with a shorter node portion in subsec_seq_node and more time/pseudorandom bits.
I also suspect (but I'm not entirely sure) the big endian, lexicographical sorting aspect of UUIDv7 will help with compression over the wire, especially when serializing/deserializing resource state when applications are migrated between nodes.
Ah I see 🤔 I was hoping we could use the same clock infrastructure we have for v1 UUIDs for this too (we already expose APIs for working with Unix epoch dates). Maybe they could determine the precision somehow so we can fill the rest with random data.
I saw the RFC got pushed back to next year. I don’t think I’ve ever looked at an IETF RFC that was in draft before so am not sure how normal that is 🙂
I saw the RFC got pushed back to next year. I don’t think I’ve ever looked at an IETF RFC that was in draft before so am not sure how normal that is
FWIW I've not been involved with IETF, but just from my experience implementing the RFCs, there's plenty of tech in production that lives in the draft or proposed RFC space for a long time, it more represents how much change there could be. For example; HTTP/2 (RFC 7540) and QUIC (RFC 9000) are still considered proposed standards by the IETF despite everyone using them.
The new UUID formats are being discussed in this github repository: https://github.com/uuid6/uuid6-ietf-draft
GitHub
UUID version 6 IETF draft. Contribute to uuid6/uuid6-ietf-draft development by creating an account on GitHub.
HTTP/2 (RFC 7540) and QUIC (RFC 9000) are still considered proposed standards by the IETF despite everyone using them.
Wow! Ok there you go.
The new UUID formats are being discussed in this github repository
Ah thanks @fabiolimace! :eyes:
I'll sketch this out on a branch and we can decide whether or not we want to ship them under a v6-draft / v7-draft / v8-draft feature so people can use them, with the caveat that they may change based on the draft spec.
So we currently have access to 96bits of timestamp (64bits of timestamp and 32bits of nanos) Timestamp::from_unix. I'm thinking if it's possible we should flip the internal representation of Timestamp to keep all that information (so it'll become 16bits bigger), and then introduce a TimestampPrecision type that's added to ClockSequence and can be used to truncate a timestamp.
Maybe something like:
pub struct Precision(TimestampPrecision, SequencePrecision);
pub struct TimestampPrecision(u8);
impl TimestampPrecision {
pub const SECONDS: Self = Self(0);
pub const MILLIS: Self = Self(3);
..
pub fn from_places(places: u8) -> Self { .. }
}
pub struct SequencePrecision(u8);
impl SequencePrecision {
pub const V1: Self = Self(14);
pub fn from_bits(bits: u8) -> Self { .. }
}
pub trait ClockSequence {
fn precision(&self) -> Precision { Precision(TimestampPrecision::MILLIS, SequencePrecision::V1) }
fn generate_sequence(&self, seconds: u64, subsec_nanos: u32) -> u16;
}
impl Uuid {
// Without precision we'll assume 96(?)bits of timestamp and node are for the timestamp
// This can produce some jittery results beyond the actual precision of the timestamp stored
fn to_timestamp_with_precision(&self, precision: Precision) -> Timestamp { .. }
}
impl Uuid {
// Any truncated portion of the timestamp to the end of the UUID will be filled with random data
// The precision used to generate the timestamp will determine when it truncates
pub fn new_v7(ts: Timestamp) -> Result<Self, crate::Error> { .. }
}
The idea being we can do this without any breakage to v1 but without also introducing a lot of new very similar infrastructure for v7 and v8.
What do you all think?
I see the reference Python implementation has these defaults for v7:
sec_bits = 36 # unixts at second precision
subsec_bits = 30 # Enough to represent NS
sequence_bits = 8 # Enough for 256 UUIDs per NS
We could probably use the same.
We're working towards a 1.0 release of uuid now. I'm confident we'll be able to support v6-v8 UUIDs without any disruptive breakage so I don't think this needs to block that.
I think the sketch I made above is a bit unnecessary. All we really need is a way to retain all the precision in Timestamp for a Unix timestamp, which we can do.
Is anyone working on this ? I could make a PR for v7.
Hi @malobre :wave: There isn't any active work on this right now. I haven't been following the RFC too closely, but was under the impression that things were changing in it so figured I'd check back in with it later.
If you'd like to work on some v7 support we can work out how to ship it, either in uuid as an unstable API, or externally, so we can keep up with the RFC as it evolves if necessary.
I have forked this repo and started to play with things on the v7 branch.
I will be putting any public facing additions behind the v7 feature and uuid_unstable cfg.
Working with v1::Timestamp is a little awkward for UUIDv7, i.e: we want milliseconds since unix epoch, but the public api (e.g: Timestamp::from_unix taking seconds & nanos instead of ms) and internal representation are tailored for UUIDv1 (number of 100-nanosecond intervals since gregorian epoch).
It makes sense to share the struct between v1 & v6, but for v7 not so much.
Should I define a separate v7::Timestamp struct or should I try to make it work with the current v1::Timestamp ?
Maybe this is something to take into account
Super excited to have uuid v7 support. @malobre does your fork currently work?
@malobre does your fork currently work?
No, I only made some groundwork as I intended to clarify some things beforehand.
In the meantime, here's a snippet for single-node, non-batched, UUIDv7 generation:
let uuid = {
use std::time::{SystemTime, UNIX_EPOCH};
let mut buf = rand::random::<u128>() & 0xFFF3FFFFFFFFFFFFFFF;
// 48 bits unix timestamp in ms
buf |= SystemTime::now()
.duration_since(UNIX_EPOCH)
.expect("SystemTime before UNIX Epoch")
.as_millis() << 80;
// version
buf |= 0x7 << 76;
// variant
buf |= 0b10 << 62;
uuid::Uuid::from_u128(buf)
};
Edit: I thought I should really emphasize that you should NOT use the above snippet if you intend to generate UUIDs in a distributed fashion or in batches.
FYI, I stumbled upon a rust implementation: https://github.com/LiosK/uuid7-rs
I believe the main challenge will be integrating UUIDv7 with the existing Timestamp struct rather than the implementation itself.
The Timestamp type is pretty opaque, so I think we should be able to come up with a scheme that can support the various shaped timestamps in the new formats. If need be we could also come up with an alternative type and provide conversions between them.
FYI, I am not currently able to spend much time on this (or open-source in general). If someone want to take a stab at this, please do :-)
@KodrAus - Regarding your sketch for *Precision, it seems to me that the only Uuid type that needs to support an arbitrary precision for time and sequence is v8, and I have doubts about that.
It seems to me that introducing Time and Sequence Precisions introduce a fair amount of complexity for only 2 choices for precision. V6 is already implemented, and V7 is always fixed at milliseconds, so there isn't much of a reason to supply a variable-precision timestamp. In addition, I don't think we can manage construction generically for the different uuid types, consdering that v1 and v6 use the funky ticks offset.
For v8, to be spec-compliant, we don't actually need to provide any functionality for v8 other than accept a 16 byte buffer that we distribute into Bytes at the correct offsets.
I started implementing V8 helper types (e.g. construct a new v8 Uuid from now()) but, frankly, there are an awful lot of variables that would be hard to account for:
- Distributed Differentiator: Most Uuids will still want a host designation and/or randomness. In order to ensure that a Uuid is truly unique across a distributed system, you need some differentiator other than clock + sequence.
- When to start the Epoch? Why start IDs at seconds since Unix Epoch, since we'd be dedicating 32 bits of information to a number range that we'll never use. We could start the epoch on Jan 1 of 2022, provide 32 bits, and have capacity for the next 120 years.
- How to distribute the rest of the things?
I guess my point is that V8 is left blank for a reason, and if we took a guess at what people wanted, we'd probably be overlapping quite a bit with the v7 use case.
So I'm proposing that we...
- Change Timestamp per your suggestion, make it store 64bit secs and 32bit nanos (sequence is irrelevent, yes?)
- (Re)implement v1, v6, and v7 to apply their Version-specific things to the supplied timestamp upon construction.
- Create a v8 whose only constructor is basically
from_buffer([u8; 16])and maybefrom_fields(u64, u16, u64)and leave the rest up to the user.
If you're cool with this, I'm happy to implement it.
Then, in the future..
If we want to add a really fancy UuidV8Builder which can take all of the above into consideration (e.g.
let foo = Builder::new()
.set_random_bits()
.set_seconds_bits()
.set_frac_bits()
.set_frac_divisor()
.set_hostname()
.ignore_unused_bits()
.build();
Oh, also, the v7 spec sort of punts on the counter/sequent and how to do monotonicity. Its says that 1000 Uuids created in a batch should be sorted in order. It doesn't say how, but it does offer suggestions.
I think the best way would be to apply a 10 bit (or so) counter to most significant bits of the random number portion of the random bits. So instead of 74 bits of randomness, it'd be 64. (or to fit more neatly into the bit segments.. 12 and 62)
After implementing the changes I mentioned above. The API will change because I've separated Counter from Timestamp (which I think is the correct approach). A Timestamp shouldn't know anything about a Counter, but a counter could/should know about a Timestamp.
Also there are some legacy things in here, such as being edition 2018 instead of 2021. Also, it uses the atomic crate but the reason for it is no longer valid, since std::sync::atomic now has those types that we need.
Also, std::time is standardized, so we can add helper functions based on SystemTime (not that this would be a breaking change, we would just add new methods)
IMO, we should bump the version to 2.0 and modernize things a bit more.
Thoughts?