uuid icon indicating copy to clipboard operation
uuid copied to clipboard

UUIDv6–v8 Support

Open XAMPPRocky opened this issue 4 years ago • 31 comments

Is your feature request related to a problem? Please describe. There is a new draft RFC for UUID formats that are more database friendly, it would be nice if these were supported in the uuid crate, since the new formats share a lot of the same internals.

Describe the solution you'd like An implementation of UUIDv6, v7, and v8 from the IETF RFC.

Is it blocking? No, it would just be cool to have.

XAMPPRocky avatar Aug 07 '21 05:08 XAMPPRocky

Looks like I've got some reading to do :eyes: These are basically ULIDs are they?

KodrAus avatar Aug 12 '21 22:08 KodrAus

I hadn't seen ULID before, but looking at the project it seems very similar, UUID has more variety in what kind of timestamp you can use.

XAMPPRocky avatar Aug 12 '21 23:08 XAMPPRocky

Today I ran across a project that collects all UUIDv6 prototypes. It lacks Rust implementation :) If you add UUIDv6 support to your library, it would be great

https://github.com/uuid6/prototypes/

GitHub
Draft Prototypes and Tests for UUIDv6 and beyond. Contribute to uuid6/prototypes development by creating an account on GitHub.

VladimirMarkelov avatar Aug 16 '21 04:08 VladimirMarkelov

I hadn't seen ULID before, but looking at the project it seems very similar, UUID has more variety in what kind of timestamp you can use.

Looks like I've got some reading to do 👀 These are basically ULIDs are they?

As I understand, UUIDv6 can be sorted and the sorted list would be in the generated time order that make UUIDv6 much more desirable for using as primary key or alike in databases

VladimirMarkelov avatar Aug 16 '21 04:08 VladimirMarkelov

Thanks @XAMPPRocky and @VladimirMarkelov. I'll spend some time unpicking the spec to see if it would make sense to share our v1 timestamp APIs with v6 or whether there are changes we should make to unify them.

KodrAus avatar Aug 16 '21 04:08 KodrAus

Ok, based on a really quick reading it looks like our Timestamp and ClockSequence should already be suitable at least for v6 and v7 UUIDs. It already can convert between the original Gregorian format and a standard Unix epoch. I haven't looked at v8 just yet.

KodrAus avatar Aug 16 '21 04:08 KodrAus

Do we have any experience so far from similar implementations on what a good ratio of subsecond precision / clock sequence / random data we should target for v7 UUIDs? We already have access to 32bits (possibly low precision though) of subsecond data and 16bits for a clock sequence. If we used all that I think it would leave us with 38bits for random data, which seems a little low.

KodrAus avatar Aug 16 '21 05:08 KodrAus

This will be great to have!

ms-ati avatar Sep 12 '21 18:09 ms-ati

Do we have any experience so far from similar implementations on what a good ratio of subsecond precision / clock sequence / random data we should target for v7 UUIDs?

Would it be possible to make it user-configurable?

coreh avatar Sep 28 '21 22:09 coreh

Hey @coreh :wave: Are you currently looking at using these new UUID variants? If so do you have any constraints or experiences to share?

We could look at making this configuration for sure, but whether or not we do that unconditionally (in something like Uuid::new_v7) or as an advanced API (in the Builder) would depend on how important they are to tweak. It might also be possible to set a precision on the clock sequence too so it tells the UUID how many bits it wants.

KodrAus avatar Sep 30 '21 06:09 KodrAus

Yeah, I'm currently looking at using them for a hobby project (very early, in concept stage). My use case is to have a bunch of (untrusted and sandboxed) WASM applications running in a mesh network of hosts. Resources are identified by UUIDs, "owned" by the different applications, and are shared and accessible (via message passing) throughout the entire network. Applications can also be "migrated" between hosts.

My idea is to use the the highest amount of bits possible from the subsec_seq_node portion to uniquely identify applications for message routing (so that nodes can query other nodes for who's running that application) and to avoid giving the ability for applications to "bruteforce" or "scan" for the presence of other applications in the mesh network. I suspect most apps will create very few resources (say < 100,000) so this arrangement will probably work fine. However, I also wanted to keep the door open for some resource-heavy apps (that might need to create millions or billions of resources, quickly) to request a different allocation scheme with a shorter node portion in subsec_seq_node and more time/pseudorandom bits.

I also suspect (but I'm not entirely sure) the big endian, lexicographical sorting aspect of UUIDv7 will help with compression over the wire, especially when serializing/deserializing resource state when applications are migrated between nodes.

coreh avatar Oct 02 '21 19:10 coreh

Ah I see 🤔 I was hoping we could use the same clock infrastructure we have for v1 UUIDs for this too (we already expose APIs for working with Unix epoch dates). Maybe they could determine the precision somehow so we can fill the rest with random data.

I saw the RFC got pushed back to next year. I don’t think I’ve ever looked at an IETF RFC that was in draft before so am not sure how normal that is 🙂

KodrAus avatar Oct 27 '21 10:10 KodrAus

I saw the RFC got pushed back to next year. I don’t think I’ve ever looked at an IETF RFC that was in draft before so am not sure how normal that is

FWIW I've not been involved with IETF, but just from my experience implementing the RFCs, there's plenty of tech in production that lives in the draft or proposed RFC space for a long time, it more represents how much change there could be. For example; HTTP/2 (RFC 7540) and QUIC (RFC 9000) are still considered proposed standards by the IETF despite everyone using them.

XAMPPRocky avatar Oct 27 '21 10:10 XAMPPRocky

The new UUID formats are being discussed in this github repository: https://github.com/uuid6/uuid6-ietf-draft

GitHub
UUID version 6 IETF draft. Contribute to uuid6/uuid6-ietf-draft development by creating an account on GitHub.

fabiolimace avatar Oct 28 '21 01:10 fabiolimace

HTTP/2 (RFC 7540) and QUIC (RFC 9000) are still considered proposed standards by the IETF despite everyone using them.

Wow! Ok there you go.

The new UUID formats are being discussed in this github repository

Ah thanks @fabiolimace! :eyes:

I'll sketch this out on a branch and we can decide whether or not we want to ship them under a v6-draft / v7-draft / v8-draft feature so people can use them, with the caveat that they may change based on the draft spec.

KodrAus avatar Oct 28 '21 03:10 KodrAus

So we currently have access to 96bits of timestamp (64bits of timestamp and 32bits of nanos) Timestamp::from_unix. I'm thinking if it's possible we should flip the internal representation of Timestamp to keep all that information (so it'll become 16bits bigger), and then introduce a TimestampPrecision type that's added to ClockSequence and can be used to truncate a timestamp.

Maybe something like:

pub struct Precision(TimestampPrecision, SequencePrecision);

pub struct TimestampPrecision(u8);

impl TimestampPrecision {
    pub const SECONDS: Self = Self(0);
    pub const MILLIS: Self = Self(3);
    ..

    pub fn from_places(places: u8) -> Self { .. }
}

pub struct SequencePrecision(u8);

impl SequencePrecision {
    pub const V1: Self = Self(14);

    pub fn from_bits(bits: u8) -> Self { .. }
}

pub trait ClockSequence {
    fn precision(&self) -> Precision { Precision(TimestampPrecision::MILLIS, SequencePrecision::V1) }

    fn generate_sequence(&self, seconds: u64, subsec_nanos: u32) -> u16;
}

impl Uuid {
    // Without precision we'll assume 96(?)bits of timestamp and node are for the timestamp
    // This can produce some jittery results beyond the actual precision of the timestamp stored
    fn to_timestamp_with_precision(&self, precision: Precision) -> Timestamp { .. }
}

impl Uuid {
    // Any truncated portion of the timestamp to the end of the UUID will be filled with random data
    // The precision used to generate the timestamp will determine when it truncates
    pub fn new_v7(ts: Timestamp) -> Result<Self, crate::Error> { .. }
}

The idea being we can do this without any breakage to v1 but without also introducing a lot of new very similar infrastructure for v7 and v8.

What do you all think?

KodrAus avatar Oct 28 '21 04:10 KodrAus

I see the reference Python implementation has these defaults for v7:

sec_bits = 36 # unixts at second precision
subsec_bits = 30 # Enough to represent NS
sequence_bits = 8 # Enough for 256 UUIDs per NS

We could probably use the same.

KodrAus avatar Oct 29 '21 05:10 KodrAus

We're working towards a 1.0 release of uuid now. I'm confident we'll be able to support v6-v8 UUIDs without any disruptive breakage so I don't think this needs to block that.

I think the sketch I made above is a bit unnecessary. All we really need is a way to retain all the precision in Timestamp for a Unix timestamp, which we can do.

KodrAus avatar Nov 01 '21 22:11 KodrAus

Is anyone working on this ? I could make a PR for v7.

malobre avatar May 13 '22 12:05 malobre

Hi @malobre :wave: There isn't any active work on this right now. I haven't been following the RFC too closely, but was under the impression that things were changing in it so figured I'd check back in with it later.

If you'd like to work on some v7 support we can work out how to ship it, either in uuid as an unstable API, or externally, so we can keep up with the RFC as it evolves if necessary.

KodrAus avatar May 25 '22 01:05 KodrAus

I have forked this repo and started to play with things on the v7 branch.

I will be putting any public facing additions behind the v7 feature and uuid_unstable cfg.

Working with v1::Timestamp is a little awkward for UUIDv7, i.e: we want milliseconds since unix epoch, but the public api (e.g: Timestamp::from_unix taking seconds & nanos instead of ms) and internal representation are tailored for UUIDv1 (number of 100-nanosecond intervals since gregorian epoch).

It makes sense to share the struct between v1 & v6, but for v7 not so much. Should I define a separate v7::Timestamp struct or should I try to make it work with the current v1::Timestamp ?

malobre avatar May 25 '22 12:05 malobre

Maybe this is something to take into account

JFortun avatar Jun 13 '22 06:06 JFortun

Super excited to have uuid v7 support. @malobre does your fork currently work?

benwis avatar Jun 13 '22 22:06 benwis

@malobre does your fork currently work?

No, I only made some groundwork as I intended to clarify some things beforehand.

In the meantime, here's a snippet for single-node, non-batched, UUIDv7 generation:

let uuid = {
    use std::time::{SystemTime, UNIX_EPOCH};

    let mut buf = rand::random::<u128>() & 0xFFF3FFFFFFFFFFFFFFF;

    // 48 bits unix timestamp in ms
    buf |= SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("SystemTime before UNIX Epoch")
        .as_millis() << 80;

    // version
    buf |= 0x7 << 76;

    // variant
    buf |= 0b10 << 62;
    
    uuid::Uuid::from_u128(buf)
};

Edit: I thought I should really emphasize that you should NOT use the above snippet if you intend to generate UUIDs in a distributed fashion or in batches.

malobre avatar Jun 15 '22 17:06 malobre

FYI, I stumbled upon a rust implementation: https://github.com/LiosK/uuid7-rs

I believe the main challenge will be integrating UUIDv7 with the existing Timestamp struct rather than the implementation itself.

malobre avatar Jun 17 '22 11:06 malobre

The Timestamp type is pretty opaque, so I think we should be able to come up with a scheme that can support the various shaped timestamps in the new formats. If need be we could also come up with an alternative type and provide conversions between them.

KodrAus avatar Jul 03 '22 09:07 KodrAus

FYI, I am not currently able to spend much time on this (or open-source in general). If someone want to take a stab at this, please do :-)

malobre avatar Aug 04 '22 14:08 malobre

@KodrAus - Regarding your sketch for *Precision, it seems to me that the only Uuid type that needs to support an arbitrary precision for time and sequence is v8, and I have doubts about that.

It seems to me that introducing Time and Sequence Precisions introduce a fair amount of complexity for only 2 choices for precision. V6 is already implemented, and V7 is always fixed at milliseconds, so there isn't much of a reason to supply a variable-precision timestamp. In addition, I don't think we can manage construction generically for the different uuid types, consdering that v1 and v6 use the funky ticks offset.

For v8, to be spec-compliant, we don't actually need to provide any functionality for v8 other than accept a 16 byte buffer that we distribute into Bytes at the correct offsets.

I started implementing V8 helper types (e.g. construct a new v8 Uuid from now()) but, frankly, there are an awful lot of variables that would be hard to account for:

  1. Distributed Differentiator: Most Uuids will still want a host designation and/or randomness. In order to ensure that a Uuid is truly unique across a distributed system, you need some differentiator other than clock + sequence.
  2. When to start the Epoch? Why start IDs at seconds since Unix Epoch, since we'd be dedicating 32 bits of information to a number range that we'll never use. We could start the epoch on Jan 1 of 2022, provide 32 bits, and have capacity for the next 120 years.
  3. How to distribute the rest of the things?

I guess my point is that V8 is left blank for a reason, and if we took a guess at what people wanted, we'd probably be overlapping quite a bit with the v7 use case.

So I'm proposing that we...

  1. Change Timestamp per your suggestion, make it store 64bit secs and 32bit nanos (sequence is irrelevent, yes?)
  2. (Re)implement v1, v6, and v7 to apply their Version-specific things to the supplied timestamp upon construction.
  3. Create a v8 whose only constructor is basically from_buffer([u8; 16]) and maybe from_fields(u64, u16, u64) and leave the rest up to the user.

If you're cool with this, I'm happy to implement it.

Then, in the future..

If we want to add a really fancy UuidV8Builder which can take all of the above into consideration (e.g.

   let foo = Builder::new()
   .set_random_bits()
   .set_seconds_bits()
   .set_frac_bits()
   .set_frac_divisor()
   .set_hostname()
   .ignore_unused_bits()
   .build();

rrichardson avatar Aug 15 '22 23:08 rrichardson

Oh, also, the v7 spec sort of punts on the counter/sequent and how to do monotonicity. Its says that 1000 Uuids created in a batch should be sorted in order. It doesn't say how, but it does offer suggestions.
I think the best way would be to apply a 10 bit (or so) counter to most significant bits of the random number portion of the random bits. So instead of 74 bits of randomness, it'd be 64. (or to fit more neatly into the bit segments.. 12 and 62)

rrichardson avatar Aug 16 '22 00:08 rrichardson

After implementing the changes I mentioned above. The API will change because I've separated Counter from Timestamp (which I think is the correct approach). A Timestamp shouldn't know anything about a Counter, but a counter could/should know about a Timestamp.

Also there are some legacy things in here, such as being edition 2018 instead of 2021. Also, it uses the atomic crate but the reason for it is no longer valid, since std::sync::atomic now has those types that we need.

Also, std::time is standardized, so we can add helper functions based on SystemTime (not that this would be a breaking change, we would just add new methods)

IMO, we should bump the version to 2.0 and modernize things a bit more.

Thoughts?

rrichardson avatar Aug 16 '22 17:08 rrichardson