
Discussion: Redefine variant bit (111) definition

Open kyzer-davis opened this issue 3 years ago • 56 comments

Continuation of separate #24 thread

Question: Should we define the UUID variant bits 111 (E/F), which the original RFC 4122 Section 4.1.1 currently marks as "Reserved for future definition."?


Proposal:

  • In Draft 02, define any UUID version (UUIDv1/2/3/4/5/6/7/8) combined with variant E (111) as a way to signal an alternative bit layout for that previously defined UUID version.
  • With that precedent set, UUIDv6 could become UUIDv1 (0001) + variant E/F (111), signaling an alternative encoding of UUIDv1 labeled UUIDv1ε.

Possible text changes depending on feedback from this issue and #24

As such, Draft 01 goes from the current three definitions:

| Name | Version | Variant | Description |
|---|---|---|---|
| UUIDv6 | 0110 | 10x (8/9/A/B) | UUIDv1 with Re-ordered Gregorian timestamp, explicit start sequence counter, no MAC address |
| UUIDv7 | 0111 | 10x (8/9/A/B) | 36-bit Unix epoch timestamp, variable subsecond encoding up to nanoseconds using floating point math and fractions (38 bits allocated to subsecond precision). |
| UUIDv8 | 1000 | 10x (8/9/A/B) | Relaxed implementation, any timestamp goes, future proof the specification, 122 bits to do as you desire with general guidelines of timestamp, sequence, random in that order |

To the possible four definitions in Draft 02:

| Name | Version | Variant | Description |
|---|---|---|---|
| UUIDv1ε | 0001 | 111 (E/F) | UUIDv1 with Re-ordered Gregorian timestamp, explicit start sequence counter, no MAC address. (What is UUIDv6 in draft 01) |
| UUIDv6 | 0110 | 10x (8/9/A/B) | 36-bit Unix epoch timestamp, variable subsecond encoding up to nanoseconds using floating point math and binary fractions (38 bits allocated to subsecond precision). (What is UUIDv7 in draft 01) |
| UUIDv6ε | 0110 | 111 (E/F) | UUIDv6 with 36-bit Unix epoch timestamp, variable subsecond encoding up to nanoseconds using integers to represent the total number of subseconds (30 bits allocated to subsecond precision). (Did not exist in draft 01) |
| UUIDv7 | 0111 | 10x (8/9/A/B) | Relaxed implementation, any timestamp goes, future proof the specification, 122 bits to do as you desire with general guidelines of timestamp, sequence, random in that order. (What was UUIDv8 in draft 01) |
| UUIDv8 | 1000 | 10x (8/9/A/B) | Goes away in draft 02 as it is no longer required. |

kyzer-davis, Aug 13 '21 14:08

Could you clarify what "variable subsecond encoding up to nanoseconds using floating point math and fractions" and "variable subsecond encoding up to nanoseconds using integers to represent total number of subseconds" mean, and what the difference between them is?

nerg4l, Aug 13 '21 17:08

@nerg4l, the idea being discussed in #24 is this: if we keep the current UUIDv7 and just rename it UUIDv6 (since UUIDv6 becomes UUIDv1ε), then we can also add an alternative encoding, UUIDv6ε, that uses the 30-bit integer subsecond field without any floating point math or binary fraction encoding. We get the best of both worlds. UUIDv7 becomes what was UUIDv8, and I drop UUIDv8 from the draft.

I edited the table to add more clarity.

kyzer-davis, Aug 13 '21 17:08
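To make the question above concrete, here is a minimal Go sketch contrasting the two subsecond encodings. It assumes the bit widths from the table (a 38-bit binary fraction of one second vs a 30-bit integer count of nanoseconds); the function names are hypothetical and this is not the draft's reference code. It also swaps the floating point math for exact integer arithmetic: because 10^9 = 2^9 · 1953125, ns · 2^38 / 10^9 equals ns · 2^29 / 1953125, which fits in 64 bits.

```go
package main

import "fmt"

// Hypothetical helpers contrasting the two subsecond encodings discussed
// above. Bit widths follow the table; the names are illustrative only.

// fiveToNine is 5^9. Since 1e9 = 2^9 * 5^9, ns*2^38/1e9 == ns*2^29/5^9,
// which keeps every intermediate value below 2^64.
const fiveToNine = 1953125

// encodeFraction38 maps ns in [0, 1e9) to floor(ns/1e9 * 2^38): the
// subsecond value as a 38-bit binary fraction of one second.
func encodeFraction38(ns uint64) uint64 {
	return (ns << 29) / fiveToNine
}

// decodeFraction38 converts the fraction back to nanoseconds. One step of
// 2^-38 s is finer than 1 ns, so rounding to nearest (not truncating)
// recovers the original value.
func decodeFraction38(frac uint64) uint64 {
	return (frac*fiveToNine + 1<<28) >> 29
}

// encodeInteger30 keeps the nanosecond count as-is; the mask only documents
// the 30-bit field width (999,999,999 < 2^30).
func encodeInteger30(ns uint64) uint64 {
	return ns & (1<<30 - 1)
}

func main() {
	for _, ns := range []uint64{0, 1, 500_000_000, 999_999_999} {
		frac := encodeFraction38(ns)
		fmt.Printf("ns=%d fraction38=%d integer30=%d roundtrip=%d\n",
			ns, frac, encodeInteger30(ns), decodeFraction38(frac))
	}
}
```

Note the rounding step in the decoder: the binary-fraction form only round-trips exactly if you round to nearest, which is the kind of subtlety the integer form avoids by storing the nanosecond count unchanged.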

@kyzer-davis I was thinking, for simplicity, we would only define a meaning for the variants + versions that we actually want. Meaning basically that UUIDv6 I think stays as it is with the old variant, and UUIDv7 and 8 use the new 111 variant. Otherwise I think we have too much variation without any real benefit.

bradleypeabody, Aug 13 '21 19:08

Thanks for the clarification.

I don't think it makes sense to create both UUIDv6 and UUIDv6ε. Instead of pleasing everyone, we should make a clear decision on which one to have. This would simplify implementations by leaving one less UUID to implement and would help keep the RFC less complex. Also, UUIDv6ε would probably only apply to nanosecond precision.

nerg4l, Aug 13 '21 19:08

I looked more into this and found two things which should be taken into consideration.

I'm not sure whether it is relevant when extending the RFC, but ITU-T also defines UUIDs, in X.667, which states the following:

11.2 All UUIDs conforming to this Recommendation | International Standard shall have variant bits with bit 7 of octet 7 set to 1 and bit 6 of octet 7 set to 0. Bit 5 of octet 7 is the most significant bit of the Clock Sequence and shall be set in accordance with 12.4.

NOTE – Bit 5 is listed here as a variant bit because its value distinguishes historical formats. Strictly speaking, it is not part of the variant value for this Recommendation | International Standard, which uses only two bits for the variant.

I also checked what a variant #0 (NCS) UUID looked like. It seems it does not have version bits. https://opensource.apple.com/source/CF/CF-299.35/Base.subproj/uuid.c.auto.html

 * Internal structure of variant #0 UUIDs
 *
 * The first 6 octets are the number of 4 usec units of time that have
 * passed since 1/1/80 0000 GMT.  The next 2 octets are reserved for
 * future use.  The next octet is an address family.  The next 7 octets
 * are a host ID in the form allowed by the specified address family.
 *
 * Note that while the family field (octet 8) was originally conceived
 * of as being able to hold values in the range [0..255], only [0..13]
 * were ever used.  Thus, the 2 MSB of this field are always 0 and are
 * used to distinguish old and current UUID forms.
 *
 * +--------------------------------------------------------------+
 * |                    high 32 bits of time                      |  0-3  .time_high
 * +-------------------------------+-------------------------------
 * |     low 16 bits of time       |  4-5               .time_low
 * +-------+-----------------------+
 * |         reserved              |  6-7               .reserved
 * +---------------+---------------+
 * |    family     |   8                                .family
 * +---------------+----------...-----+
 * |            node ID               |  9-16           .node
 * +--------------------------...-----+

Unfortunately, I could not find any specification for Microsoft's variant #2.

A lot of people refer to UUIDs defined by RFC 4122 as variant #1 version x UUIDs. RFC4122 also states the following about variant and version:

[...] The UUID format is 16 octets; some bits of the eight octet variant field specified below determine finer structure. [...]

[...] As such, it [variant] could more accurately be called a type field; we retain the original term for compatibility. [...]

[...] The version is more accurately a sub-type; again, we retain the term for compatibility. [...]

Therefore, I assume using variant #3 would allow redefining the structure entirely, for example moving the version field or removing it from the definition. Current RFC4122 implementations would probably have to be ignored, because keeping backward compatibility (BC) looks impossible.

nerg4l, Aug 15 '21 13:08

I assume using variant #3 would allow redefining the structure entirely. Moving or removing version from the definition, for example

It is the last unused variant, so there is no room for error.

edo1, Aug 15 '21 14:08

I also think the version bits (subtype) are specific to the RFC4122 variant (type), which has many subtypes that must be separated from each other. Variant 3 (111) doesn't even have a structure yet.

This file appears to be the basis for Apple's implementation: https://github.com/BeyondTrust/pbis-open/blob/master/dcerpc/uuid/uuid.c

fabiolimace, Aug 15 '21 14:08

Using the variant field to signal different bit semantics within RFC 4122 versions is not appropriate. The variant field is the overarching field that dictates layout and semantics of all other bits in a UUID. RFC4122 is very deliberately scoped to just variant == 0b10x. Hell, version isn't even defined outside of that specific variant.

I believe a more correct approach would be to use a different version to distinguish between timestamp encodings, much the way v3 and v5 distinguish between namespace hash algorithms.

And, yes, this would effectively double the number of new UUID versions being proposed, which is not ideal. This is one reason I'd like to see this proposal culled back to just 1 new timestamp version, per #30.

broofa, Aug 18 '21 20:08

Using the variant field to signal different bit semantics within RFC 4122 versions is not appropriate.

What about using variant=0b111 in a new format (without version)? Expanding the variable part by three bits reduces the probability of collisions by almost an order of magnitude.

edo1, Aug 19 '21 02:08

What about using variant=0b111 in a new format (without version)? Expanding the variable part by three bits reduces the probability of collisions by almost an order of magnitude.

You mean the version part? I suppose you could do that. But imho, defining a new variant should be a Big Deal™. It should be motivated by the need for a whole new class of UUIDs, or by having exhausted the available version options, which we haven't done yet. E.g. if there was a need to move the version field to the end of the UUID (to improve db-locality?), or it needed to be 6 or 8-bits wide instead of 4.

I don't see such a need at this time.

broofa, Aug 19 '21 04:08

But imho, defining a new variant should be a Big Deal™

Agree. There were no sortable UUIDs in the standard. Is this a Big Deal™? Seriously though, I want the random part to be as large as possible.

edo1, Aug 19 '21 05:08

I tried to lay it out here https://github.com/uuid6/uuid6-ietf-draft/blob/master/LATEST.md as best I could. The idea is that if the variant field is set to 0b111, the version field fits in the bottom (least significant) bits of the 9th byte (so var and ver are in this one byte). This technically loses us only 1 bit, since the variant was 3 bits and the version 4, but now we're using 8.

I agree that we should not try to make new variants of v1, v4, etc. or v6 for that matter (since its goal is to be easily adaptable from and as close to v1 as possible). But I think we can use it in v7 and v8 to simplify the bit layout so there's just one byte you have to worry about when determining version info for v7 and v8:

   0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | var |  ver    |                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

I'm hoping this can help move things toward greater simplicity in the spec/proposal.

bradleypeabody, Aug 19 '21 06:08
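A minimal Go sketch of the combined octet, assuming the variant 0b111 is followed by a single 0 bit and then the 4-bit version (so UUIDv7 yields 0xE7, matching later comments in this thread); setVarVer and varVer are hypothetical names.

```go
package main

import "fmt"

// Octet 8 (the 9th byte) holds variant 0b111, a 0 bit, then a 4-bit
// version, so UUIDv7 gets 0xE7 and UUIDv8 gets 0xE8.
const (
	variantE     = 0b1110 << 4 // variant 111 followed by a 0 bit
	variantEMask = 0xF0
)

// setVarVer writes the combined octet; all other bits stay free.
func setVarVer(u *[16]byte, version byte) {
	u[8] = variantE | version&0x0F
}

// varVer reports whether u uses the combined octet and, if so, its version.
func varVer(u [16]byte) (version byte, ok bool) {
	if u[8]&variantEMask != variantE {
		return 0, false
	}
	return u[8] & 0x0F, true
}

func main() {
	var u [16]byte
	setVarVer(&u, 7)
	v, ok := varVer(u)
	fmt.Printf("octet 8 = %#x, version %d, ok=%v\n", u[8], v, ok)
	// One fully reserved octet leaves 120 free bits, versus 122 for an
	// RFC 4122 variant-10x UUIDv4 (2 variant + 4 version bits).
	fmt.Println("free bits:", 128-8, "vs", 128-2-4)
}
```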

I'm hoping this can help move things toward greater simplicity in the spec/proposal.

| Before | After |
|---|---|
| RFC4122 is variant 0b10x | RFC4122 is variant 0b10x or 0b111 |
| 2^122 UUIDs per version | 2^122 or 2^120 UUIDs per version |
| version in bits 48-51 | version in bits 48-51 or bits 67-70 |

This is not "greater simplicity". And the rift this creates in how versioning works is going to be an ongoing pain in the ass to articulate and rationalize about.

"Look for versions 1-5 here if variant is 0b10x, and versions 6-31 there if variant is 0b111. What's that...? What if variant is 0b111, but the version field < 6? Uh... well... that's not really a thing. I mean, it's technically possible, but we didn't want to confuse people by having different versions with the same name."

Now that I think about it, that last part about 0b111 versions 1-5, is actually pretty awful. That those particular variant-version combinations are possible just seems ripe for confusion and abuse.

broofa, Aug 20 '21 07:08

In Draft 02 set definition of any UUID Version (UUIDv1/2/3/4/5/6/7/8) + Variant E (111) as a method for signaling an alternative bit layout to any previously defined UUID.

This is clearly overkill. Version 1 is the only version where there is any value in rearranging bit layout.

broofa, Aug 20 '21 08:08

the rift this creates in how versioning works is going to be an ongoing pain in the ass to articulate and rationalize about.

The logic I'm proposing in https://github.com/uuid6/uuid6-ietf-draft/blob/master/LATEST.md would be simply:

For UUID7, bytes[8] (the 9th byte) == 0xE7; for UUID8, bytes[8] == 0xE8.

And for UUID versions <= 6 do what RFC4122 indicates.

That's it.

It is a difference, but I don't think it's overly complicated and confusing.

Now that I think about it, that last part about 0b111 versions 1-5, is actually pretty awful.

I agree, we should just disallow other uses of the 0b111 variant, since I don't think there's any real benefit to allowing it.

bradleypeabody, Aug 20 '21 15:08
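The detection rule described above can be sketched in Go. This is an illustration under assumptions, not draft text: octet 8 equal to 0xE7 or 0xE8 means UUIDv7/v8; otherwise the version nibble at bits 48-51 is read only when the variant is 10x, with versions limited to 1-6 there; every other use of variant 0b111 is rejected, as suggested above. detectVersion is a hypothetical name.

```go
package main

import "fmt"

// detectVersion sketches the rule described above (hypothetical, not
// draft text).
func detectVersion(u [16]byte) (int, error) {
	switch {
	case u[8] == 0xE7 || u[8] == 0xE8:
		// Combined octet: variant 0b111, a 0 bit, then the version.
		return int(u[8] & 0x0F), nil
	case u[8]&0xE0 == 0xE0:
		// Any other variant-0b111 value is not defined by this proposal.
		return 0, fmt.Errorf("undefined variant 0b111 octet %#x", u[8])
	case u[8]&0xC0 == 0x80:
		// RFC 4122 variant 10x: the version is the high nibble of octet 6
		// (bits 48-51), limited here to versions 1-6.
		if v := int(u[6] >> 4); v >= 1 && v <= 6 {
			return v, nil
		}
		return 0, fmt.Errorf("unknown RFC 4122 version in octet 6 %#x", u[6])
	default:
		return 0, fmt.Errorf("octet 8 %#x is not an RFC 4122 variant", u[8])
	}
}

func main() {
	var v4, v7, bad [16]byte
	v4[6], v4[8] = 0x40, 0x80 // version 4, variant 10x
	v7[8] = 0xE7
	bad[8] = 0xE3 // variant 0b111 with "version 3": rejected
	for _, u := range [][16]byte{v4, v7, bad} {
		ver, err := detectVersion(u)
		fmt.Println(ver, err)
	}
}
```

The two separate code paths are the "rift" in versioning being debated here; whether that is an acceptable cost is the open question.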

The logic I'm proposing... ... I don't think it's overly complicated and confusing

I understand what you're proposing. And I agree that, had this been the bit layout RFC4122 used from the beginning, it would be simpler than what we have now. But it's not. It's a new layout in addition to what the original RFC specifies, so the complexity it introduces is in addition to the complexity that's already there.

I agree, we should just disallow other uses of the 0b111 variant, since I don't think there's any real benefit to allowing it.

I think you're missing the point. It's not about what exactly we say around this issue (although we'll have to make some decisions there), it's that we have to say anything at all. This change causes a certain amount of cognitive dissonance that can be avoided altogether if we just stick with the current scheme.

broofa, Aug 20 '21 16:08

This change causes a certain amount of cognitive dissonance

RFC4122 causes its own cognitive dissonance. The fact that one of the backward compatibility issues that comes up is implementations which check the version bits without checking the variant - that's a great example of strange stuff that should, IMO, never have been there in the first place, and should be deprecated. Unnecessary complexity that people already get wrong because nobody wants to sit down and read the RFC4122 spec, because it's long and unnecessarily complicated. (No offense to the original authors, I'm just saying that today, with the factors at hand, we can do better.)

Some people will choose to either implement just UUIDv7+8, or leave their existing implementations of e.g. v1 and v4 (the most commonly implemented versions) as-is and write new code only for UUIDv7+8. This new code can be simpler and easier to understand. This benefit needs to be measured and compared against the cost of making it different from prior versions.

RFC4122 has many problems. One of the goals here is to "fix" them by introducing a new, simpler design that people can just move forward with and leave the old stuff behind. Once we have new UUIDs, people won't be obligated to continue to have to deal with RFC4122 - if newer versions solve real-world problems, then great, new versions can be implemented and that's it, done deal. I'd much rather focus on making this new draft/spec as simple as possible (while not being unrealistic about backward compatibility issues), than forcing some old stuff from RFC4122 that we don't need.

I will find some examples of implementation differences and post here and hopefully this can help provide a convincing argument of the value of this. But the basic issue I have is I don't think it's simpler to leave things as they were in RFC4122. RFC4122 is a mess, we should move away from it. And if we can do so with only a few manageable backward compatibility concerns, I think it's a workable approach and better in the long run. Again I'll post some code soon to help demonstrate this factor of code complexity.

bradleypeabody, Aug 20 '21 19:08

If we merge variant and version fields into one byte and combine it with using a common time format, we get things like this:

https://play.golang.org/p/yWjgCNy_GQq

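	// Imports assumed: crypto/rand, encoding/binary, time.
	var v [16]byte
	// Octets 0-7: Unix time in nanoseconds, big-endian so byte order sorts by time.
	binary.BigEndian.PutUint64(v[:8], uint64(time.Now().UnixNano()))
	// Octet 8: variant 0b111, a 0 bit and version 7 combined.
	v[8] = 0xE7
	// Octets 9-15: 56 random bits.
	rand.Read(v[9:])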

That is a correct and useful UUIDv7 implementation (per these notes, not the draft). It lacks a guarantee of monotonicity for values produced within the same clock tick (which IMO should be a recommendation not a requirement), but this can be added with just a few more lines of code.

Try finding an earlier UUID implementation that is anywhere near that simple to write and understand.

Simplicity in implementation and maintenance is a very real, tangible factor. It needs to be weighed carefully against the cost of changing things. And if we don't change old UUID version <= 5 values, I really don't see the problem.

Is there some specific real-world problem that this would cause that I'm missing? What specifically (which database program, library or software problem) would happen/break/be difficult/annoying/etc. if we were to move forward with this? Maybe if I had an actual example of what you're worried about, I could better think it through.

bradleypeabody, Aug 21 '21 00:08

I'm still not sure if using half of the future variant is a good idea. I prefer to be conservative in this case. It may also make the changes easier to approve.

But I like the simplicity it makes possible. Much of the discussion here arises from the need to work around version bits.

If you really plan to use half of the '111' variant, why would you want to use a version number? Version numbers belong to the '10x' variant. The '111' variant is an uninhabited land. You have the opportunity to create an entirely new layout. There is no need to be stuck with the '10x' variant design, which depends on the version number to differentiate between UUID subtypes.

I think it's better to just define the E variant (enhanced, extended?) and forget about the E7 and E8 versions. Or you can use ONE bit of the E variant as a flag to differentiate between E-UUIDs that have timestamps and those that don't.

I am concerned about reducing the number of free bits from 122 to 120. Losing 2 bits multiplies the collision probability by about 4 (2^2). If you don't use a version number, the number of free bits available for entropy increases.

fabiolimace, Aug 21 '21 02:08

Is there some specific real-world problem that this would cause that I'm missing?

These sorts of questions are harder to answer:

  • "Is [some UUID] valid?"
  • "What version is [some UUID]?"
  • "How do I identify UUIDs in text?"
  • "I have a valid variant (0b111) and a valid version (3), so why is my UUID invalid?"

... and this sort of code becomes more complex:

  • https://github.com/uuidjs/uuid/blob/master/src/regex.js
  • https://github.com/uuidjs/uuid/blob/master/src/version.js

broofa, Aug 21 '21 02:08
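To show what the "identify UUIDs in text" question turns into, here is a minimal Go sketch with two regular expressions, one per layout. These patterns are illustrative assumptions, not the uuidjs/uuid regex.js linked above: the first accepts RFC 4122 variant 10x with versions 1-6, the second accepts the proposed layout where octet 8 (the first two hex digits of the fourth group) is e7 or e8.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// RFC 4122 layout: version 1-6 as the first digit of group 3,
	// variant 10x (8, 9, a, b) as the first digit of group 4.
	rfc4122Re = regexp.MustCompile(`(?i)^[0-9a-f]{8}-[0-9a-f]{4}-[1-6][0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$`)
	// Proposed layout: octet 8 (first two digits of group 4) is e7 or e8;
	// group 3 carries no version here, so any hex digit is allowed.
	combinedRe = regexp.MustCompile(`(?i)^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-e[78][0-9a-f]{2}-[0-9a-f]{12}$`)
)

// isValid accepts a canonical UUID string in either layout.
func isValid(s string) bool {
	return rfc4122Re.MatchString(s) || combinedRe.MatchString(s)
}

func main() {
	for _, s := range []string{
		"6ba7b810-9dad-11d1-80b4-00c04fd430c8", // RFC 4122 DNS namespace (v1)
		"017f22e2-79b0-7cc3-e798-c4dbbd7d6b5e", // made-up 0xE7 example
		"017f22e2-79b0-7cc3-e398-c4dbbd7d6b5e", // 0b111 with version 3
	} {
		fmt.Println(s, isValid(s))
	}
}
```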

Fair points. You're correct, it does add some more logic to these situations.

However, variable-length UUIDs will break "find a UUID in text" anyway. So will Crockford Base32 encoding.

And extracting the version is an extra line or two of code to fix that code. (The version.js there btw is another example of broken code - it should be checking the variant bits.)

So I think it's a matter of comparing what happens when the points above are broken or made more difficult, vs the fact that all newer implementations (and some of which will only need to support e.g. UUIDv7 and v8) can be simpler.

UUIDs are supposed to be as opaque as possible. I would also wager that much of the code that is trying to extract version numbers and perform validation is probably doing something not terribly relevant to what most applications need anyway. Why are people checking the version? (can't you just use the opaque value?) Why are they checking if a UUID is valid? (are you sure you can't just compare to all zeros to determine if there's a UUID here or not?)

bradleypeabody, Aug 22 '21 02:08

To follow up from earlier discussion and from #58, my current stance on this is that the simplicity combining the variant and version fields introduces is worth the downsides.

So far the downsides that have been brought up are, along with my rebuttals:

  1. Added complexity/different from RFC4122

I understand the concern. The procedure for examining the version is explained in the new draft with two sentences:

extracting the version number can be done by examining the variant field at bits 64 and 65 for the values 1 and 0 respectively, and then extract the version from bits 48 through 51. UUID versions 7 and 8 can be identified by checking octet 9 for the values 0xE7 or 0xE8 respectively.

Yes, it is different. My opinion is that this does not present too much complexity. The first sentence is just reiterating what RFC4122 says much more verbosely, and the only thing being added is "UUID versions 7 and 8 can be identified by checking octet 9 for the values 0xE7 or 0xE8 respectively."

  2. It reserves two extra bits.

Concerns over the loss of bits are application-specific, and the introduction of variable-length UUIDs IMO addresses them. A fixed 128-bit value is much more problematic when it comes to collision probability or unguessability. So I think reserving those two bits for future use, to devote one whole byte to the version, is an acceptable tradeoff in the interest of simplicity, considering you can add plenty more bytes to your UUID to further reduce the collision probability if your application really needs it. No need to worry about 2 bits when you can add many more if you like.

If this ends up making it into an RFC, I suspect many new implementations will just implement UUID version 7 and/or 8 and not bother with the rest. IMO, making these implementations simpler should be a priority.

bradleypeabody, Feb 15 '22 06:02

I don't see a real reason to change the variant bits to develop a time-ordered UUID format. Implementing a UUIDv7 generator is an easy job that can be done in about 100 lines of code in many languages, even with the old, weird version/variant layout. The reorganized layout might save some lines of code, but I don't think that's worth sacrificing the future extensibility of the UUID standard. It's possible that in the future another new UUID format really, really needs to move the version bits, and if no variant is left then, the UUID standard will die.

That said, if the last variant should be consumed now, I think the new format should use a different name than version 7. Variant 10x Version 7 may be defined in the future, and such definition should be named as "UUIDv7" to keep consistency with UUIDv1-5. Therefore, Variant 111 Version 7 should be named differently, or Variant 111 series should be started from version 1 with a different naming convention.

LiosK, Feb 16 '22 12:02

@LiosK, with the placement of the variant+version in the same octet, we actually extend variant 111 to be used by a future implementation if they desire. I detail this a bit more in the Draft 03 file found in PR #58 if you want to take a look at the proposed text.

Long story short: We set the 3 variant bits to 111 and dictate the next following bit is always a 0. Thus 1110 = E. This is followed by the four bit version in our new variant; but any future spec may specify that if they want to use 111 the next bit should be set to 1 making 1111 (F), and ultimately a new variant is born for whomever to do what they want. I wanted to ensure we did allow for future extensibility of the UUID spec even though there have been no new additions in ~16 years.

As for setting 1110 and starting with version 7 instead of starting over the version counting: This was the conversation between myself and Brad on the topic back in August of 2021:

Kyzer: There is no reason we need to start at version 7, since our bit space is all to ourselves now with this variant. Basically, variant 111 + version 1 and version 2 don't conflict with RFC4122's version 1 and 2.

Brad: I agree with this in principle, but it creates a new problem of explaining to people what the numbering system is. Just calling it "version 7" and saying "in version 7, byte 8 is set to 0xE7" is really simple to understand and follow. I'm open to a proposal of a different numbering system for this 0b111 variant, but I'm not sold enough on the benefits to originate it myself.

I could go either way; after all, UUIDv1ε vs UUIDv1 was my original idea for distinguishing variant 1110 (E) + version 1 from RFC 4122 variant 10xx (8/9/A/B) + version 1.

kyzer-davis, Feb 16 '22 23:02
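The 1110/1111 split described above can be expressed as a classification of the high nibble of octet 8. A minimal Go sketch, assuming the RFC 4122 Section 4.1.1 variant table plus the proposed split of 111 into E (this draft) and F (left for a future specification); variantOf is a hypothetical name.

```go
package main

import "fmt"

// variantOf classifies octet 8 by its high nibble (hypothetical helper).
func variantOf(octet8 byte) string {
	switch n := octet8 >> 4; {
	case n <= 0x7: // 0xxx
		return "Reserved, NCS backward compatibility"
	case n <= 0xB: // 10xx
		return "RFC 4122"
	case n <= 0xD: // 110x
		return "Reserved, Microsoft backward compatibility"
	case n == 0xE: // 1110
		return "Variant E (proposed, e.g. 0xE7, 0xE8)"
	default: // 1111
		return "Reserved for future definition (F)"
	}
}

func main() {
	for _, b := range []byte{0x12, 0x9A, 0xC5, 0xE7, 0xF0} {
		fmt.Printf("%#02x: %s\n", b, variantOf(b))
	}
}
```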

Makes sense. My concern is addressed. I am yet to be convinced that the variant bit change is necessary because the simplicity that will be achieved sounds trivial to me, but let's see how others think. Thank you for your clear explanation.

LiosK, Feb 16 '22 23:02

How about defining parallel versions in two variants: v7, v8 and E7, E8? Is it an overkill?

In v7 and v8, the version bits are kept in the same position as in the 10xx variant. These versions can be used by those who are conservative.

In E7 and E8, the version bits are placed side by side with the new 1110 variant bits.

The 48-bit timestamp fits both v7 and E7.

E7 and E8 can be expanded up to 64 bytes.

+--------+--------+------------------+
|     VERSION     |                  |
+--------+--------|   DESCRIPTION    |
|  10xx  |  1110  |                  |
+--------+--------+------------------+
|   v1   |   --   |  Time-based      |
|   v2   |   --   |  DCE-security    |
|   v3   |   --   |  Name-based MD5  |
|   v4   |   --   |  Random-based    |
|   v5   |   --   |  Name-based SHA1 |
|   v6   |   --   |  Time-ordered    |
|   v7   |   E7   |  K-sorted        |
|   v8   |   E8   |  Custom          |
+--------+--------+------------------+


v7:   |....time....|M...N...............|

E7:   |....time....|....NM..............|  (...)  ..............................|

v8:   |.............M...N...............|

E8:   |.................NM..............|  (...)  ..............................|

This comment is similar to @kyzer-davis original proposal for Draft 2 :)

EDIT: added "E7 and E8 can be expanded up to 64 bytes".

fabiolimace, Feb 17 '22 01:02

any future spec may specify that if they want to use 111 the next bit should be set to 1 making 1111 (F)

@kyzer-davis this was the missing piece for me. Future-standards have to have a way of distinguishing themselves from existing standards.

Re: How about defining parallel versions in two variants

This is a classic "worst of two worlds" solution, IMHO. Pick one or the other... but let's please not be wishy-washy about how implementors indicate and detect versions. It will just lead to yet more confusion.

@bradleypeabody You've cited and rebutted the arguments against this proposal, but I have yet to hear a compelling argument in favor of it. Is there a benefit here beyond the aesthetics of how fields are laid out? While I understand the appeal, that's not solving any actual problems we have. It's just "nice", but that's not a sufficient argument.

The problem with establishing a new variant now, for no reason other than aesthetics, is that we are not in a good position to anticipate the needs of future spec authors (assuming there are any). They (our future selves?) may be able to put the 0b111 variant to better use, so why not leave that option open to them?

broofa, Feb 17 '22 02:02

This is a classic "worst of two worlds" solution, IMHO. Pick one or the other... but let's please not be wishy-washy about how implementors indicate and detect versions. It will just lead to yet more confusion.

I wholeheartedly agree with this.

have yet to hear a compelling argument in favor of it. Is there a benefit here beyond the aesthetics of how fields are laid out?

I just want to throw this out there: I think I look at this entire subject a bit differently, and it might be part of the differences we have on this. From my perspective, RFC4122 has some things that are good but it also has a lot of things that are, with the benefit of hindsight, unnecessary. When we consider what aspects to keep and which to change in this new spec, I tend to hear arguments about not changing something from what it is in RFC4122. However, when I look at existing attributes of RFC4122 I ask "do we actually need to keep this?" and "is this good? do we really want this?" Some things we can't get rid of because that would break a lot of existing code. I think we all agree that we can't move the variant field because it will explode a bunch of existing implementations. Fine, that makes sense. But as we get into the other fields, when we talk about compelling reasons or justifications for things, I don't see a strong justification for keeping old things the way they are just because someone wrote it in an earlier spec. If it breaks existing implementations, that's something to consider. But hauling around unnecessary complexity from RFC4122 because we don't want to change things too much - I just fundamentally don't think about it like that.

One of the main goals here is to make something useful so databases and other code that needs to make unique identifiers can easily and effectively do so. And I think each of these questions should be measured against that.

So to me this issue of moving the version number is more about answering "can we just make this simpler so new implementations can get rid of the old baggage?" I think once UUIDv6,7,8 are out and specified, a lot of new implementations won't bother implementing the earlier versions (you will notice that at least some existing UUID implementations do not implement all 5 UUID versions, they implement the ones the author understood and deemed useful). Maybe I'm wrong and maybe that comes across as hubris (it's not meant that way), but I really am trying to make it so when people reach for a spec to "make me an ID" they find something simple and easy, not RFC4122.

bradleypeabody, Feb 18 '22 06:02

One of the main goals here is to make something useful so databases

I think UUID v6 and v7 can also be useful for event-driven applications.

This specification suggests using UUID v6 for event IDs: HTTP Feeds.

fabiolimace, Feb 18 '22 08:02

@bradleypeabody: I recognize the value of a first-principles take on this, I just don't believe it's warranted at this time. If we were creating a radically new standard that deprecated 4122 then, yes, a new variant makes sense. I believe that's what the original RFC authors did. They took the OSF DCE spec for UUIDs, incorporated it as "version 2", and then promptly went on to define a superset that obviated the need to care about what OSF DCE uuids were.

We're not doing that. Or, at least, that's not the sense I get. For example, we're not proposing a replacement for version 4 or 5 (or 3). So the existing RFC will remain important and relevant for some time to come.

I don't see a strong justification for keeping old things the way they are just because someone wrote it in an earlier spec.

This presumes the onus is on me and others to justify why this change should not happen. I disagree with that presumption. The onus is on you to justify why it is needed. Hence my question. In case it's not clear, my "bar" for justifying a new variant is simple: create a new variant when the current variant fails to meet existing needs.

So what exactly about the new versions being proposed demand a new variant? The only thing I've seen might be variable length UUIDs. But that idea is not well-fleshed out, nor is it essential.

can we just make this simpler so new implementations can get rid of the old baggage?

Nope. Not gonna happen. Whatever warts 4122 version 1 ids may have, version 4 is killing it. It's 80+% of use cases (probably more like 90% if we're being honest). The vast majority of people using UUIDs won't easily be convinced to migrate to a new standard anytime soon.

broofa, Feb 18 '22 18:02