uuid6-ietf-draft
uuid6-ietf-draft copied to clipboard
Discussion: Monotonicity and Counters
All things Monotonicity and Counters to assist with sort ordering, db index locality, and same Timestamp-tick collision avoidance during batch UUID creation.
Sub-Topics: Counter position, length, rollover handling and seeding.
Extension of the thread: https://github.com/uuid6/uuid6-ietf-draft/pull/58#discussion_r810593208
Required Reading: https://github.com/uuid6/uuid6-ietf-draft/issues/41#issuecomment-907517063 and Research efforts
Please include this in version 7:
A counter MAY be placed instead of the left side of the random section, immediately after the timestamp. The counter is designed to keep UUIDs monotonic within a timestamp step, by default within a millisecond. The default counter length is 16 bits for the millisecond timestamp. The counter MUST be reset every timestamp step. The counter MUST be frozen just before overflowing up to the next timestep step. For each UUID instance, the random section MUST be updated despite the use of the counter.
A counter MAY be placed instead of...
Is there any benefit to this over simply treating the entire rand
field as a counter, the way ULID does? In which case, the counter is effectively at the very end ("far right") side of the random field.
The advantage of ULID approach is that counter rollover so statistically unlikely as to not warrant mention. Whereas placement at the left side of the field requires reasoning about creation rates, counter sizes, and rollover error cases.
Is there any benefit to this over simply treating the entire
rand
field as a counter, the way ULID does?
- The random part of ULID is almost frozen within every millisecond while it's used as a counter. Therefore it's very easy to guess the next or previous ULID. Just add 1 or substract 1. It's not secure.
- If random part is close to 111...111, then you have too short counter
- The random part of ULID is almost frozen within every millisecond while it's used as a counter. Therefore it's very easy to guess the next or previous ULID. Just add 1 or substract 1. It's not secure.
Isn't this true of any monotonic counter scheme? Unless you randomize all bits with every uuid, there will be some degree of predictability.
- If random part is close to 111...111, then you have too short counter
Again, true of any counter scheme.
Treating the whole 70-bit random value as a counter means the odds of actually encountering a rollover case are very small. Even if the lowest 20 bits turn over (1M uuids / msec), the top-most 50 bits would have to all be ones. There's only a ~1 in 1015 chance of that occurring. For the truly cautious, simply checking for that case and generating a different random value, or just forcing any of those high-order bits to be zero, eliminates that possibility.
- The random part of ULID is almost frozen within every millisecond while it's used as a counter. Therefore it's very easy to guess the next or previous ULID. Just add 1 or substract 1. It's not secure.
Isn't this true of any monotonic counter scheme? Unless you randomize all bits with every uuid, there will be some degree of predictability.
l beleive that all random bits of every uuid MUST be randomized despite using counter. Such monotonic counter scheme is secure.
- If random part is close to 111...111, then you have too short counter
Again, true of any counter scheme.
No, if we have 16 empty bits of separate counter for every millisecond.
Treating the whole 70-bit random value as a counter means the odds of actually encountering a rollover case are very small. Even if the lowest 20 bits turn over (1M uuids / msec), the top-most 50 bits would have to all be ones. There's only a ~1 in 1015 chance of that occurring. For the truly cautious, simply checking for that case and generating a different random value, or just forcing any of those high-order bits to be zero, eliminates that possibility.
You are correct.
I agree that it should generally be encouraged to entirely randomize some (say, 16 or 32 bits) trailing bits at every generation to ensure some levels of unpredictability, rather than use the whole rand
field as a monotonic random counter. I think the draft RFC should state this explicitly to guide implementers, but the thing is, it is very difficult to determine the appropriate lengths of the monotonic random and per-ID random. So, perhaps it might be best for the draft to just state this point and have implementers decide appropriate sizes.
oklog/ulid.Monotonic provides monotonicity parameterized by a user-provided inc
value. Higher inc
values provide less predictable output at the cost of fewer overall possible outputs (i.e. higher probability of collision).
IMO — specs should not take any stance on any property of random portions of IDs. Specs should define them as random, and if implementations want to provide monotonicity within some bounds, that's above-and-beyond spec guarantees.
@peterbourgon If something is not in the specification, then this is a 99.99% guarantee that this will not be in the implementation either. Therefore, the specification should contain at least a more or less satisfactory solution as an option.
I agree that it should generally be encouraged to entirely randomize some (say, 16 or 32 bits) trailing bits at every generation to ensure some levels of unpredictability, rather than use the whole
rand
field as a monotonic random counter. I think the draft RFC should state this explicitly to guide implementers, but the thing is, it is very difficult to determine the appropriate lengths of the monotonic random and per-ID random. So, perhaps it might be best for the draft to just state this point and have implementers decide appropriate sizes.
@LiosK I like your solution very much. I would add freezing just before overflowing the counter, although this is unlikely.
If something is not in the specification, then this is a 99.99% guarantee that this will not be in the implementation either.
I don't think I agree, but I also don't think this is a bad thing. In fact, I think it's correct! The spec defines the random portion of the ID to be random. If those bytes happen to be monotonic, that's a detail of the implementation.
@peterbourgon @sergeyprokhorenko I just merged Draft 03 into master. Could you take a look at the latest Monotonicity and Counters section? I believe it already hits on both of your points.
@kyzer-davis
-
Waiting for the timestamp to advance can slow down the generation of UUIDs and thus break the application. The UUID generation should not be a source of possible run-time errors for the application. It is better to allow a slight non-monotonicity, which the DBMS can handle quite well. It will only slow down the search for some records a little. In any case, with several data sources, a slight non-monotonicity is inevitable.
-
I suggest to add the left part of to the text: "With this method the left part of random data is extended to also double as a counter" to avoid of guessability of UUIDs. The right part of random data will not be frozen till the next timestamp tick.
UUID generation is necessarily a possible source of runtime errors, because entropy is exhaustible.
UUID generation is necessarily a possible source of runtime errors, because entropy is exhaustible.
@peterbourgon This is not true. It is entirely possible to prevent errors during the generation of UUIDs. It is better to prevent errors if possible.
If you mean collisions of UUIDs, then such errors do not occur during generation, but when writing to the table.
If you run out of entropy, as far as I'm aware your options are (a) fail, or (b) block until entropy is replenished.
@sergeyprokhorenko,
I suggest to add the left part of to the text: "With this method the left part of random data is extended to also double as a counter" to avoid of guessability of UUIDs. The right part of random data will not be frozen till the next timestamp tick.
I assume you are referencing rand_a
, which totally can be a counter based on my current text in the the paragraph "Fixed-Length Dedicated Counter Bits (Method 1):"
I simply say you SHOULD use a monotonic random and for fixed-length counters you SHOULD use UUIDv8 because of the heated debates around counter length, rollover handling, and initialization of the counter have so many variations based on user input that it is better fit for UUIDv8.
If you would like I can put some text in the "Fixed-Length Dedicated Counter Bits (Method 1):" bullet that stated UUIDv7 rand_a
MAY double as a counter when utilizing this method.
Waiting for the timestamp to advance can slow down the generation of UUIDs and thus break the application. The UUID generation should not be a source of possible run-time errors for the application. It is better to allow a slight non-monotonicity, which the DBMS can handle quite well. It will only slow down the search for some records a little. In any case, with several data sources, a slight non-monotonicity is inevitable.
The text let's the application implementers decide and provides some guidance. Both usages in this section are SHOULD not MUST. Implementations are fit to do what they want. If they desire to let some items go out of order then they may.
If you run out of entropy, as far as I'm aware your options are (a) fail, or (b) block until entropy is replenished.
@peterbourgon It sounds like "run out of numbers" :))
I assume you are referencing
rand_a
, which totally can be a counter based on my current text in the the paragraph "Fixed-Length Dedicated Counter Bits (Method 1):"
@kyzer-davis No, I mean https://github.com/uuid6/uuid6-ietf-draft/issues/60#issuecomment-1053554344 I like it very much
If you would like I can put some text in the "Fixed-Length Dedicated Counter Bits (Method 1):" bullet that stated UUIDv7
rand_a
MAY double as a counter when utilizing this method.
@kyzer-davis Yes please. That would be great!
@sergeyprokhorenko Running out of entropy is much more common than running out of numbers :)
System administrators, especially those supervising Internet servers, have to ensure that the server processes will not halt because of entropy depletion.
https://en.m.wikipedia.org/wiki/Entropy_(computing)
The text let's the application implementers decide and provides some guidance. Both usages in this section are SHOULD not MUST. Implementations are fit to do what they want. If they desire to let some items go out of order then they may.
@kyzer-davis It seems to me that the specification should suggest to developers a safer option, and not an unexpected pause in the generation of UUIDs when the counter overflows. Overflows always remind me of the Ariane flight V88 disaster
@sergeyprokhorenko Running out of entropy is much more common than running out of numbers :)
System administrators, especially those supervising Internet servers, have to ensure that the server processes will not halt because of entropy depletion.
https://en.m.wikipedia.org/wiki/Entropy_(computing)
This is about the low quality of pseudo-random numbers, but not about their deficit. Poor quality generators generate a repeating sequence of pseudo-random numbers, which leads to collisions. The problem of low quality of pseudo-random numbers is already solved in the spec: https://uuid6.github.io/uuid6-ietf-draft/#name-unguessability
@sergeyprokhorenko
If you would like I can put some text in the "Fixed-Length Dedicated Counter Bits (Method 1):" bullet that stated UUIDv7
rand_a
MAY double as a counter when utilizing this method.@kyzer-davis Yes please. That would be great!
Let me put together a Change Proposal template for this and get it in Draft 03 Phase 3 PR.
I assume you are referencing
rand_a
, which totally can be a counter based on my current text in the the paragraph "Fixed-Length Dedicated Counter Bits (Method 1):"@kyzer-davis No, I mean #60 (comment) I like it very much
I apologize, somehow I missed this comment entirely. It does seem like a valid approach but seems pretty similar to Fixed-Length Dedicated Counter Bits (Method 1)
at the moment. The difference is in semantics. My text states to dedicated a portion of bits after the timestamp; this text states to dedicate a portion of the most significant, left-most, part of the random. 9/10 times they will likely be the same thing.
I could update Fixed-Length Dedicated Counter Seeding:
to describe this practice. I took a quick swing below:
Fixed-Length Dedicated Counter Seeding:
- Implementations utilizing either fixed-length counter method MAY randomly initialize the counter with each new timestamp tick. However, when the timestamp has not incremented; the counter SHOULD be frozen and incremented by one.
- When utilizing a randomly seeded counter alongside Method 1; the random MAY be regenerated with each counter increment without impacting storability. The downside is that Method 1 is prone to overflows if a counter of adequate length is not selected or the random data generated leaves little room for the required number of increments.
- A randomly seeded counter alongside Method 2 is more-or-less the same as Monotonic Random (Method 3).
- Implementations utilizing either fixed-length counter method MAY also choose to initialize the counter at zero to ensure the full bit-space is utilized and help avoid counter rollovers. This approach has less entropy and more guessibility but ensures the most of the counter bit space.
@sergeyprokhorenko ...did you even check #76? With this update both Method 1 and Method 3 are recommended counter solutions for v7E.
I deleted my comment
I deleted my comment
As did I :)
Let me know what you think about that proposed "Fixed-Length Dedicated Counter Seeding:" text. If it looks good I can write up a change proposal template and possibly sneak it in the open PR.
@sergeyprokhorenko
This is about the low quality of pseudo-random numbers, but not about their deficit.
I believe you're mistaken. Both are issues, but I'm pointing out the fact that entropy — crypto-grade and pseudorandom both — is a resource which can be depleted. If something consumes that resource, then it necessarily must deal with the possibility that there is no entropy available in a given period of time.
@peterbourgon You probably mean the insufficient speed of generating pseudo-random numbers and UUIDs. I think that multi-threaded generation and buffering of pseudo-random numbers could solve this problem.
No, I don't mean that either :) I mean literally that entropy is a stream of bytes which can pause or stop altogether for unbounded periods of time. Please read the linked article.
I feel the current content of Monotonicity and Counters section to be a little bit confusing. It's basically very permissive, so personally I feel okay with this because almost everything is permitted, but some implementers might get lost.
I think the decisions left up to implementers by this section could be summarized to the following two points, if the spec clearly required that the counter field be placed right after the timestamp and the remaining bits be filled with random:
- Zero-starting counter or randomly seeded counter
- Length of counter bits from 0 bits to 72 bits
Under this framework, the three methods can be described as:
- Method 1: 12-24-bit zero-starting or randomly seeded counter
- Method 2: 72-bit randomly seeded or partially (i.e. significant bits frozen for a given timestamp tick) randomly seeded counter
- Method 3: 72-bit randomly seeded counter
- Single-node without batch: 0-bit counter
By https://github.com/uuid6/uuid6-ietf-draft/issues/60#issuecomment-1053554344, I meant the spec should clearly state the counter should generally be limited to 40 or 56 bits. Although unguessability is definitely not the priority, I just don't want to see implementers carelessly leave a potential attack surface.
Then, how about omitting or amalgamating the Method 1, 2, 3 description and reorganizing the related sections like the following?
4.3. UUID Version 7E
Current content + rand_a
and rand_b
together MUST be dedicated to 0-72-bit counter
and remaining random
, in this order.
5.2. Monotonicity and Counters
AS IS
Fixed-Length Dedicated Counter Length
Current content + UUIDv7 SHOULD utilize 0-40-bit counter.
Fixed-Length Dedicated Counter Seeding:
Current content + UUIDv7 SHOULD utilize randomly seeded counter; UUIDv8 may utilize zero-starting counter.
Fixed-Length Dedicated Counter Rollover Handling
AS IS
Implementations MAY use the following logic to ensure UUIDs featuring embedded counters are monotonic in nature:
AS IS
Current content + UUIDv7 SHOULD utilize randomly seeded counter; UUIDv8 may utilize zero-starting counter.
@LiosK I am against discrimination in relation to (16 bit) zero-starting counter.