uuid6-ietf-draft
Discussion: Unix Timestamp Granularity in UUIDv7
Question: too many bits, or not enough?
Current Draft-02 implementation:
- 36 bits, to avoid the year-2038 problem.
- If the source timestamp is 32 bits, pad on the left (most significant side) to form the 36-bit value used in UUIDv7.
- If the source timestamp is 64 bits, truncate to the right-most (least significant) 36 bits for use in UUIDv7 (see the sketch after this list).
- 36 bits provides timestamp values up to the year 4147.
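A minimal sketch of the padding/truncation rules above, assuming a plain integer Unix timestamp as input; the helper name and constant are illustrative, not from the draft:

```python
import time

UNIXTS_BITS = 36  # width of the unixts field in draft-02

def unixts_field(ts_seconds: int) -> int:
    """Reduce a 32- or 64-bit Unix timestamp to the 36-bit unixts value."""
    # A 32-bit value already fits in 36 bits, so it is effectively left-padded
    # with four zero bits; a 64-bit value is truncated to its right-most
    # (least significant) 36 bits.
    return ts_seconds & ((1 << UNIXTS_BITS) - 1)

print(f"{unixts_field(int(time.time())):09x}")  # 36 bits -> 9 hex digits
```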
Alternatives Proposed:
- 34-bit timestamp, giving 2 bits back to entropy/node/randomness, valid up to the year 2514.
I vote to leave it at 36 bits as in the current implementation; I would rather future-proof as much as possible with a larger value than reduce the size and truncate a 64-bit timestamp even further.
That being said: depending on how we choose to encode subsec_a based on #24, we may end up with two additional bits that are always padded. We might as well give them back to unixts and go to 38 if that is the case.
Yes, the 36-bit length was definitely done intentionally in order to avoid running out of numbers - but I do agree it's annoying to use.
Another option would be to use a 32-bit unsigned integer, which runs out in the year 2106. I'm not super excited about an end date that is close enough for my grandchildren to curse me for when it runs out on them, but it's an option and would definitely be simpler to use.
I'll write more on this on #24 in a moment.
| Bits | Signedness | End year |
|------|------------|----------|
| 32   | Signed     | 2038     |
| 32   | Unsigned   | 2106     |
| 33   | Signed     | 2106     |
| 33   | Unsigned   | 2242     |
| 34   | Signed     | 2242     |
| 34   | Unsigned   | 2514     |
| 35   | Signed     | 2514     |
| 35   | Unsigned   | 3058     |
| 36   | Signed     | 3058     |
| 36   | Unsigned   | 4147     |
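The end years in this table follow directly from adding 2^bits seconds (or 2^(bits-1) for the signed case) to the 1970 epoch; a quick, purely illustrative check:

```python
def end_year(bits: int, signed: bool) -> int:
    """Approximate year in which a seconds-since-1970 counter of this width overflows."""
    seconds = 2 ** (bits - 1) if signed else 2 ** bits
    return int(1970 + seconds / (60 * 60 * 24 * 365.25))

for bits in range(32, 37):
    print(bits, "signed:", end_year(bits, True), "unsigned:", end_year(bits, False))
# 32 -> 2038 signed / 2106 unsigned, ..., 36 -> 3058 signed / 4147 unsigned
```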
I think 32 bits even if unsigned are not enough. 2106 is not that far in the future.
I can also see the benefit of keeping the value unsigned because generating a UUID before 1970-01-01 shouldn't be a real life use case.
I definitely want to avoid introducing a new epoch. The Unix epoch makes implementations much easier. I don't even know why they decided to use the Gregorian epoch for UUIDv1.
For me, anything from 2514 onward looks good.
@nerg4l @kyzer-davis If we did the 111 var field from #26 for UUIDv7, that gives us the first 64 bits clear and free of any other requirements. What if we, despite how clever I thought I was being with this subsecond encoding stuff (I still like the idea, but I don't like how much confusion it has created), instead just use a 64-bit nanosecond-precision Unix timestamp? I'm not sure if this should replace UUIDv7 or be a separate version (we could do UUIDv9 ;) ). Regardless, I'm just trying to see if there is a way to simplify this whole thing so that when people read the spec it's "oh, of course, that makes sense" instead of "what the...".
So the new bit layout would be:
```
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    unix_nano_timestamp_u64                    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    unix_nano_timestamp_u64                    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  var  |  ver  |      seq      |             rand              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                             rand                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
```
Where `unix_nano_timestamp_u64` is an unsigned 64-bit Unix-like timestamp (but instead of seconds since the epoch it's nanoseconds since the epoch), `seq` is 8 bits for sequence, and `rand` is random. Max time with this is 2554-07-21 23:34:33 UTC.
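A rough sketch of how this proposal could be assembled, assuming the `111` variant from #26 (3 bits), a 4-bit version, the 8-bit `seq` described above, and the remaining 49 bits random; the exact field positions after the timestamp are one reading of the diagram, not something the draft specifies:

```python
import os
import time

def make_uuid_nano(version: int = 7, seq: int = 0) -> int:
    """Sketch: 64-bit ns Unix timestamp | var '111' | 4-bit ver | 8-bit seq | 49 random bits."""
    ts = time.time_ns() & ((1 << 64) - 1)                          # unsigned 64-bit nanosecond timestamp
    rand = int.from_bytes(os.urandom(7), "big") & ((1 << 49) - 1)  # 49 random bits
    value = ts << 64                                               # bits 127..64
    value |= 0b111 << 61                                           # 3-bit variant from #26
    value |= (version & 0xF) << 57                                 # 4-bit version
    value |= (seq & 0xFF) << 49                                    # 8-bit sequence
    value |= rand                                                  # bits 48..0
    return value

print(f"{make_uuid_nano():032x}")
```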
And then we'd basically just say: implementations can parse out the time if they like, but the other fields are essentially write-only and a recommendation - don't try to read them, because it's not invalid for implementations to make changes after the ver-var field. In fact, if implementations wanted to encode random stuff into the least significant bytes of the nanosecond-precision timestamp, I don't think that should even be explicitly forbidden - it's just "if you do this, other implementations may get the wrong timestamp; use at your own risk".
I think this would be dramatically simpler and easier to understand.
@bradleypeabody I think the variable subsecond handling in UUIDv7 was added because not all languages can return a Unix timestamp with nanosecond precision, e.g. JavaScript, PHP.
Changing the position of the version bits would affect some widely used libraries. Here are some examples:
- Java affected: https://github.com/openjdk/jdk8/blob/6a383433a9f4661a96a90b2a4c7b5b9a85720031/jdk/src/share/classes/java/util/UUID.java#L245-L248
- Google UUID module for Go affected: https://github.com/google/uuid/blob/44b5fee7c49cf3bcdf723f106b36d56ef13ccc88/uuid.go#L227-L229
- Python not affected: https://github.com/python/cpython/blob/main/Lib/uuid.py#L354-L357
- ramsey UUID package for PHP not affected because parsing fails: https://github.com/ramsey/uuid/blob/fe665a03df4f056aa65af552a96e1976df8c8dae/src/Rfc4122/Fields.php#L186-L193
- uuidjs UUID package for JavaScript affected (`validate` checks for the correctness of the string): https://github.com/uuidjs/uuid/blob/91805f665c38b691ac2cbda56a99231432b00a1a/src/version.js#L3-L9
Edit: It is always quite unfortunate when there are already existing implementations.
Another important thing to keep in mind is how many UUIDs can be generated per unit of time. The more of a type of UUID that can be generated in a given interval, the better.
I did some simple math to get the maximum number of UUIDs version 1 that can be generated in a nanosecond interval. I treated all non-timestamp bits as random bits and obtained the value 461168601842738816 (2**62/10). So I used this number as a reference for comparison with the other UUID types. A fraction called "proportion" is used to express the comparison numerically.
These are the proportions:
- UUIDv1: 1.00 (reference)
- COMBv4: 0.04
- UUIDv7-34b-sec: 0.67
- UUIDv7-36b-sec: 0.17
- UUIDv7-44b-msec: 0.66
- ULID: 2.62 (cheats using 6 extra bits)
I was about to propose an alternative UUIDv7 with 44 bits for timestamp and millisecond precision. But when I did that calculation, I saw how difficult it is to find another layout that beats UUIDv1.
Here is the math:
UUIDv1
Time unit: 100-ns
Time bits: 60
Time limit: 5623 AD (2**60 / 10**7 / 60 / 60 / 24 / 365.25 + 1970)
Remaining bits: 62 (128 - 60 - 4 - 2)
UUIDs per nanosecond: 461168601842738816 (2**62 / 10)
Proportion: 1.00 (used as reference)
COMBv4 (with version and variant bits):
Time unit: ms
Time bits: 48
Time limit: 10889 AD (2**48 / 1000 / 60 / 60 / 24 / 365.25 + 1970)
Remaining bits: 74 (128 - 48 - 4 - 2)
UUIDs per nanosecond: 18889465931478580 (2**74 / 10**6)
Proportion: 0.04 (18889465931478580 / 461168601842738816)
UUIDv7-34b-sec (34 bits for seconds)
Time unit: s
Time bits: 34
Time limit: 2514 AD (2**34 / 60 / 60 / 24 / 365.25 + 1970)
Remaining bits: 88 (128 - 34 - 4 - 2)
UUIDs per nanosecond: 309485009821345088 (2**88 / 10**9)
Proportion: 0.67 (309485009821345088 / 461168601842738816)
UUIDv7-36b-sec (36 bits for seconds)
Time unit: s
Time bits: 36
Time limit: 4147 AD (2**36 / 60 / 60 / 24 / 365.25 + 1970)
Remaining bits: 86 (128 - 36 - 4 - 2)
UUIDs per nanosecond: 77371252455336272 (2**86 / 10**9)
Proportion: 0.17 (77371252455336272 / 461168601842738816)
UUIDv7-44b-msec (44 bits for MILLIS)
Time unit: ms
Time bits: 44
Time limit: 2527 AD (2**44 / 1000 / 60 / 60 / 24 / 365.25 + 1970)
Remaining bits: 78 (128 - 44 - 4 - 2)
UUIDs per nanosecond: 302231454903657280 (2**78 / 10**6)
Proportion: 0.66 (302231454903657280 / 461168601842738816)
ULID (no version or variant bits):
Time unit: ms
Time bits: 48
Time limit: 10889 AD (2**48 / 1000 / 60 / 60 / 24 / 365.25 + 1970)
Remaining bits: 80 (128 - 48)
UUIDs per nanosecond: 1208925819614629120 (2**80 / 10**6)
Proportion: 2.62 (1208925819614629120 / 461168601842738816)
> UUIDs per nanosecond: 461168601842738816 (2**62 / 10)

Wouldn't it be UUIDs per nanosecond: 46116860184273879 (2**62 / 100)?
BTW, your calculation is a bit off: https://play.golang.org/p/B-rzKVAJhNE
@edo1
Yes. I made a silly mistake in the beginning. All the rest of the calculation is wrong! I will fix this.
Thanks for noticing!
Now any UUIDv7 that we have talked about is better than UUIDv1. I use Python's `Decimal()` here to avoid division errors.
- UUIDv1: 1.00 (reference)
- COMBv4: 0.41
- UUIDv7-34b-sec: 6.71
- UUIDv7-36b-sec: 1.68
- UUIDv7-44b-msec: 6.55
- ULID: 26.21 (cheats using 6 extra bits)
- UUIDv7-40b-10msec: 10.49
- UUIDv7-48b-100usec: 4.10
Note: in this simple (naive?) calculation, all bits that are not timestamps are considered random.
UUIDv1
Time unit: 100-ns
Time bits: 60
Time limit: 5623 AD (2**60 / 10**7 / 60 / 60 / 24 / 365.25 + 1970)
Remaining bits: 62 (128-60-4-2)
UUIDs per nanosecond: 46116860184273879.04 (Decimal(2**62) / Decimal(100))
Proportion: 1.00 (used as reference)
COMBv4 (with version and variant bits):
Time unit: ms
Time bits: 48
Time limit: 10889 AD (2**48 / 1000 / 60 / 60 / 24 / 365.25 + 1970)
Remaining bits: 74 (128 - 48 - 4 - 2)
UUIDs per nanosecond: 18889465931478580.854784 (Decimal(2**74) / Decimal(10**6))
Proportion: 0.41 (Decimal(18889465931478580.854784) / Decimal(46116860184273879.04))
UUIDv7-34b-sec (34 bits for seconds)
Time unit: s
Time bits: 34
Time limit: 2514 AD (2**34 / 60 / 60 / 24 / 365.25 + 1970)
Remaining bits: 88 (128 - 34 - 4 - 2)
UUIDs per nanosecond: 309485009821345068.724781056 (Decimal(2**88) / Decimal(10**9))
Proportion: 6.71 (Decimal(309485009821345068.724781056) / Decimal(46116860184273879.04))
UUIDv7-36b-sec (36 bits for seconds)
Time unit: s
Time bits: 36
Time limit: 4147 AD (2**36 / 60 / 60 / 24 / 365.25 + 1970)
Remaining bits: 86 (128 - 36 - 4 - 2)
UUIDs per nanosecond: 77371252455336267.181195264 (Decimal(2**86) / Decimal(10**9))
Proportion: 1.68 (Decimal(77371252455336267.181195264) / Decimal(46116860184273879.04))
UUIDv7-44b-msec (44 bits for MILLIS)
Time unit: ms
Time bits: 44
Time limit: 2527 AD (2**44 / 1000 / 60 / 60 / 24 / 365.25 + 1970)
Remaining bits: 78 (128 - 44 - 4 - 2)
UUIDs per nanosecond: 302231454903657293.676544 (Decimal(2**78) / Decimal(10**6))
Proportion: 6.55 (Decimal(302231454903657293.676544) / Decimal(46116860184273879.04))
ULID (no version or variant bits):
Time unit: ms
Time bits: 48
Time limit: 10889 AD (2**48 / 1000 / 60 / 60 / 24 / 365.25 + 1970)
Remaining bits: 80 (128 - 48)
UUIDs per nanosecond: 1208925819614629174.706176 (Decimal(2**80) / Decimal(10**6))
Proportion: 26.21 (Decimal(1208925819614629174.706176) / Decimal(46116860184273879.04))
UUIDv7-40b-10msec (40 bits for 10-MILLIS)
Time unit: 10-ms
Time bits: 40
Time limit: 2318 AD (2**40 / 10**2 / 60 / 60 / 24 / 365.25 + 1970)
Remaining bits: 82 (128 - 40 - 4 - 2)
UUIDs per nanosecond: 483570327845851669.8824704 (Decimal(2**82) / Decimal(10**7))
Proportion: 10.49 (Decimal(483570327845851669.8824704) / Decimal(46116860184273879.04))
UUIDv7-48b-100usec (48 bits for 100-MICROS)
Time unit: 100-us
Time bits: 48
Time limit: 2861 AD (2**48 / 10**4 / 60 / 60 / 24 / 365.25 + 1970)
Remaining bits: 74 (128 - 48 - 4 - 2)
UUIDs per nanosecond: 188894659314785808.54784 (Decimal(2**74) / Decimal(10**5))
Proportion: 4.10 (Decimal(188894659314785808.54784) / Decimal(46116860184273879.04))
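All of the "UUIDs per nanosecond" and "Proportion" figures above come from the same two-step computation; a small helper (the names are illustrative) that reproduces a few of them:

```python
from decimal import Decimal

def uuids_per_ns(free_bits: int, tick_ns: int) -> Decimal:
    """Non-timestamp (and non-ver/var) bits spread over one timestamp tick, per nanosecond."""
    return Decimal(2 ** free_bits) / Decimal(tick_ns)

reference = uuids_per_ns(62, 100)  # UUIDv1: 62 free bits, 100 ns ticks
variants = [
    ("UUIDv7-34b-sec", 88, 10 ** 9),
    ("UUIDv7-36b-sec", 86, 10 ** 9),
    ("UUIDv7-44b-msec", 78, 10 ** 6),
    ("ULID", 80, 10 ** 6),
]
for name, free_bits, tick_ns in variants:
    print(name, round(uuids_per_ns(free_bits, tick_ns) / reference, 2))
# UUIDv7-34b-sec 6.71, UUIDv7-36b-sec 1.68, UUIDv7-44b-msec 6.55, ULID 26.21
```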
Using the General Birthday Formula
The General Birthday Formula applied to the UUID types:
| UUID type | UUIDs per nanosecond | General Birthday Formula |
|---|---|---|
| UUIDv1 | Decimal(2**62) / Decimal(10**2) | 252846877.03432938 |
| COMBv4 | Decimal(2**74) / Decimal(10**6) | 161822001.30197080 |
| UUIDv7-34b-sec | Decimal(2**88) / Decimal(10**9) | 655009407.54042970 |
| UUIDv7-36b-sec | Decimal(2**86) / Decimal(10**9) | 327504703.77021486 |
| UUIDv7-44b-msec | Decimal(2**78) / Decimal(10**6) | 647288005.20788320 |
| ULID | Decimal(2**80) / Decimal(10**6) | 1294576010.41576650 |
| UUIDv7-40b-10msec | Decimal(2**82) / Decimal(10**7) | 818761759.42553700 |
| UUIDv7-48b-100usec | Decimal(2**74) / Decimal(10**5) | 511726099.64096063 |
The General Birthday Formula:
```python
import math
from decimal import Decimal

def birthday_formula(t):
    return Decimal(math.sqrt(-2 * math.log(1 - 1/2)) * math.sqrt(t))

# Example on terminal
>>> uuid1 = Decimal(2**62) / Decimal(10**2)
>>> birthday_formula(uuid1)
Decimal('252846877.0343293845653533935546875')
```
The formula calculates the number of UUIDs which need to be generated in order to have a 50% probability of at least one collision.
Sources:
- https://betterexplained.com/articles/understanding-the-birthday-paradox
- https://en.wikipedia.org/wiki/Universally_unique_identifier
Update (2021-08-15)
- Added the UUIDv7 with 40 bits for 10-ms precision
- Added the UUIDv7 with 48 bits for 100-us precision
- Added a table with values calculated from the General Birthday Formula.
> I vote to leave at 36 in the current implementation

IMO the random part should be as big as possible so keeping 4 leading zero bits is a weird idea.

> I think 32 bits even if unsigned are not enough. 2106 is not that far in the future.

Agree. The next option is 33, but the odd number of bits is a bit odd, lol.

> UUIDv7-44b-msec (44 bits for MILLIS)

I like it.

> UUIDv7-34b-sec (34 bits for seconds)

Is ok too.
I played around with the 40-bit timestamps suggested by Brad in another issue.
If we use 40 bits with 10-milliseconds or centisecond precision:
- The timestamp, sequence and random parts are aligned with bytes.
- The precision is about 20 times shorter than human response times for simple reaction tasks, which are typically on the order of 200 ms. So I think the sort order will be appropriate in chat-like apps where people react to each other's messages.
- The database will be happy with this precision (right?). If more precision is needed, the next 20 bits can be used (?), simulating a 10-nanosecond precision.
- The random part is much larger than the 48 bits reserved for the node ID in UUIDv1.
- The random part needs to be updated 10 times less frequently, consuming less entropy from the underlying system, assuming that the random part should change whenever the timestamp changes.
Layout:
```
|             64 bits              |             64 bits              |
|--------------------|----VV-------|R---------------------------------|
|    unixts / 100    |  sequence   |              random              |
|      40 bits       | 24 - 4 bits |      64 - 2 bits = 62 bits       |
|                    |  = 20 bits  |                                  |

VV: 4 bits for version
R : 2 bits for variant
```
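A sketch of what generating this layout might look like, assuming the version nibble stays in its usual RFC 4122 position (high nibble of octet 6, inside the 24-bit sequence area) and the variant sits in the top two bits of octet 8, as the diagram suggests; the helper name and version value are illustrative:

```python
import os
import time

def make_uuid_centis(seq: int = 0) -> int:
    """Sketch: 40-bit unixts/100 | 20-bit sequence (around the version nibble) | variant | 62 random bits."""
    ts40 = (time.time_ns() // 10_000_000) & ((1 << 40) - 1)   # Unix time in centiseconds
    seq &= (1 << 20) - 1
    rand62 = int.from_bytes(os.urandom(8), "big") & ((1 << 62) - 1)
    value = ts40 << 88                                        # bits 127..88: timestamp
    value |= (seq >> 12) << 80                                # bits 87..80: high 8 sequence bits
    value |= 0x7 << 76                                        # bits 79..76: version nibble (illustrative value)
    value |= (seq & 0xFFF) << 64                              # bits 75..64: low 12 sequence bits
    value |= 0b10 << 62                                       # bits 63..62: RFC 4122 variant '10'
    value |= rand62                                           # bits 61..0: random
    return value

print(f"{make_uuid_centis():032x}")
```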
> UUIDv7-40b-10msec (40 bits for 10-MILLIS)
> Time unit: 10-ms
> Time bits: 40
> Time limit: 2318 AD (2**40 / 10**2 / 60 / 60 / 24 / 365.25 + 1970)
> Remaining bits: 82 (128 - 40 - 4 - 2)
> UUIDs per nanosecond: 483570327845851669.8824704 (Decimal(2**82) / Decimal(10**7))
> Proportion: 10.49 (Decimal(483570327845851669.8824704) / Decimal(46116860184273879.04))
^^^^^
Let me know what you think and if I made another mistake.
PS: the layout shown here is based on ULID with sequence presented by @sergeyprokhorenko in the issue https://github.com/uuid6/uuid6-ietf-draft/issues/27.
> If we use 40 bits with 10-milliseconds or centisecond precision:

LGTM

> The database will be happy with this precision (right?).

1. Monotonic UUID (centralized generation only): the resolution doesn't matter at all.
2. Non-monotonic UUID (both centralized and distributed generation):
   2.1. Regular (non-clustered) index: IMO 10-ms resolution should be enough in almost any case. On very busy systems, the locality may not be good enough, but the timestamp can be easily extended as you suggest.
   2.2. Clustered index: IMO there is no silver bullet, but improving timestamp resolution could help.

> assuming that the random part should change whenever the timestamp changes.

I don't like this idea. This allows you to predict the next UUID value, which could be a security issue.
I think that human response time is an outdated benchmark. We'd better think about data streams from many IoT sources. Therefore, the more unique the timestamp + sequence, the better, because incoming UUIDs are more ordered. In most cases we can consider the clock sequence empty or null. So the more different timestamps per second, the better.
Therefore we'd better take a timestamp precision no longer than a round-trip network request within the same data centre (500 microseconds). But we have to pay for high precision by increasing the bit depth of the timestamp. So I think 1 millisecond timestamp precision is enough.
> I think that human response time is an outdated benchmark. We'd better think about data streams from many IoT sources.
Yes. It's important to keep IoT in mind.
I updated my comment https://github.com/uuid6/uuid6-ietf-draft/issues/23#issuecomment-898939437 to include two new timestamp alternatives:
- a timestamp with 40 bits and 10-millisecond precision (up to year 2318 AD)
- a timestamp with 48 bits and 100-microsecond precision (up to year 2861 AD)
Also included a table with values generated using the General Birthday Formula. All bits that are not part of the timestamp are treated as random in that calculation.
> Also included a table with values generated using the General Birthday Formula.

I'm not sure your calculation is correct.
IMO it is better to calculate something like the annual collision probability when a new UUID is generated every nanosecond.
I use this formula from the wiki to calculate the probability per tick: with large n, we can simplify the probability of no collision to exp(-n^2/2d).
For UUIDv7-40b-10msec, n = 10e6 and d = 2^82; the probability of no collision per tick is exp(-(10e6)^2 / 2^83) ≈ 1 - 1.03e-11.
The annual (100*3600*24*365 ticks per year) probability of collision is ≈3.2%.
For UUID v4 it will be one big tick, with n = 1e9*3600*24*365 and d = 2^122; ≈0.009% (a lot better!)
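These numbers can be reproduced directly from that approximation; a quick, purely illustrative check:

```python
import math

def annual_collision_probability(n_per_tick: float, d: float, ticks_per_year: float) -> float:
    """P(at least one collision per year), using exp(-n^2 / 2d) as P(no collision) per tick."""
    p_no_collision_per_tick = math.exp(-n_per_tick ** 2 / (2 * d))
    return 1 - p_no_collision_per_tick ** ticks_per_year

# UUIDv7-40b-10msec: 1e9 UUIDs/s -> 1e7 per 10 ms tick, 82 random bits
print(annual_collision_probability(1e7, 2 ** 82, 100 * 3600 * 24 * 365))  # ~0.032 (~3.2%)
# UUID v4: one big "tick" spanning the whole year, 122 random bits
print(annual_collision_probability(1e9 * 3600 * 24 * 365, 2 ** 122, 1))   # ~9e-5 (~0.009%)
```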
Excellent, @edo1!
Now we have a mathematical tool to help us evaluate all the options and trade offs involved. I don't care if my calculations were wrong if it encouraged someone to do something a lot better.
I have made a spreadsheet with some UUID variants:
- UUID v1, UUID v4, ULID — existing standards;
- UUID v7 — this draft, 122 effective bits (6 bits reserved: 4 version + 2 variant); several timestamp resolutions are considered;
- UUID variant 3 — similar to the previous one, but the version field is omitted (as mentioned in https://github.com/uuid6/uuid6-ietf-draft/issues/26#issuecomment-899060245), only 2 bits (variant) are reserved;
- UUID var3/whole bytes — similar to the previous one, but the length of the timestamp is a multiple of a byte.
In all cases, all effective non-timestamp bits are considered random.
The target is `P collisions/year`: the annual collision probability when a new UUID is generated every nanosecond. The best variants in each group (UUID v7, UUID variant 3, UUID var3/whole bytes) are marked in green.
You could make a copy of the spreadsheet and play around with constants/formulas. Feel free to comment or ask any questions.
Great, @edo1 !
We can adjust the life span of the UUIDv7 in the field "years after now", right? It's interesting how the number of years affects the probability of collision. But why don't we use 200 years?
The life span of 120 shows that UUIDv7 with 39 bits and centisecond precision is the best option. But I prefer not using this number of bits to prevent someone storing this type of UUID in a SIGNED integer field. If this occurs, the "last date" becomes 2057, not 2144.
If we adjust the life span to 200 years, the "best" option becomes 40 bits (5 whole bytes), and the last date becomes 2318 (unsigned) or 2144 (signed, which is 123 years from now).
Please change the fields "stolen bits" of UUID variant 3 to 3. The bit sequence for this variant is '111'.
> We can adjust the life span of the UUIDv7 in the field "years after now", right? It's interesting how the number of years affects the probability of collision. But why don't we use 200 years?

You can make your own copy and change anything. Or you can download the spreadsheet in the Excel/OpenOffice format and change the numbers locally.

> But I prefer not using this number of bits to prevent someone storing this type of UUID in a SIGNED integer field

Any reason to use signed int here? A 64-bit signed int can be truncated to 39-bit unsigned without any problems.

> Please change the fields "stolen bits" of UUID variant 3 to 3. The bit sequence for this variant is '111'.

Oops, thanks for pointing that out.
Actually, I do not exclude the option of making the epoch even shorter.
`last date` is probably a bad name. Nothing really bad should happen when an integer overflow occurs.
The probability of collisions will increase with each cycle, but with a long timestamp, we will have about the same increased probability right now.
Timestamp wrapping is not very good in terms of data sortability and locality, but still, the situation will be much better than with a random UUID
And I don't expect a very long lifespan for 128 bit UUIDs. It is very likely that in 50 or 100 years, 256-bit identifiers (or something radically new) will be used.
I think we could start a new epoch from 2021 instead of using Unix time.
> And I don't expect a very long lifespan for 128 bit UUIDs. It is very likely that in 50 or 100 years, 256-bit identifiers (or something radically new) will be used.

160-bit identifiers would be great right now. By the way, 160-bit identifiers in Crockford's base32 would be the same length as 128-bit identifiers in the usual UUID string format. Therefore they are compatible.
@edo1

> [...] The target is `P collisions/year`: the annual collision probability when a new UUID is generated every nanosecond. [...]

Is the spreadsheet correct? The "when a new UUID is generated every nanosecond" part would, in the case of ns precision, mean that every nanosecond the time is incremented, and because of that the probability of collision is theoretically 0.
> Might as well give them back to the unixts to go to 38 if that is the case.
> ...
> - 34 bit timestamp, giving 2 bits back to entropy/node/randomness

The way I read RFC 4122, it discourages fields that are a non-integral number of octets in length. (I.e. field bit lengths should be a multiple of 4 bits, so 32, 36, 40 are okay; 34 and 38 are not.)

From the RFC:

> To minimize confusion about bit assignments within octets, the UUID record definition is defined only in terms of fields that are integral numbers of octets

I know this is just expository text in the RFC, but I believe it's well-founded guidance. Sub-octet field definitions make it difficult for humans to interpret a UUID. Witness how the hex digit containing the `variant` field changes based on the `clockseq` value. 😦
> Is the spreadsheet correct? The "when a new UUID is generated every nanosecond" part would, in the case of ns precision, mean that every nanosecond the time is incremented, and because of that the probability of collision is theoretically 0.

You are right. Initially, I assumed that n is large, so the formula does not work correctly in the edge case: n*(n-1) ≉ n^2 when n = 1.
Thank you for pointing this out. Fixed.
Also added a second sheet with graphs; they clearly show that the collision probability depends mainly on the bit count and on the epoch length, not on the timestamp/random split.
In https://github.com/uuid6/uuid6-ietf-draft/blob/master/LATEST.md, I'm proposing that we use an unsigned 64-bit nanoseconds-since-the-Unix-epoch timestamp (and move the version field out of the way, so the timestamp is exactly the first 8 bytes). Max timestamp value is 2554-07-21T23:34:33.999999999Z. The rationale is:
- Year 2554 is a reasonably long time
- Easy to understand and implement (clean byte boundaries)
- Doesn't introduce new Epoch (not worth changing to add 51 more years)
Thoughts?
The "when every nanosecond new UUID is generated" part in case of ns precision would mean that every nanosecond the time is incremented and because of that the probability of collision is theoretically 0.
After some thought, I decided to change the model a bit.
It looks more realistic if there are still 10⁹ UUIDs generated per second, but each UUID is generated at a random time within a second. We continue to consider the annual collision probability as an objective function.
The same formula will be used to calculate the probability of NO collisions per second:
Ps = exp(-n^2/2d)
n = 1e9; d = (1e9/res) * (2^rbits)
, where res
is timestamp resolution in nanoseconds, rbits
- random bits count;
Ps = exp(-1e9*1e9/2/(1e9/res)/2^rbits) = exp(-1e9*res/2/2^rbits)
Annual NO collision probability:
Py = Ps^(3600*24*365) = exp(-1e9*res/2/2^rbits)^(3600*24*365) = exp(-1e9*res*3600*24*365/2/2^rbits)
It would be nice if someone could check that there is no error in the formula.
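As a sanity check, the closed form is easy to evaluate numerically; this just plugs numbers into the Py expression above (parameter names follow the notation in the formula):

```python
import math

def annual_collision_probability(res_ns: float, rbits: int) -> float:
    """1 - Py, with Py = exp(-1e9 * res * 3600*24*365 / 2 / 2**rbits)."""
    return 1 - math.exp(-1e9 * res_ns * 3600 * 24 * 365 / 2 / 2 ** rbits)

# 10 ms resolution (res = 1e7 ns) with 82 random bits -> ~3.2% per year,
# consistent with the earlier per-tick calculation
print(annual_collision_probability(1e7, 82))
```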
The spreadsheet has been updated, with almost no changes in the numbers except for the 1 ns and 10 ns resolutions.
Also added collision probability for 20 years and 100 years (columns `P collisions/20y` and `P collisions/100y`).
UUID proposal from #35 was added as well.
> Also added collision probability for 20 years and 100 years (columns `P collisions/20y` and `P collisions/100y`).
For time-based UUIDs the number of years shouldn't matter.
- I generate a UUID now: time 2021-09-02 07:13:23 UTC; random tiz5ptms7s
- I generate a UUID 20 years from now: time 2041-09-02 07:13:23 UTC; random tiz5ptms7s
The random part is the same but the time is different, therefore no collision. You should have the same value for `P collisions/year`, `P collisions/20y`, or `P collisions/hour`, up to the resolution of the time.
Correct me if I'm wrong.
@nerg4l I do not speak English well enough to talk about probability theory, but I'll try to explain with a simple example. Let's say you buy an instant lottery ticket every day. The chance that the purchased ticket is a winning one is 1:1,000,000. The ticket must be checked immediately; yesterday's ticket cannot win today. Will the odds of winning at least once be the same within 1 year and within 20 years?
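The same point in numbers (just evaluating the 1-in-1,000,000 odds from the analogy):

```python
p = 1 / 1_000_000                 # chance that a single daily ticket wins
print(1 - (1 - p) ** 365)         # ~0.00036: at least one win within 1 year
print(1 - (1 - p) ** (365 * 20))  # ~0.0073:  at least one win within 20 years
```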
Got it, thanks for the explanation.