uuid6-ietf-draft
Discussion: Unix Timestamp Granularity in UUIDv7
Question: too many bits, or not enough?
Current Draft-02 implementation:
- 36 bits, to avoid the year-2038 problem.
- If the source timestamp is 32 bits, pad on the left (most significant side) to form the 36-bit value used in UUIDv7.
- If the source timestamp is 64 bits, truncate to the right-most (least significant) 36 bits for use in UUIDv7 (see the sketch after this list).
- 36 bits provides timestamp values up to the year 4147.
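A minimal sketch of the padding/truncation rules above, assuming a plain integer Unix timestamp as input; the helper name and constant are illustrative, not from the draft:

```python
import time

UNIXTS_BITS = 36  # width of the unixts field in draft-02

def unixts_field(ts_seconds: int) -> int:
    """Reduce a 32- or 64-bit Unix timestamp to the 36-bit unixts value."""
    # A 32-bit value already fits in 36 bits, so it is effectively left-padded
    # with four zero bits; a 64-bit value is truncated to its right-most
    # (least significant) 36 bits.
    return ts_seconds & ((1 << UNIXTS_BITS) - 1)

print(f"{unixts_field(int(time.time())):09x}")  # 36 bits -> 9 hex digits
```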
Alternatives Proposed:
- 34-bit timestamp, giving 2 bits back to entropy/node/randomness, valid up to the year 2514.
I vote to leave it at 36 bits as in the current implementation; I would rather future-proof as much as possible with a larger value than reduce the size and truncate a 64-bit timestamp even further.
That being said: depending on how we choose to encode subsec_a based on #24, we may end up with two additional bits that are always padded. We might as well give them back to unixts and go to 38 if that is the case.
Yes, the 36-bit length was definitely done intentionally in order to avoid running out of numbers - but I do agree it's annoying to use.
Another option would be to use a 32-bit unsigned integer, which runs out in the year 2106. I'm not super excited about an end date that is close enough for my grandchildren to curse me for when it runs out on them, but it's an option and would definitely be simpler to use.
I'll write more on this on #24 in a moment.
| Bits | Signedness | End year |
|------|------------|----------|
| 32   | Signed     | 2038     |
| 32   | Unsigned   | 2106     |
| 33   | Signed     | 2106     |
| 33   | Unsigned   | 2242     |
| 34   | Signed     | 2242     |
| 34   | Unsigned   | 2514     |
| 35   | Signed     | 2514     |
| 35   | Unsigned   | 3058     |
| 36   | Signed     | 3058     |
| 36   | Unsigned   | 4147     |
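The end years in this table follow directly from adding 2^bits seconds (or 2^(bits-1) for the signed case) to the 1970 epoch; a quick, purely illustrative check:

```python
def end_year(bits: int, signed: bool) -> int:
    """Approximate year in which a seconds-since-1970 counter of this width overflows."""
    seconds = 2 ** (bits - 1) if signed else 2 ** bits
    return int(1970 + seconds / (60 * 60 * 24 * 365.25))

for bits in range(32, 37):
    print(bits, "signed:", end_year(bits, True), "unsigned:", end_year(bits, False))
# 32 -> 2038 signed / 2106 unsigned, ..., 36 -> 3058 signed / 4147 unsigned
```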
I think 32 bits even if unsigned are not enough. 2106 is not that far in the future.
I can also see the benefit of keeping the value unsigned because generating a UUID before 1970-01-01 shouldn't be a real life use case.
I definitely want to avoid introducing a new epoch. The Unix epoch makes implementations much easier. I don't even know why they decided to use the Gregorian epoch for UUIDv1.
For me, anything from 2514 onward looks good.
@nerg4l @kyzer-davis If we did the 111 var field from #26 for UUIDv7, that gives us the first 64 bits clear and free of any other requirements. What if we, despite how clever I thought I was being with this subsecond encoding stuff (I still like the idea, but I don't like how much confusion it has created), instead just use a 64-bit nanosecond-precision Unix timestamp? I'm not sure if this should replace UUIDv7 or be a separate version (we could do UUIDv9 ;) ). Regardless, I'm just trying to see if there is a way to simplify this whole thing so that when people read the spec it's "oh, of course, that makes sense" instead of "what the...".
So the new bit layout would be:
```
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    unix_nano_timestamp_u64                    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    unix_nano_timestamp_u64                    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  var  |  ver  |      seq      |             rand              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                             rand                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
```
Where `unix_nano_timestamp_u64` is an unsigned 64-bit Unix-like timestamp (but instead of seconds since the epoch it's nanoseconds since the epoch), `seq` is 8 bits for sequence, and `rand` is random. Max time with this is 2554-07-21 23:34:33 UTC.
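A rough sketch of how this proposal could be assembled, assuming the `111` variant from #26 (3 bits), a 4-bit version, the 8-bit `seq` described above, and the remaining 49 bits random; the exact field positions after the timestamp are one reading of the diagram, not something the draft specifies:

```python
import os
import time

def make_uuid_nano(version: int = 7, seq: int = 0) -> int:
    """Sketch: 64-bit ns Unix timestamp | var '111' | 4-bit ver | 8-bit seq | 49 random bits."""
    ts = time.time_ns() & ((1 << 64) - 1)                          # unsigned 64-bit nanosecond timestamp
    rand = int.from_bytes(os.urandom(7), "big") & ((1 << 49) - 1)  # 49 random bits
    value = ts << 64                                               # bits 127..64
    value |= 0b111 << 61                                           # 3-bit variant from #26
    value |= (version & 0xF) << 57                                 # 4-bit version
    value |= (seq & 0xFF) << 49                                    # 8-bit sequence
    value |= rand                                                  # bits 48..0
    return value

print(f"{make_uuid_nano():032x}")
```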
And then we'd basically just say: implementations can parse out the time if they like, but the other fields are essentially write-only and a recommendation - don't try to read them, because it's not invalid for implementations to make changes after the ver-var field. In fact, if implementations wanted to encode random stuff into the least significant bytes of the nanosecond-precision timestamp, I don't think that should even be explicitly forbidden - it's just "if you do this, other implementations may get the wrong timestamp; use at your own risk".
I think this would be dramatically simpler and easier to understand.
@bradleypeabody I think the variable subsecond handling in UUIDv7 was added because not all languages can return a Unix timestamp with nanosecond precision, e.g. JavaScript, PHP.
Changing the position of the version bits would affect some widely used libraries. Here are some examples:
- Java affected: https://github.com/openjdk/jdk8/blob/6a383433a9f4661a96a90b2a4c7b5b9a85720031/jdk/src/share/classes/java/util/UUID.java#L245-L248
- Google UUID module for Go affected: https://github.com/google/uuid/blob/44b5fee7c49cf3bcdf723f106b36d56ef13ccc88/uuid.go#L227-L229
- Python not affected: https://github.com/python/cpython/blob/main/Lib/uuid.py#L354-L357
- ramsey UUID package for PHP not affected because parsing fails: https://github.com/ramsey/uuid/blob/fe665a03df4f056aa65af552a96e1976df8c8dae/src/Rfc4122/Fields.php#L186-L193
- uuidjs UUID package for JavaScript affected (`validate` checks for the correctness of the string): https://github.com/uuidjs/uuid/blob/91805f665c38b691ac2cbda56a99231432b00a1a/src/version.js#L3-L9
Edit: It is always quite unfortunate when there are already existing implementations.
Another important thing to keep in mind is how many UUIDs can be generated per unit of time. The more of a type of UUID that can be generated in a given interval, the better.
I did some simple math to get the maximum number of UUIDs version 1 that can be generated in a nanosecond interval. I treated all non-timestamp bits as random bits and obtained the value 461168601842738816 (2**62/10). So I used this number as a reference for comparison with the other UUID types. A fraction called "proportion" is used to express the comparison numerically.
These are the proportions:
- UUIDv1: 1.00 (reference)
- COMBv4: 0.04
- UUIDv7-34b-sec: 0.67
- UUIDv7-36b-sec: 0.17
- UUIDv7-44b-msec: 0.66
- ULID: 2.62 (cheats using 6 extra bits)
I was about to propose an alternative UUIDv7 with 44 bits for timestamp and millisecond precision. But when I did that calculation, I saw how difficult it is to find another layout that beats UUIDv1.
Here is the math:
UUIDv1
Time unit: 100-ns
Time bits: 60
Time limit: 5623 AD (2**60 / 10**7 / 60 / 60 / 24 / 365.25 + 1970)
Remaining bits: 62 (128 - 60 - 4 - 2)
UUIDs per nanosecond: 461168601842738816 (2**62 / 10)
Proportion: 1.00 (used as reference)
COMBv4 (with version and variant bits):
Time unit: ms
Time bits: 48
Time limit: 10889 AD (2**48 / 1000 / 60 / 60 / 24 / 365.25 + 1970)
Remaining bits: 74 (128 - 48 - 4 - 2)
UUIDs per nanosecond: 18889465931478580 (2**74 / 10**6)
Proportion: 0.04 (18889465931478580 / 461168601842738816)
UUIDv7-34b-sec (34 bits for seconds)
Time unit: s
Time bits: 34
Time limit: 2514 AD (2**34 / 60 / 60 / 24 / 365.25 + 1970)
Remaining bits: 88 (128 - 34 - 4 - 2)
UUIDs per nanosecond: 309485009821345088 (2**88 / 10**9)
Proportion: 0.67 (309485009821345088 / 461168601842738816)
UUIDv7-36b-sec (36 bits for seconds)
Time unit: s
Time bits: 36
Time limit: 4147 AD (2**36 / 60 / 60 / 24 / 365.25 + 1970)
Remaining bits: 86 (128 - 36 - 4 - 2)
UUIDs per nanosecond: 77371252455336272 (2**86 / 10**9)
Proportion: 0.17 (77371252455336272 / 461168601842738816)
UUIDv7-44b-msec (44 bits for MILLIS)
Time unit: ms
Time bits: 44
Time limit: 2527 AD (2**44 / 1000 / 60 / 60 / 24 / 365.25 + 1970)
Remaining bits: 78 (128 - 44 - 4 - 2)
UUIDs per nanosecond: 302231454903657280 (2**78 / 10**6)
Proportion: 0.66 (302231454903657280 / 461168601842738816)
ULID (no version or variant bits):
Time unit: ms
Time bits: 48
Time limit: 10889 AD (2**48 / 1000 / 60 / 60 / 24 / 365.25 + 1970)
Remaining bits: 80 (128 - 48)
UUIDs per nanosecond: 1208925819614629120 (2**80 / 10**6)
Proportion: 2.62 (1208925819614629120 / 461168601842738816)
> UUIDs per nanosecond: 461168601842738816 (2**62 / 10)

Wouldn't it be UUIDs per nanosecond: 46116860184273879 (2**62 / 100)?
BTW, your calculation is a bit off: https://play.golang.org/p/B-rzKVAJhNE
@edo1
Yes. I made a silly mistake in the beginning. All the rest of the calculation is wrong! I will fix this.
Thanks for noticing!
Now any UUIDv7 that we have talked about is better than UUIDv1. I use Python's `Decimal()` here to avoid division errors.
- UUIDv1: 1.00 (reference)
- COMBv4: 0.41
- UUIDv7-34b-sec: 6.71
- UUIDv7-36b-sec: 1.68
- UUIDv7-44b-msec: 6.55
- ULID: 26.21 (cheats using 6 extra bits)
- UUIDv7-40b-10msec: 10.49
- UUIDv7-48b-100usec: 4.10
Note: in this simple (naive?) calculation, all bits that are not timestamps are considered random.
UUIDv1
Time unit: 100-ns
Time bits: 60
Time limit: 5623 AD (2**60 / 10**7 / 60 / 60 / 24 / 365.25 + 1970)
Remaining bits: 62 (128-60-4-2)
UUIDs per nanosecond: 46116860184273879.04 (Decimal(2**62) / Decimal(100))
Proportion: 1.00 (used as reference)
COMBv4 (with version and variant bits):
Time unit: ms
Time bits: 48
Time limit: 10889 AD (2**48 / 1000 / 60 / 60 / 24 / 365.25 + 1970)
Remaining bits: 74 (128 - 48 - 4 - 2)
UUIDs per nanosecond: 18889465931478580.854784 (Decimal(2**74) / Decimal(10**6))
Proportion: 0.41 (Decimal(18889465931478580.854784) / Decimal(46116860184273879.04))
UUIDv7-34b-sec (34 bits for seconds)
Time unit: s
Time bits: 34
Time limit: 2514 AD (2**34 / 60 / 60 / 24 / 365.25 + 1970)
Remaining bits: 88 (128 - 34 - 4 - 2)
UUIDs per nanosecond: 309485009821345068.724781056 (Decimal(2**88) / Decimal(10**9))
Proportion: 6.71 (Decimal(309485009821345068.724781056) / Decimal(46116860184273879.04))
UUIDv7-36b-sec (36 bits for seconds)
Time unit: s
Time bits: 36
Time limit: 4147 AD (2**36 / 60 / 60 / 24 / 365.25 + 1970)
Remaining bits: 86 (128 - 36 - 4 - 2)
UUIDs per nanosecond: 77371252455336267.181195264 (Decimal(2**86) / Decimal(10**9))
Proportion: 1.68 (Decimal(77371252455336267.181195264) / Decimal(46116860184273879.04))
UUIDv7-44b-msec (44 bits for MILLIS)
Time unit: ms
Time bits: 44
Time limit: 2527 AD (2**44 / 1000 / 60 / 60 / 24 / 365.25 + 1970)
Remaining bits: 78 (128 - 44 - 4 - 2)
UUIDs per nanosecond: 302231454903657293.676544 (Decimal(2**78) / Decimal(10**6))
Proportion: 6.55 (Decimal(302231454903657293.676544) / Decimal(46116860184273879.04))
ULID (no version or variant bits):
Time unit: ms
Time bits: 48
Time limit: 10889 AD (2**48 / 1000 / 60 / 60 / 24 / 365.25 + 1970)
Remaining bits: 80 (128 - 48)
UUIDs per nanosecond: 1208925819614629174.706176 (Decimal(2**80) / Decimal(10**6))
Proportion: 26.21 (Decimal(1208925819614629174.706176) / Decimal(46116860184273879.04))
UUIDv7-40b-10msec (40 bits for 10-MILLIS)
Time unit: 10-ms
Time bits: 40
Time limit: 2318 AD (2**40 / 10**2 / 60 / 60 / 24 / 365.25 + 1970)
Remaining bits: 82 (128 - 40 - 4 - 2)
UUIDs per nanosecond: 483570327845851669.8824704 (Decimal(2**82) / Decimal(10**7))
Proportion: 10.49 (Decimal(483570327845851669.8824704) / Decimal(46116860184273879.04))
UUIDv7-48b-100usec (48 bits for 100-MICROS)
Time unit: 100-us
Time bits: 48
Time limit: 2861 AD (2**48 / 10**4 / 60 / 60 / 24 / 365.25 + 1970)
Remaining bits: 74 (128 - 48 - 4 - 2)
UUIDs per nanosecond: 188894659314785808.54784 (Decimal(2**74) / Decimal(10**5))
Proportion: 4.10 (Decimal(188894659314785808.54784) / Decimal(46116860184273879.04))
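All of the "UUIDs per nanosecond" and "Proportion" figures above come from the same two-step computation; a small helper (the names are illustrative) that reproduces a few of them:

```python
from decimal import Decimal

def uuids_per_ns(free_bits: int, tick_ns: int) -> Decimal:
    """Non-timestamp (and non-ver/var) bits spread over one timestamp tick, per nanosecond."""
    return Decimal(2 ** free_bits) / Decimal(tick_ns)

reference = uuids_per_ns(62, 100)  # UUIDv1: 62 free bits, 100 ns ticks
variants = [
    ("UUIDv7-34b-sec", 88, 10 ** 9),
    ("UUIDv7-36b-sec", 86, 10 ** 9),
    ("UUIDv7-44b-msec", 78, 10 ** 6),
    ("ULID", 80, 10 ** 6),
]
for name, free_bits, tick_ns in variants:
    print(name, round(uuids_per_ns(free_bits, tick_ns) / reference, 2))
# UUIDv7-34b-sec 6.71, UUIDv7-36b-sec 1.68, UUIDv7-44b-msec 6.55, ULID 26.21
```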
Using the General Birthday Formula
The General Birthday Formula applied to the UUID types:
| UUID type | UUIDs per nanosecond | General Birthday Formula |
|---|---|---|
| UUIDv1 | Decimal(2**62) / Decimal(10**2) | 252846877.03432938 |
| COMBv4 | Decimal(2**74) / Decimal(10**6) | 161822001.30197080 |
| UUIDv7-34b-sec | Decimal(2**88) / Decimal(10**9) | 655009407.54042970 |
| UUIDv7-36b-sec | Decimal(2**86) / Decimal(10**9) | 327504703.77021486 |
| UUIDv7-44b-msec | Decimal(2**78) / Decimal(10**6) | 647288005.20788320 |
| ULID | Decimal(2**80) / Decimal(10**6) | 1294576010.41576650 |
| UUIDv7-40b-10msec | Decimal(2**82) / Decimal(10**7) | 818761759.42553700 |
| UUIDv7-48b-100usec | Decimal(2**74) / Decimal(10**5) | 511726099.64096063 |
The General Birthday Formula:
```python
import math
from decimal import Decimal

def birthday_formula(t):
    return Decimal(math.sqrt(-2 * math.log(1 - 1/2)) * math.sqrt(t))

# Example on terminal
>>> uuid1 = Decimal(2**62) / Decimal(10**2)
>>> birthday_formula(uuid1)
Decimal('252846877.0343293845653533935546875')
```
The formula calculates the number of UUIDs which need to be generated in order to have a 50% probability of at least one collision.
Sources:
- https://betterexplained.com/articles/understanding-the-birthday-paradox
- https://en.wikipedia.org/wiki/Universally_unique_identifier
Update (2021-08-15)
- Added the UUIDv7 with 40 bits for 10-ms precision
- Added the UUIDv7 with 48 bits for 100-us precision
- Added a table with values calculated from the General Birthday Formula.
> I vote to leave at 36 in the current implementation

IMO the random part should be as big as possible so keeping 4 leading zero bits is a weird idea.

> I think 32 bits even if unsigned are not enough. 2106 is not that far in the future.

Agree. The next option is 33, but the odd number of bits is a bit odd, lol.

> UUIDv7-44b-msec (44 bits for MILLIS)

I like it.

> UUIDv7-34b-sec (34 bits for seconds)

Is ok too.
I played around with the 40-bit timestamps suggested by Brad in another issue.
If we use 40 bits with 10-milliseconds or centisecond precision:
- The timestamp, sequence and random parts are aligned with bytes.
- The precision is about 20 times shorter than human response times for simple reaction tasks, which are typically on the order of 200 ms. So I think the sort order will be appropriate in chat-like apps where people react to each other's messages.
- The database will be happy with this precision (right?). If more precision is needed, the next 20 bits can be used (?), simulating a 10-nanosecond precision.
- The random part is much larger than the 48 bits reserved for the node ID in UUIDv1.
- The random part needs to be updated 10 times less frequently, consuming less entropy from the underlying system, assuming that the random part should change whenever the timestamp changes.
Layout:
```
|             64 bits              |             64 bits              |
|--------------------|----VV-------|R---------------------------------|
|    unixts / 100    |  sequence   |              random              |
|      40 bits       | 24 - 4 bits |      64 - 2 bits = 62 bits       |
|                    |  = 20 bits  |                                  |

VV: 4 bits for version
R : 2 bits for variant
```
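A sketch of what generating this layout might look like, assuming the version nibble stays in its usual RFC 4122 position (high nibble of octet 6, inside the 24-bit sequence area) and the variant sits in the top two bits of octet 8, as the diagram suggests; the helper name and version value are illustrative:

```python
import os
import time

def make_uuid_centis(seq: int = 0) -> int:
    """Sketch: 40-bit unixts/100 | 20-bit sequence (around the version nibble) | variant | 62 random bits."""
    ts40 = (time.time_ns() // 10_000_000) & ((1 << 40) - 1)   # Unix time in centiseconds
    seq &= (1 << 20) - 1
    rand62 = int.from_bytes(os.urandom(8), "big") & ((1 << 62) - 1)
    value = ts40 << 88                                        # bits 127..88: timestamp
    value |= (seq >> 12) << 80                                # bits 87..80: high 8 sequence bits
    value |= 0x7 << 76                                        # bits 79..76: version nibble (illustrative value)
    value |= (seq & 0xFFF) << 64                              # bits 75..64: low 12 sequence bits
    value |= 0b10 << 62                                       # bits 63..62: RFC 4122 variant '10'
    value |= rand62                                           # bits 61..0: random
    return value

print(f"{make_uuid_centis():032x}")
```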
> UUIDv7-40b-10msec (40 bits for 10-MILLIS)
> Time unit: 10-ms
> Time bits: 40
> Time limit: 2318 AD (2**40 / 10**2 / 60 / 60 / 24 / 365.25 + 1970)
> Remaining bits: 82 (128 - 40 - 4 - 2)
> UUIDs per nanosecond: 483570327845851669.8824704 (Decimal(2**82) / Decimal(10**7))
> Proportion: 10.49 (Decimal(483570327845851669.8824704) / Decimal(46116860184273879.04))
^^^^^
Let me know what you think and if I made another mistake.
PS: the layout shown here is based on ULID with sequence presented by @sergeyprokhorenko in the issue https://github.com/uuid6/uuid6-ietf-draft/issues/27.
> If we use 40 bits with 10-milliseconds or centisecond precision:

LGTM

> The database will be happy with this precision (right?).

1. Monotonic UUID (centralized generation only): the resolution doesn't matter at all.
2. Non-monotonic UUID (both centralized and distributed generation):
   2.1. Regular (non-clustered) index: IMO 10-ms resolution should be enough in almost any case. On very busy systems, the locality may not be good enough, but the timestamp can be easily extended as you suggest.
   2.2. Clustered index: IMO there is no silver bullet, but improving timestamp resolution could help.

> assuming that the random part should change whenever the timestamp changes.

I don't like this idea. This allows you to predict the next UUID value, which could be a security issue.
I think that human response time is an outdated benchmark. We'd better think about data streams from many IoT sources. Therefore, the more unique the timestamp + sequence, the better, because incoming UUIDs are more ordered. In most cases we can consider the clock sequence empty or null. So the more different timestamps per second, the better.
Therefore we'd better take a timestamp precision no longer than a round-trip network request within the same data centre (500 microseconds). But we have to pay for high precision by increasing the bit depth of the timestamp. So I think 1 millisecond timestamp precision is enough.
> I think that human response time is an outdated benchmark. We'd better think about data streams from many IoT sources.
Yes. It's important to keep IoT in mind.
I updated my comment https://github.com/uuid6/uuid6-ietf-draft/issues/23#issuecomment-898939437 to include two new timestamp alternatives:
- a timestamp with 40 bits and 10-millisecond precision (up to year 2318 AD)
- a timestamp with 48 bits and 100-microsecond precision (up to year 2861 AD)
Also included a table with values generated using the General Birthday Formula. All bits that are not part of the timestamp are treated as random in that calculation.
> Also included a table with values generated using the General Birthday Formula.

I'm not sure your calculation is correct.
IMO it is better to calculate something like the annual collision probability when a new UUID is generated every nanosecond.
I use this formula from the wiki to calculate the probability per tick: with large n, we can simplify the probability of no collision to exp(-n^2/2d).
For UUIDv7-40b-10msec, n = 10e6 and d = 2^82; the probability of no collision per tick is exp(-(10e6)^2 / 2^83) ≈ 1 - 1.03e-11.
The annual (100*3600*24*365 ticks per year) probability of collision is ≈3.2%.
For UUID v4 it will be one big tick, with n = 1e9*3600*24*365 and d = 2^122; ≈0.009% (a lot better!)
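These numbers can be reproduced directly from that approximation; a quick, purely illustrative check:

```python
import math

def annual_collision_probability(n_per_tick: float, d: float, ticks_per_year: float) -> float:
    """P(at least one collision per year), using exp(-n^2 / 2d) as P(no collision) per tick."""
    p_no_collision_per_tick = math.exp(-n_per_tick ** 2 / (2 * d))
    return 1 - p_no_collision_per_tick ** ticks_per_year

# UUIDv7-40b-10msec: 1e9 UUIDs/s -> 1e7 per 10 ms tick, 82 random bits
print(annual_collision_probability(1e7, 2 ** 82, 100 * 3600 * 24 * 365))  # ~0.032 (~3.2%)
# UUID v4: one big "tick" spanning the whole year, 122 random bits
print(annual_collision_probability(1e9 * 3600 * 24 * 365, 2 ** 122, 1))   # ~9e-5 (~0.009%)
```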
Excellent, @edo1!
Now we have a mathematical tool to help us evaluate all the options and trade offs involved. I don't care if my calculations were wrong if it encouraged someone to do something a lot better.
I have made a spreadsheet with some UUID variants:
- UUID v1, UUID v4, ULID — existing standards;
- UUID v7 — this draft, 122 effective bits (6 bits reserved: 4 version + 2 variant); several timestamp resolutions are considered;
- UUID variant 3 — similar to the previous one, but the version field is omitted (as mentioned in https://github.com/uuid6/uuid6-ietf-draft/issues/26#issuecomment-899060245), only 2 bits (variant) are reserved;
- UUID var3/whole bytes — similar to the previous one, but the length of the timestamp is a multiple of a byte.
In all cases, all effective non-timestamp bits are considered random.
The target is `P collisions/year`: the annual collision probability when a new UUID is generated every nanosecond. The best variants in each group (UUID v7, UUID variant 3, UUID var3/whole bytes) are marked in green.
You could make a copy of the spreadsheet and play around with constants/formulas. Feel free to comment or ask any questions.
Great, @edo1 !
We can adjust the life span of the UUIDv7 in the field "years after now", right? It's interesting how the number of years affects the probability of collision. But why don't we use 200 years?
The life span of 120 shows that UUIDv7 with 39 bits and centisecond precision is the best option. But I prefer not using this number of bits to prevent someone storing this type of UUID in a SIGNED integer field. If this occurs, the "last date" becomes 2057, not 2144.
If we adjust the life span to 200 years, the "best" option becomes 40 bits (5 whole bytes), and the last date becomes 2318 (unsigned) or 2144 (signed, which is 123 years from now).
Please change the fields "stolen bits" of UUID variant 3 to 3. The bit sequence for this variant is '111'.
> We can adjust the life span of the UUIDv7 in the field "years after now", right? It's interesting how the number of years affects the probability of collision. But why don't we use 200 years?

You can make your own copy and change anything. Or you can download the spreadsheet in the Excel/OpenOffice format and change the numbers locally.

> But I prefer not using this number of bits to prevent someone storing this type of UUID in a SIGNED integer field

Any reason to use signed int here? A 64-bit signed int can be truncated to 39-bit unsigned without any problems.

> Please change the fields "stolen bits" of UUID variant 3 to 3. The bit sequence for this variant is '111'.

Oops, thanks for pointing that out.
Actually, I do not exclude the option of making the epoch even shorter.
`last date` is probably a bad name. Nothing really bad should happen when an integer overflow occurs.
The probability of collisions will increase with each cycle, but with a long timestamp, we will have about the same increased probability right now.
Timestamp wrapping is not very good in terms of data sortability and locality, but still, the situation will be much better than with a random UUID
And I don't expect a very long lifespan for 128 bit UUIDs. It is very likely that in 50 or 100 years, 256-bit identifiers (or something radically new) will be used.
I think we could start a new epoch from 2021 instead of using Unix time.
> And I don't expect a very long lifespan for 128 bit UUIDs. It is very likely that in 50 or 100 years, 256-bit identifiers (or something radically new) will be used.

160-bit identifiers would be great right now. By the way, 160-bit identifiers in Crockford's base32 would be the same length as 128-bit identifiers in the usual UUID string format. Therefore they are compatible.
@edo1

> [...] The target is `P collisions/year`: the annual collision probability when a new UUID is generated every nanosecond. [...]

Is the spreadsheet correct? The "when a new UUID is generated every nanosecond" part would, in the case of ns precision, mean that every nanosecond the time is incremented, and because of that the probability of collision is theoretically 0.
> Might as well give them back to the unixts to go to 38 if that is the case.
> ...
> - 34 bit timestamp, giving 2 bits back to entropy/node/randomness

The way I read RFC 4122, it discourages fields that are a non-integral number of octets in length. (I.e. field bit lengths should be a multiple of 4 bits, so 32, 36, 40 are okay; 34 and 38 are not.)

From the RFC:

> To minimize confusion about bit assignments within octets, the UUID record definition is defined only in terms of fields that are integral numbers of octets

I know this is just expository text in the RFC, but I believe it's well-founded guidance. Sub-octet field definitions make it difficult for humans to interpret a UUID. Witness how the hex digit containing the `variant` field changes based on the `clockseq` value. 😦
> Is the spreadsheet correct? The "when a new UUID is generated every nanosecond" part would, in the case of ns precision, mean that every nanosecond the time is incremented, and because of that the probability of collision is theoretically 0.

You are right. Initially, I assumed that n is large, so the formula does not work correctly in the edge case: n*(n-1) ≉ n^2 when n = 1.
Thank you for pointing this out. Fixed.
Also added a second sheet with graphs; they clearly show that the collision probability depends mainly on the bit count and on the epoch length, not on the timestamp/random split.
In https://github.com/uuid6/uuid6-ietf-draft/blob/master/LATEST.md, I'm proposing that we use an unsigned 64-bit nanoseconds-since-the-Unix-epoch timestamp (and move the version field out of the way, so the timestamp is exactly the first 8 bytes). Max timestamp value is 2554-07-21T23:34:33.999999999Z. The rationale is:
- Year 2554 is a reasonably long time
- Easy to understand and implement (clean byte boundaries)
- Doesn't introduce new Epoch (not worth changing to add 51 more years)
Thoughts?
The "when every nanosecond new UUID is generated" part in case of ns precision would mean that every nanosecond the time is incremented and because of that the probability of collision is theoretically 0.
After some thought, I decided to change the model a bit.
It looks more realistic if there are still 10⁹ UUIDs generated per second, but each UUID is generated at a random time within a second. We continue to consider the annual collision probability as an objective function.
The same formula will be used to calculate the probability of NO collisions per second:
Ps = exp(-n^2/2d)
n = 1e9; d = (1e9/res) * (2^rbits)
, where res
is timestamp resolution in nanoseconds, rbits
- random bits count;
Ps = exp(-1e9*1e9/2/(1e9/res)/2^rbits) = exp(-1e9*res/2/2^rbits)
Annual NO collision probability:
Py = Ps^(3600*24*365) = exp(-1e9*res/2/2^rbits)^(3600*24*365) = exp(-1e9*res*3600*24*365/2/2^rbits)
It would be nice if someone could check that there is no error in the formula.
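As a sanity check, the closed form is easy to evaluate numerically; this just plugs numbers into the Py expression above (parameter names follow the notation in the formula):

```python
import math

def annual_collision_probability(res_ns: float, rbits: int) -> float:
    """1 - Py, with Py = exp(-1e9 * res * 3600*24*365 / 2 / 2**rbits)."""
    return 1 - math.exp(-1e9 * res_ns * 3600 * 24 * 365 / 2 / 2 ** rbits)

# 10 ms resolution (res = 1e7 ns) with 82 random bits -> ~3.2% per year,
# consistent with the earlier per-tick calculation
print(annual_collision_probability(1e7, 82))
```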
The spreadsheet has been updated, with almost no changes in the numbers except for the 1 ns and 10 ns resolutions.
Also added collision probability for 20 years and 100 years (columns `P collisions/20y` and `P collisions/100y`).
UUID proposal from #35 was added as well.
> Also added collision probability for 20 years and 100 years (columns `P collisions/20y` and `P collisions/100y`).
For time-based UUIDs the number of years shouldn't matter.
- I generate a UUID now: time 2021-09-02 07:13:23 UTC; random tiz5ptms7s
- I generate a UUID 20 years from now: time 2041-09-02 07:13:23 UTC; random tiz5ptms7s
The random part is the same but the time is different, therefore no collision. You should have the same value for `P collisions/year`, `P collisions/20y`, or `P collisions/hour`, up to the resolution of the time.
Correct me if I'm wrong.
@nerg4l I do not speak English well enough to talk about probability theory, but I'll try to explain with a simple example. Let's say you buy an instant lottery ticket every day. The chance that the purchased ticket is a winning one is 1:1,000,000. The ticket must be checked immediately; yesterday's ticket cannot win today. Will the odds of winning at least once be the same within 1 year and within 20 years?
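The same point in numbers (just evaluating the 1-in-1,000,000 odds from the analogy):

```python
p = 1 / 1_000_000                 # chance that a single daily ticket wins
print(1 - (1 - p) ** 365)         # ~0.00036: at least one win within 1 year
print(1 - (1 - p) ** (365 * 20))  # ~0.0073:  at least one win within 20 years
```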
Got it, thanks for the explanation.