performance of UUID generation is slow – implementation uses `getrandom` syscall
UUID (gen_random_uuid() and uuid_v4()) [1] are currently implemented using crypto/rand. While this provides a strong guarantee wrt security, it's highly inefficient. UUID v4 is a randomly generated 128-bit value; nothing in the spec. [2] dictates that the implementation must use a cryptographically secure PRNG.
The current implementation effectively requires a system call (getrandom) for every new value of gen_random_uuid(). This was originally observed in [3]. I have independently stumbled upon it while straceing an execution of,
./workload fixtures import tpcc --warehouses=100
which induced ~6 million syscalls to getrandom! The reason is we use gen_random_uuid() for automatically generating the rowid column in the history table [4].
Microbenchmarks
Using the existing microbenchmarks BenchmarkMakeV4 and BenchmarkFastMakeV4 on c5.9xlarge (with PMU), we can see that the performance is > 10x slower; cf. ~811 ns/op vs. 54 ns/op.
perf stat -r 3 -C2 taskset -c 2 uuid_test -test.v --test.benchtime=10000000x -test.bench 'BenchmarkMakeV4' -test.run '^$'
goos: linux
goarch: amd64
cpu: Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz
BenchmarkMakeV4
BenchmarkMakeV4 10000000 811.3 ns/op
PASS
goos: linux
goarch: amd64
cpu: Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz
BenchmarkMakeV4
BenchmarkMakeV4 10000000 812.2 ns/op
PASS
goos: linux
goarch: amd64
cpu: Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz
BenchmarkMakeV4
BenchmarkMakeV4 10000000 812.3 ns/op
PASS
Performance counter stats for 'CPU(s) 2' (3 runs):
8128.12 msec cpu-clock # 1.000 CPUs utilized ( +- 0.09% )
48426 context-switches # 5.953 K/sec ( +- 2.29% )
2 cpu-migrations # 0.246 /sec
1976 page-faults # 242.898 /sec ( +- 19.25% )
27698246183 cycles # 3.405 GHz ( +- 0.02% )
32737716985 instructions # 1.18 insn per cycle ( +- 0.03% )
3791218261 branches # 466.032 M/sec ( +- 0.06% )
14589289 branch-misses # 0.38% of all branches ( +- 0.23% )
8.12496 +- 0.00284 seconds time elapsed ( +- 0.03% )
perf stat -r 3 -C2 taskset -c 2 uuid_test -test.v --test.benchtime=10000000x -test.bench 'BenchmarkFastMakeV4' -test.run '^$'
goos: linux
goarch: amd64
cpu: Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz
BenchmarkFastMakeV4
BenchmarkFastMakeV4 10000000 53.98 ns/op
PASS
goos: linux
goarch: amd64
cpu: Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz
BenchmarkFastMakeV4
BenchmarkFastMakeV4 10000000 53.98 ns/op
PASS
goos: linux
goarch: amd64
cpu: Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz
BenchmarkFastMakeV4
BenchmarkFastMakeV4 10000000 53.97 ns/op
PASS
Performance counter stats for 'CPU(s) 2' (3 runs):
544.35 msec cpu-clock # 0.999 CPUs utilized ( +- 2.10% )
254 context-switches # 456.941 /sec ( +- 0.73% )
2 cpu-migrations # 3.598 /sec
798 page-faults # 1.436 K/sec ( +- 0.11% )
1875740389 cycles # 3.374 GHz ( +- 0.03% )
4910290809 instructions # 2.62 insn per cycle ( +- 0.00% )
1013553042 branches # 1.823 G/sec ( +- 0.00% )
168504 branch-misses # 0.02% of all branches ( +- 0.91% )
0.544744 +- 0.000349 seconds time elapsed ( +- 0.06% )
[1] https://www.cockroachlabs.com/docs/stable/uuid [2] https://www.ietf.org/rfc/rfc4122.txt [3] https://github.com/cockroachdb/cockroach/issues/30236#issuecomment-434588162 [4] https://github.com/cockroachdb/cockroach/blob/b24aa96762b06725f6968b4a6129cbcc67818912/pkg/workload/tpcc/ddls.go#L91
Jira issue: CRDB-30899
NOTE: there might be wrong assumptions in the code which assume that the underlying implementation of uuid.MakeV4() will produce zero collisions. One such assumption is in [1]. There are likely others which would need to be audited. Also see the discussion in slack [2].
[1] https://github.com/cockroachdb/cockroach/blob/b24aa96762b06725f6968b4a6129cbcc67818912/pkg/sql/schemachanger/rel/schema_rules.go#L129 [2] https://cockroachlabs.slack.com/archives/C4X2J0RH6/p1692725321047249
@cockroachdb/sql-foundations
there might be wrong assumptions in the code which assume that the underlying implementation of uuid.MakeV4() will produce zero collisions.
We actually intentionally assume collision free UUIDs in some parts of the code. For example, the optimizer will skip certain uniqueness checks if you have a UUID primary key in some cases. @mgartner can describe that better, and also may have an opinion on this issue.
Yes, that's an assumption we make by default for REGIONAL BY ROW tables that have unique keys with a default value of gen_random_uuid() or are updated/inserted with gen_random_uuid(). There's an option to opt out of this optimization if the risk is collisions is too high for a particular use case, but we believe it to be exceedingly unlikely that there will be a collision assuming a proper implementation of the UUID generator. If there's a known flaw, we need to do something about it.
Also note that for PG compat, we may need to keep UUID generation through SQL as cryptographically secure https://security.stackexchange.com/questions/93902/is-postgress-uuid-generate-v4-securely-random/93905#93905
We can make a cluster setting that determines which random source to use for gen_random_uuid.
As of 1.22, math/rand is based on a crypto block cipher, namely ChaCha8Rand [1],
Using Go 1.22, the new ChaCha8Rand generator is seeded from 256 bits of entropy and can generate 2²⁵⁶ possible first UUIDs. It does not need to worry about collisions.
Thus, we should be able to switch to math/rand without further changes, plus gain a non-trivial perf. boost.
[1] https://go.dev/blog/chacha8rand