cockroach icon indicating copy to clipboard operation
cockroach copied to clipboard

performance of UUID generation is slow – implementation uses `getrandom` syscall

Open srosenberg opened this issue 2 years ago • 7 comments

UUID (gen_random_uuid() and uuid_v4()) [1] are currently implemented using crypto/rand. While this provides a strong guarantee wrt security, it's highly inefficient. UUID v4 is a randomly generated 128-bit value; nothing in the spec. [2] dictates that the implementation must use a cryptographically secure PRNG.

The current implementation effectively requires a system call (getrandom) for every new value of gen_random_uuid(). This was originally observed in [3]. I have independently stumbled upon it while straceing an execution of,

./workload fixtures import tpcc  --warehouses=100

which induced ~6 million syscalls to getrandom! The reason is we use gen_random_uuid() for automatically generating the rowid column in the history table [4].

Microbenchmarks

Using the existing microbenchmarks BenchmarkMakeV4 and BenchmarkFastMakeV4 on c5.9xlarge (with PMU), we can see that the performance is > 10x slower; cf. ~811 ns/op vs. 54 ns/op.

 perf stat -r 3 -C2 taskset -c 2 uuid_test -test.v --test.benchtime=10000000x -test.bench 'BenchmarkMakeV4' -test.run '^$'
goos: linux
goarch: amd64
cpu: Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz
BenchmarkMakeV4
BenchmarkMakeV4 	10000000	       811.3 ns/op
PASS
goos: linux
goarch: amd64
cpu: Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz
BenchmarkMakeV4
BenchmarkMakeV4 	10000000	       812.2 ns/op
PASS
goos: linux
goarch: amd64
cpu: Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz
BenchmarkMakeV4
BenchmarkMakeV4 	10000000	       812.3 ns/op
PASS

 Performance counter stats for 'CPU(s) 2' (3 runs):

           8128.12 msec cpu-clock                 #    1.000 CPUs utilized            ( +-  0.09% )
             48426      context-switches          #    5.953 K/sec                    ( +-  2.29% )
                 2      cpu-migrations            #    0.246 /sec
              1976      page-faults               #  242.898 /sec                     ( +- 19.25% )
       27698246183      cycles                    #    3.405 GHz                      ( +-  0.02% )
       32737716985      instructions              #    1.18  insn per cycle           ( +-  0.03% )
        3791218261      branches                  #  466.032 M/sec                    ( +-  0.06% )
          14589289      branch-misses             #    0.38% of all branches          ( +-  0.23% )

           8.12496 +- 0.00284 seconds time elapsed  ( +-  0.03% )
perf stat -r 3 -C2 taskset -c 2 uuid_test -test.v --test.benchtime=10000000x -test.bench 'BenchmarkFastMakeV4' -test.run '^$'
goos: linux
goarch: amd64
cpu: Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz
BenchmarkFastMakeV4
BenchmarkFastMakeV4 	10000000	        53.98 ns/op
PASS
goos: linux
goarch: amd64
cpu: Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz
BenchmarkFastMakeV4
BenchmarkFastMakeV4 	10000000	        53.98 ns/op
PASS
goos: linux
goarch: amd64
cpu: Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz
BenchmarkFastMakeV4
BenchmarkFastMakeV4 	10000000	        53.97 ns/op
PASS

 Performance counter stats for 'CPU(s) 2' (3 runs):

            544.35 msec cpu-clock                 #    0.999 CPUs utilized            ( +-  2.10% )
               254      context-switches          #  456.941 /sec                     ( +-  0.73% )
                 2      cpu-migrations            #    3.598 /sec
               798      page-faults               #    1.436 K/sec                    ( +-  0.11% )
        1875740389      cycles                    #    3.374 GHz                      ( +-  0.03% )
        4910290809      instructions              #    2.62  insn per cycle           ( +-  0.00% )
        1013553042      branches                  #    1.823 G/sec                    ( +-  0.00% )
            168504      branch-misses             #    0.02% of all branches          ( +-  0.91% )

          0.544744 +- 0.000349 seconds time elapsed  ( +-  0.06% )

[1] https://www.cockroachlabs.com/docs/stable/uuid [2] https://www.ietf.org/rfc/rfc4122.txt [3] https://github.com/cockroachdb/cockroach/issues/30236#issuecomment-434588162 [4] https://github.com/cockroachdb/cockroach/blob/b24aa96762b06725f6968b4a6129cbcc67818912/pkg/workload/tpcc/ddls.go#L91

Jira issue: CRDB-30899

srosenberg avatar Aug 24 '23 05:08 srosenberg

NOTE: there might be wrong assumptions in the code which assume that the underlying implementation of uuid.MakeV4() will produce zero collisions. One such assumption is in [1]. There are likely others which would need to be audited. Also see the discussion in slack [2].

[1] https://github.com/cockroachdb/cockroach/blob/b24aa96762b06725f6968b4a6129cbcc67818912/pkg/sql/schemachanger/rel/schema_rules.go#L129 [2] https://cockroachlabs.slack.com/archives/C4X2J0RH6/p1692725321047249

srosenberg avatar Aug 24 '23 05:08 srosenberg

@cockroachdb/sql-foundations

srosenberg avatar Aug 24 '23 05:08 srosenberg

there might be wrong assumptions in the code which assume that the underlying implementation of uuid.MakeV4() will produce zero collisions.

We actually intentionally assume collision free UUIDs in some parts of the code. For example, the optimizer will skip certain uniqueness checks if you have a UUID primary key in some cases. @mgartner can describe that better, and also may have an opinion on this issue.

rafiss avatar Aug 24 '23 13:08 rafiss

Yes, that's an assumption we make by default for REGIONAL BY ROW tables that have unique keys with a default value of gen_random_uuid() or are updated/inserted with gen_random_uuid(). There's an option to opt out of this optimization if the risk is collisions is too high for a particular use case, but we believe it to be exceedingly unlikely that there will be a collision assuming a proper implementation of the UUID generator. If there's a known flaw, we need to do something about it.

mgartner avatar Aug 28 '23 21:08 mgartner

Also note that for PG compat, we may need to keep UUID generation through SQL as cryptographically secure https://security.stackexchange.com/questions/93902/is-postgress-uuid-generate-v4-securely-random/93905#93905

rafiss avatar Sep 13 '23 19:09 rafiss

We can make a cluster setting that determines which random source to use for gen_random_uuid.

rafiss avatar Sep 19 '23 18:09 rafiss

As of 1.22, math/rand is based on a crypto block cipher, namely ChaCha8Rand [1],

Using Go 1.22, the new ChaCha8Rand generator is seeded from 256 bits of entropy and can generate 2²⁵⁶ possible first UUIDs. It does not need to worry about collisions.

Thus, we should be able to switch to math/rand without further changes, plus gain a non-trivial perf. boost.

[1] https://go.dev/blog/chacha8rand

srosenberg avatar May 08 '24 02:05 srosenberg