ClickHouse icon indicating copy to clipboard operation
ClickHouse copied to clipboard

Less locking in generateUUIDv7 #64507

Open JosuGZ opened this issue 1 year ago • 4 comments

Changelog category (leave one):

  • Performance Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Performance improvement while generating UUIDv7

JosuGZ avatar Jul 31 '24 22:07 JosuGZ

CLA assistant check
All committers have signed the CLA.

CLAassistant avatar Jul 31 '24 22:07 CLAassistant

This is an automated comment for commit f6c2966e1643c292075f2e9bd20455876a6ee041 with description of existing statuses. It's updated for the latest CI running

✅ Click here to open a full report in a separate page

Successful checks
Check nameDescriptionStatus
Docs checkBuilds and tests the documentation✅ success
Fast testNormally this is the first check that is ran for a PR. It builds ClickHouse and runs most of stateless functional tests, omitting some. If it fails, further checks are not started until it is fixed. Look at the report to see which tests fail, then reproduce the failure locally as described here✅ success
Style checkRuns a set of checks to keep the code style clean. If some of tests failed, see the related log from the report✅ success

robot-ch-test-poll2 avatar Aug 01 '24 22:08 robot-ch-test-poll2

@JosuGZ Before I proceed with a review, please sign the CLA first, thanks.

rschu1ze avatar Aug 05 '24 10:08 rschu1ze

@JosuGZ Before I proceed with a review, please sign the CLA first, thanks.

Done!

JosuGZ avatar Aug 05 '24 11:08 JosuGZ

@rschu1ze do you need some comments to clarify anything?

The main difference with the snowflakes is that the following overflow can't be done on UUIDv7 as increasing the timestamp means generating a new random counter. That means the range would have a random end.

Instead, I add "capacity" to the range, which shows how many UUIDs are can be tenerated with the range. That way, new ranges can be reserved if capacity is smaller than input_rows_count.

image

JosuGZ avatar Aug 12 '24 12:08 JosuGZ

Everyone is still focussing on fixing the CI (stabilizing it), please expect some delay, sorry.

rschu1ze avatar Aug 12 '24 19:08 rschu1ze

Everyone is still focussing on fixing the CI (stabilizing it), please expect some delay, sorry.

No problem, good luck with that!

JosuGZ avatar Aug 13 '24 11:08 JosuGZ

Dear @rschu1ze, this PR hasn't been updated for a while. You will be unassigned. Will you continue working on it? If so, please feel free to reassign yourself.

clickhouse-gh[bot] avatar Sep 17 '24 13:09 clickhouse-gh[bot]

Hi @rschu1ze, I have rebased the branch on top of master, can you take a look?

JosuGZ avatar Oct 02 '24 05:10 JosuGZ

@JosuGZ Sorry the code might be faster now but it became so difficult to understand that I can't verify it. If you like to continue, feel free to but the PR would need to be simplified somehow.

rschu1ze avatar Oct 14 '24 20:10 rschu1ze

I would appreciate knowing the part that you consider difficult to understand before I try to change anything. I tried to keep the code as close as possible to the snowflake code. Is it the while loop?

On snowflakes, the range looks like this:

image

However, on UUIDv7, following what the original code does (and the recommendations about UUIDv7), the range has holes, as updating the timestamp means initializing the counter to a random value. It looks like this:

image

That means that we can't just iterate from begin to end, so we have 3 options:

  1. Store the information about the holes on the range. This seems complicated.
  2. Assume that we will never have an overflow. With 42 bits this can be done, for example setting the first bit to 0 when generating the random value. This could break should the counter length be updated.
  3. Reserve a range that may have less capacity than input_rows_count, and keep reserving ranges until we process all the rows.

My approach is the last one, and I have tested the code forcing the counter length to be 5 bits (to test the overflow). This approach also has the benefit of removing the overflow check inside the inner loop.

JosuGZ avatar Oct 15 '24 10:10 JosuGZ