Skip useless values in primary key in memory.
Suppose there is a primary key (x, y), where x is an arbitrary column and y is a variable-length column such as String.
When loading the index into memory, we can avoid storing the actual value of y if the value of x differs from its value in both the previous and the next mark. We can store the default value instead.
Example:
Instead of:
... (123, 'hello'), (124, 'world'), (125, 'goodbye') ...
We can store:
... (123, 'hello'), (124, ''), (125, 'goodbye') ...
Because 124 is equal to neither 123 nor 125,
the actual value of the y column in that mark is not important.
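To make the proposed rule concrete, here is a minimal Python sketch of it. This is only an illustration, not ClickHouse code: the function name `compact_index` and the list-of-tuples representation of marks are hypothetical, and leaving the first and last marks untouched is an assumption, since they have only one neighbour.

```python
def compact_index(marks, default=""):
    """Return a copy of `marks` in which y is replaced by `default`
    whenever x differs from x in both the previous and the next mark.

    `marks` is a list of (x, y) tuples in primary key order.
    Assumption: the first and last marks are kept as they are.
    """
    compacted = []
    last = len(marks) - 1
    for i, (x, y) in enumerate(marks):
        is_edge = i == 0 or i == last
        if not is_edge and marks[i - 1][0] != x and marks[i + 1][0] != x:
            # x alone separates this mark from its neighbours,
            # so the proposal drops the actual y value.
            compacted.append((x, default))
        else:
            compacted.append((x, y))
    return compacted


marks = [(123, "hello"), (124, "world"), (125, "goodbye")]
print(compact_index(marks))
```

For the marks in the example above, only the y value next to 124 is replaced. When x repeats across adjacent marks, the y values are kept, because they are needed to order those marks.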
@CurtizJ RFC.
What will happen if we run a query like this: SELECT * FROM table WHERE x = 124 and y = 'zoo'? Wouldn't it scan 2 granules instead of 1?
@UnamedRus You are right, we need some modification to this scheme.
@UnamedRus How does y = 'zoo' make a difference? IIUC, zoo > '' and zoo > 'world'
Because when we skip useless values, a default value in the index does not mean that the actual value is the default. It means that any value could be there.
Suppose we have a query like this: SELECT * FROM table WHERE x = 124 and y = 'zoo'
And marks like these:
0 - 123, hello
1 - 124, world
2 - 125, goodbye
Without skipping, it will read only granule 1-2.
With skipping, it needs to read granules 0-1 and 1-2, because mark 1 could have held 'zoo' instead of 'world'.
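The difference can be reproduced with a small Python sketch of granule selection for a point lookup. This is a simplified model, not ClickHouse's actual index analysis: `UNKNOWN`, `may_be_le` and `granules_to_read` are hypothetical names, granule i is assumed to cover the keys between mark i and mark i + 1, and the last mark is assumed to start a granule that runs to the end of the part.

```python
UNKNOWN = None  # hypothetical sentinel: the y value was skipped, anything could be here


def may_be_le(a, b):
    """Return True if a <= b is possible, treating UNKNOWN y as any value."""
    if a[0] != b[0]:
        return a[0] < b[0]
    if a[1] is UNKNOWN or b[1] is UNKNOWN:
        return True  # cannot rule it out without the real y value
    return a[1] <= b[1]


def granules_to_read(marks, key):
    """Return indexes of granules that may contain `key`.

    Granule i spans the keys from marks[i] up to marks[i + 1];
    the last granule is treated as open-ended (assumption).
    """
    selected = []
    for i, mark in enumerate(marks):
        starts_before = may_be_le(mark, key)
        ends_after = i + 1 == len(marks) or may_be_le(key, marks[i + 1])
        if starts_before and ends_after:
            selected.append(i)
    return selected


key = (124, "zoo")
full = [(123, "hello"), (124, "world"), (125, "goodbye")]
skipped = [(123, "hello"), (124, UNKNOWN), (125, "goodbye")]

print(granules_to_read(full, key))     # only the granule starting at mark 1
print(granules_to_read(skipped, key))  # the granules starting at marks 0 and 1
```

In this model, the full index selects one granule for the query, while the index with the skipped value selects two, because the comparison against mark 1 can no longer exclude the granule before it.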
This has been proven wrong, but see https://github.com/ClickHouse/ClickHouse/issues/60091