metabase-clickhouse-driver icon indicating copy to clipboard operation
metabase-clickhouse-driver copied to clipboard

Optimize get list of possible values query for `types/Collection` columns

Open UnamedRus opened this issue 1 year ago • 3 comments

Use approach with settings for such queries:

https://github.com/ClickHouse/ClickHouse/issues/50919#issuecomment-1588985090

UnamedRus avatar Jun 13 '23 13:06 UnamedRus

Related https://github.com/ClickHouse/metabase-clickhouse-driver/issues/119

UnamedRus avatar Jun 13 '23 13:06 UnamedRus

Can you please clarify what needs to be done here cause I am a bit OOTL?

As I understood, it's better to apply these settings for types/Collection scans queries:

max_rows_to_group_by=1000
group_by_overflow_mode=break

Is it correct?

slvrtrn avatar Jun 19 '23 16:06 slvrtrn

Yeah, something like that will be simpler.

It will not return sorted list of values, but just first/random 1000 values.

Another possible problem, what if your column have less then 1000 values, so ClickHouse will scan it fully. (1 trillion rows)

So, it make sense to play with settings:

│ max_rows_to_read                                   │ 0       │       0 │ Limit on read rows from the most 'deep' sources. That is, only in the deepest subquery. When reading from a remote server, it is only checked on a remote server.                                                                                          │ ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ │        0 │ UInt64 │ 0       │           │
│ max_rows_to_read_leaf                              │ 0       │       0 │ Limit on read rows on the leaf nodes for distributed queries. Limit is applied for local reads only excluding the final merge stage on the root node.                                                                                                      │ ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ │        0 │ UInt64 │ 0       │     

Or

│ max_execution_time                      │ 0     │       0 │ If query run time exceeded the specified number of seconds, the behavior will be determined by the 'timeout_overflow_mode' which by default is - throw an exception. Note that the timeout is checked and query can stop only in designated places during data processing. It currently cannot stop during merging of aggregation states or during query analysis, and the actual run time will be higher than the value of this setting. │ ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ │        0 │ Seconds │ 0       │           │
│
┌─name──────────────────┬─value─┬─changed─┬─description────────────────────────────┬─min──┬─max──┬─readonly─┬─type─────────┬─default─┬─alias_for─┐
│ timeout_overflow_mode │ throw │       0 │ What to do when the limit is exceeded. │ ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ │        0 │ OverflowMode │ throw   │           │
└───────────────────────┴───────┴─────────┴────────────────────────────────────────┴──────┴──────┴──────────┴──────────────┴─────────┴───────────┘
SET  timeout_overflow_mode = 'break', max_execution_time=5

UnamedRus avatar Jun 19 '23 16:06 UnamedRus