databend
databend copied to clipboard
[improvement] Settings hint is hard to understand under complex queries
Summary
From the user's perspective, this can even be considered incorrect behavior. We need to inject the settings hint into the plan so that the behavior is consistent with intuition.
root@localhost:8000/default> select * from (select /*+SET_VAR(timezone='Asia/Shanghai')*/ name, value value from system.settings where name in ('timezone') union all (select /*+SET_VAR(timezone='America/Los_Angeles')*/ 'x','x')) order by name desc;
SELECT
*
FROM
(
SELECT
/*+SET_VAR(timezone='Asia/Shanghai')*/
name,
value value
FROM
system.settings
WHERE
name IN ('timezone')
UNION
ALL (
SELECT
/*+SET_VAR(timezone='America/Los_Angeles')*/
'x',
'x'
)
)
ORDER BY
name DESC
┌────────────────────────────────┐
│ name │ value │
│ String │ String │
├──────────┼─────────────────────┤
│ x │ x │
│ timezone │ America/Los_Angeles │
└────────────────────────────────┘
related: #16387
Hey @zhang2014! I'm here to assist you with any bugs, questions, or contributions while you wait for a human maintainer. Let's tackle this together!
To improve the settings hint to be more understandable under complex queries and ensure consistent behavior with user intuition, you can enhance the descriptions of the settings to be more detailed and user-friendly. Here is an example of how you can update the descriptions:
let default_settings = HashMap::from([
("enable_clickhouse_handler", DefaultSettingValue {
value: UserSettingValue::UInt64(0),
desc: "Enable or disable the ClickHouse handler. Set to 1 to enable, 0 to disable.",
mode: SettingMode::Both,
range: Some(SettingRange::Numeric(0..=1)),
}),
("max_block_size", DefaultSettingValue {
value: UserSettingValue::UInt64(65536),
desc: "Specify the maximum byte size of a single data block that can be read. This helps in managing memory usage during query execution.",
mode: SettingMode::Both,
range: Some(SettingRange::Numeric(1..=u64::MAX)),
}),
("parquet_max_block_size", DefaultSettingValue {
value: UserSettingValue::UInt64(8192),
desc: "Define the maximum block size for the Parquet reader. Adjust this to optimize performance based on your data size.",
mode: SettingMode::Both,
range: Some(SettingRange::Numeric(1..=u64::MAX)),
}),
("max_threads", DefaultSettingValue {
value: UserSettingValue::UInt64(num_cpus),
desc: "Set the maximum number of threads to execute a request. This can help in parallel processing and improving query performance.",
mode: SettingMode::Both,
range: Some(SettingRange::Numeric(1..=1024)),
}),
("max_memory_usage", DefaultSettingValue {
value: UserSettingValue::UInt64(max_memory_usage),
desc: "Specify the maximum memory usage in bytes for processing a single query. This setting helps in preventing out-of-memory errors.",
mode: SettingMode::Both,
range: Some(SettingRange::Numeric(0..=u64::MAX)),
}),
("data_retention_time_in_days", DefaultSettingValue {
value: UserSettingValue::UInt64(1),
desc: "Set the data retention time in days. This determines how long the data will be kept before being purged.",
mode: SettingMode::Both,
range: Some(SettingRange::Numeric(0..=data_retention_time_in_days_max)),
}),
("max_storage_io_requests", DefaultSettingValue {
value: UserSettingValue::UInt64(default_max_storage_io_requests),
desc: "Define the maximum number of concurrent I/O requests. This helps in managing the load on the storage system.",
mode: SettingMode::Both,
range: Some(SettingRange::Numeric(1..=1024)),
}),
("storage_io_min_bytes_for_seek", DefaultSettingValue {
value: UserSettingValue::UInt64(48),
desc: "Set the minimum byte size of data that must be read from storage in a single I/O operation when seeking a new location in the data file.",
mode: SettingMode::Both,
range: Some(SettingRange::Numeric(0..=u64::MAX)),
}),
("storage_io_max_page_bytes_for_read", DefaultSettingValue {
value: UserSettingValue::UInt64(512 * 1024),
desc: "Specify the maximum byte size of data pages that can be read from storage in a single I/O operation. This helps in optimizing read performance.",
mode: SettingMode::Both,
range: Some(SettingRange::Numeric(0..=u64::MAX)),
}),
("flight_client_timeout", DefaultSettingValue {
value: UserSettingValue::UInt64(60),
desc: "Set the maximum time in seconds that a flight client request can be processed. This helps in preventing long-running requests.",
mode: SettingMode::Both,
range: Some(SettingRange::Numeric(0..=u64::MAX)),
}),
("http_handler_result_timeout_secs", DefaultSettingValue {
value: {
let result_timeout_secs = GlobalConfig::try_get_instance()
.map(|conf| conf.query.http_handler_result_timeout_secs)
.unwrap_or(60);
UserSettingValue::UInt64(result_timeout_secs)
},
desc: "Set the timeout in seconds that a HTTP query session expires without any polls. This helps in managing session lifetimes.",
mode: SettingMode::Both,
range: Some(SettingRange::Numeric(0..=u64::MAX)),
}),
("storage_read_buffer_size", DefaultSettingValue {
value: UserSettingValue::UInt64(1024 * 1024),
desc: "Specify the byte size of the buffer used for reading data into memory. This helps in optimizing read performance.",
mode: SettingMode::Both,
range: Some(SettingRange::Numeric(0..=u64::MAX)),
}),
("input_read_buffer_size", DefaultSettingValue {
value: UserSettingValue::UInt64(4 * 1024 * 1024),
desc: "Set the memory size in bytes allocated to the buffer used by the buffered reader to read data from storage. This helps in optimizing read performance.",
mode: SettingMode::Both,
range: Some(SettingRange::Numeric(0..=u64::MAX)),
}),
("enable_new_copy_for_text_formats", DefaultSettingValue {
value: UserSettingValue::UInt64(1),
desc: "Enable the new implementation for loading CSV files. Set to 1 to enable, 0 to disable.",
mode: SettingMode::Both,
range: Some(SettingRange::Numeric(0..=1)),
}),
("purge_duplicated_files_in_copy", DefaultSettingValue {
value: UserSettingValue::UInt64(0),
desc: "Enable or disable purging of duplicated files detected during execution of COPY INTO table. Set to 1 to enable, 0 to disable.",
mode: SettingMode::Both,
range: Some(SettingRange::Numeric(0..=1)),
}),
("timezone", DefaultSettingValue {
value: UserSettingValue::String("UTC".to_owned()),
desc: "Set the timezone for the session. This affects how date and time values are interpreted.",
mode: SettingMode::Both,
range: Some(SettingRange::String(all_timezones)),
}),
// Add more settings here with improved descriptions...
]);
By providing more detailed and user-friendly descriptions, users will have a better understanding of what each setting does and how to configure it appropriately, especially under complex queries. Additionally, the use of ScopeLevel and ChangeValue structures from the databend codebase can help in managing the scope and changes of settings more effectively [1][2].
To continue talking to Dosu, mention @dosu.
implemented in main branch