Feature: add copy option `MAX_FILE_WRITE_THREADS`
Summary
After https://github.com/datafuselabs/databend/pull/15596, file size control for Parquet is improved. But when there are many threads, blocks are likely to end up spread across all the writer threads, which results in relatively small files.
A grouping processor is used to group small blocks up to MAX_FILE_SIZE before they are distributed to the writer threads. But it is based on uncompressed size, so it may produce files of size MAX_FILE_SIZE/compress_ratio.
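To make the size gap concrete, here is a small arithmetic sketch. The function name `expected_file_size` and the 256 MiB value are hypothetical and chosen only for illustration; they are not taken from the Databend code base.

```python
def expected_file_size(max_file_size: int, compress_ratio: float) -> float:
    """Approximate on-disk size of a file whose uncompressed input was
    grouped up to max_file_size (compress_ratio = uncompressed / compressed)."""
    if compress_ratio <= 0:
        raise ValueError("compress_ratio must be positive")
    return max_file_size / compress_ratio


MIB = 1024 * 1024
MAX_FILE_SIZE = 256 * MIB  # hypothetical value, for illustration only

for ratio in (1.0, 2.0, 4.0):
    size_mib = expected_file_size(MAX_FILE_SIZE, ratio) / MIB
    print(f"compress ratio {ratio}: about {size_mib:.0f} MiB per file")
```

Under these assumptions, a compress ratio of 4 turns a 256 MiB target into files of roughly 64 MiB.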
Users can change the max_threads setting, but this affects the whole plan.
Compress ratio estimator
Another automated approach is to enhance the grouping processor with a compress ratio estimator, but this has drawbacks:
- the compress ratio may differ from block to block
- grouping a larger amount of block data costs more temporary memory
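A minimal sketch of what such an estimator-driven grouping could look like, assuming a running ratio computed from files already written. `RatioEstimator` and `group_blocks` are hypothetical names, not Databend APIs, and the real grouping processor may work quite differently.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class RatioEstimator:
    """Running estimate of uncompressed_bytes / compressed_bytes (hypothetical helper)."""
    uncompressed: int = 0
    compressed: int = 0
    default_ratio: float = 1.0  # used until the first file has been written

    def observe(self, uncompressed: int, compressed: int) -> None:
        # Record the sizes of one finished file.
        self.uncompressed += uncompressed
        self.compressed += compressed

    @property
    def ratio(self) -> float:
        if self.compressed == 0:
            return self.default_ratio
        return self.uncompressed / self.compressed


def group_blocks(
    block_sizes: Iterable[int], max_file_size: int, estimator: RatioEstimator
) -> Iterator[List[int]]:
    """Group uncompressed block sizes so each group compresses to about max_file_size."""
    group: List[int] = []
    total = 0
    for size in block_sizes:
        group.append(size)
        total += size
        # The uncompressed target grows with the current estimated ratio.
        if total >= max_file_size * estimator.ratio:
            yield group
            group, total = [], 0
    if group:
        yield group  # flush the remaining tail


estimator = RatioEstimator()
estimator.observe(uncompressed=400, compressed=100)  # estimated ratio is 4.0
groups = list(group_blocks([100] * 10, max_file_size=100, estimator=estimator))
print(groups)
```

The sketch also shows both drawbacks listed above: the estimate is only an average over past files, so a block that compresses differently still skews the output size, and each group buffers about `max_file_size * ratio` bytes of uncompressed data before it is flushed.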
Can we use the /*+ SET_VAR(max_threads=1) */ hint to apply the setting only to the copy, so that no new option is needed?
max_threads=1 will slow down the whole query, including the source and the computation.