
Parquet: read columns in parallel

Open al13n321 opened this issue 1 year ago • 1 comment

Currently ParquetBlockInputFormat parallelizes reading at row group granularity. I.e. it reads and decodes multiple row groups in parallel, but reading and decoding within each row group is single-threaded. It would probably be better to be able to read different columns of the same row group in parallel.

Benefits:

  • Less memory usage because we don't have to keep num_threads * row_group_size of read data in memory. This is pretty important because row groups are typically hundreds of MBs, and we want to read+decode using tens of threads => potentially tens of GB of memory usage.
  • Faster reads if the file has few row groups - because we can have more threads than row groups.
  • Faster reads with small LIMIT - because we'll finish the first row group faster.

Implementation requirements:

  • Small column chunks (column chunk = a column in a row group) should be grouped together to avoid short reads. Even whole row groups should be grouped together if they're small - this is already implemented, and needs to keep working.
  • A row group may be too big to be decoded as one Block, even if its compressed data is small (we've seen extreme compression ratios in practice). So each column chunk needs to be split into pieces, and corresponding pieces from all columns collected into a Block. The column chunk reader should be careful not to run too far ahead and queue up too many pieces.
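The second requirement can be sketched roughly as follows: each column chunk is decoded in fixed-size pieces into a bounded queue (the bound is the backpressure that stops a reader from running too far ahead), and corresponding pieces from all columns are assembled into a Block. This is a minimal Python sketch of the scheduling idea, not ClickHouse code; the names (`PIECE_ROWS`, `MAX_QUEUED_PIECES`, `read_row_group`) and the use of plain lists as stand-ins for encoded column chunks are all illustrative assumptions.

```python
import queue
import threading

PIECE_ROWS = 4        # rows per decoded piece (illustrative; a real value would derive from the target block size)
MAX_QUEUED_PIECES = 2 # backpressure: a column reader may run at most this many pieces ahead

def read_column_chunk(column_data, out_q):
    # Decode one column chunk piece by piece. put() blocks when the bounded
    # queue is full, so this reader cannot queue up too many pieces.
    for start in range(0, len(column_data), PIECE_ROWS):
        out_q.put(column_data[start:start + PIECE_ROWS])
    out_q.put(None)  # end-of-chunk marker

def read_row_group(columns):
    # columns: dict of column name -> list of values (stand-in for a column chunk).
    # One reader thread and one bounded queue per column.
    queues = {name: queue.Queue(maxsize=MAX_QUEUED_PIECES) for name in columns}
    threads = [threading.Thread(target=read_column_chunk, args=(data, queues[name]))
               for name, data in columns.items()]
    for t in threads:
        t.start()
    # Assemble Blocks: one corresponding piece from every column makes one Block.
    while True:
        pieces = {name: q.get() for name, q in queues.items()}
        if any(p is None for p in pieces.values()):
            break
        yield pieces
    for t in threads:
        t.join()

blocks = list(read_row_group({"a": list(range(10)), "b": list(range(100, 110))}))
```

With 10 rows and 4-row pieces this yields three Blocks (4 + 4 + 2 rows), each holding aligned pieces of both columns; the real implementation would additionally group small column chunks together before scheduling, per the first requirement.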

al13n321 avatar May 09 '24 23:05 al13n321

Looking forward to it.

zhanglistar avatar May 13 '24 03:05 zhanglistar