cudf

Parquet reader list microkernel

Open pmattione-nvidia opened this issue 1 year ago • 3 comments

This PR moves fixed-width Parquet list decoding into its own set of microkernels by templatizing the existing fixed-width microkernels. When skipping rows of list columns, it now also skips ahead in the definition, repetition, and dictionary rle_streams. The list kernel uses 128 threads per block and 71 registers per thread, so I've changed the launch_bounds to enforce a minimum of 8 blocks per SM. That caps each thread at 64 registers (1,024 resident threads sharing the SM's 64K-entry register file), which causes a small register spill, but the benchmarks are still faster, as shown below:
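As background for why skipping rows in a list column must also advance the dictionary stream, here is a minimal host-side sketch. It is not cudf's implementation (which works on GPU rle_streams); it assumes the repetition and definition levels for one page are already decoded into plain arrays, and `SkipResult` and `skip_rows` are hypothetical names. In Parquet, a repetition level of 0 marks the start of a new top-level row, and a dictionary index is stored only for entries whose definition level equals the column's maximum (a non-null leaf value).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical result type: how far each stream must advance after a skip.
struct SkipResult {
  std::size_t levels_skipped;  // entries to skip in the rep/def level streams
  std::size_t values_skipped;  // dictionary indices to skip (non-null leaf values)
  std::size_t rows_skipped;    // top-level rows actually skipped
};

// Hypothetical helper: skip `rows_to_skip` top-level rows of a list column,
// given already-decoded repetition and definition levels for one page.
// Serial host code for illustration only; not cudf's GPU implementation.
SkipResult skip_rows(const std::vector<std::uint8_t>& rep,
                     const std::vector<std::uint8_t>& def,
                     std::uint8_t max_def,
                     std::size_t rows_to_skip)
{
  assert(rep.size() == def.size());  // one rep and one def level per entry
  SkipResult r{0, 0, 0};
  std::size_t i = 0;
  for (; i < rep.size(); ++i) {
    if (rep[i] == 0) {
      // Repetition level 0 starts a new row; stop at the first row we keep.
      if (r.rows_skipped == rows_to_skip) { break; }
      ++r.rows_skipped;
    }
    // Only non-null leaf values (def == max_def) have a dictionary index.
    if (def[i] == max_def) { ++r.values_skipped; }
  }
  r.levels_skipped = i;  // number of level entries consumed
  return r;
}
```

For example, take a nullable list of nullable ints in Parquet's 3-level list encoding (max_def = 3) holding the rows [1, 2], [], null, [null, 5], [7]. The levels are rep = {0, 1, 0, 0, 0, 1, 0} and def = {3, 3, 1, 0, 2, 3, 3}, and skipping 3 rows advances the level streams by 4 entries but the dictionary index stream by only 2. The two counts diverge whenever skipped rows contain empty lists, null lists, or null elements, which is why the dictionary stream has to be skipped based on the decoded levels rather than the row count.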

DEVICE_BUFFER list benchmarks (decompress + decode, not bound by IO):

  • run_length 1, cardinality 0, no byte_limit: 24.7% faster
  • run_length 32, cardinality 1000, no byte_limit: 18.3% faster
  • run_length 1, cardinality 0, 500kb byte_limit: 57% faster
  • run_length 32, cardinality 1000, 500kb byte_limit: 53% faster

Reading from a hard drive:

  • Compressed list of ints: 5.5% faster
  • Sample real data (many columns, not lists): 0.5% faster

Checklist

  • [x] I am familiar with the Contributing Guidelines.
  • [x] New or existing tests cover these changes.
  • [x] The documentation is up to date with these changes.

pmattione-nvidia, Aug 12 '24 20:08

It looks like this also adds list support to the split page path. Am I reading this right?

nvdbaranec, Oct 17 '24 22:10

One thing I've been thinking about is maybe splitting this file into two or three pieces.

  • One .cu file containing the core loops for each of the major kernels (plus the host-side launch code)
  • A .cuh file for the "update" functions
  • A .cuh file for the "decode values" functions

Definitely not for this PR, but something to think about down the road. I think it might help make the volume of code that has built up here more tractable.

nvdbaranec, Oct 17 '24 22:10

Re: list support in the split page path:

Yes.

pmattione-nvidia, Oct 18 '24 15:10

Re: "Please also run compute-sanitizer on the unit tests to make sure everything is good."

Tests pass.

pmattione-nvidia, Oct 28 '24 20:10

/merge

pmattione-nvidia, Oct 29 '24 21:10