nvdbaranec
Inside the `allocate_nesting_info` function, we allocate `PageNestingInfo` and `PageNestingDecodeInfo` structs and initialize them. However, the logic for traversing the schema in the file can sometimes leave the 0th element...
A small refactoring to de-duplicate some code in the parquet decoder needs to be done as a follow-up. https://github.com/rapidsai/cudf/pull/11867#discussion_r1022137500
In the parquet reader there are two similar-sounding but distinct pieces of terminology:
- Nested columns. This is the same as in the cudf sense. Anything involving structs or lists...
This PR implements a basket of optimizations for the parquet reader to bring non-chunked reads close to par following the merge of the sub-rowgroup reader. The primary culprit for the...
Addresses: https://github.com/rapidsai/cudf/issues/12700 Adds multithreaded benchmarks for the parquet reader. Separate benchmarks for the chunked and non-chunked readers. In both cases, the primary cases are 2, 4 and 8 threads running...
I have a benchmarking use case where it would be nice to be able to use a single thread pool across multiple benchmarks for ease of viewing in nsys. Imagine...
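To make the shared-pool idea concrete, here is a minimal sketch in plain C++. Everything in it is hypothetical: `simple_thread_pool`, `shared_benchmark_pool`, and `run_fake_benchmark` are illustrative names, not part of cudf or its benchmark harness, and the pool size of 8 is arbitrary. It only shows one way a single pool could be constructed once and reused by several benchmark bodies, so that all of them run on the same worker threads.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical minimal thread pool; not the pool used by the cudf benchmarks.
class simple_thread_pool {
 public:
  explicit simple_thread_pool(std::size_t num_threads)
  {
    assert(num_threads > 0);
    for (std::size_t i = 0; i < num_threads; ++i) {
      workers_.emplace_back([this] { worker_loop(); });
    }
  }

  ~simple_thread_pool()
  {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      stopping_ = true;
    }
    cv_.notify_all();
    for (auto& w : workers_) { w.join(); }
  }

  simple_thread_pool(simple_thread_pool const&)            = delete;
  simple_thread_pool& operator=(simple_thread_pool const&) = delete;

  // Queue a task and return a future for its result.
  template <typename F>
  auto submit(F&& f) -> std::future<decltype(f())>
  {
    using result_t = decltype(f());
    auto task = std::make_shared<std::packaged_task<result_t()>>(std::forward<F>(f));
    std::future<result_t> result = task->get_future();
    {
      std::lock_guard<std::mutex> lock(mutex_);
      tasks_.push([task] { (*task)(); });
    }
    cv_.notify_one();
    return result;
  }

  std::size_t size() const { return workers_.size(); }

 private:
  void worker_loop()
  {
    for (;;) {
      std::function<void()> job;
      {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
        if (tasks_.empty()) { return; }  // only reached once the pool is stopping
        job = std::move(tasks_.front());
        tasks_.pop();
      }
      job();
    }
  }

  std::vector<std::thread> workers_;
  std::queue<std::function<void()>> tasks_;
  std::mutex mutex_;
  std::condition_variable cv_;
  bool stopping_ = false;
};

// One pool shared by every benchmark in the process. A function-local static
// is constructed lazily and exactly once (thread-safe since C++11).
inline simple_thread_pool& shared_benchmark_pool()
{
  static simple_thread_pool pool(8);  // 8 is an arbitrary illustrative size
  return pool;
}

// Stand-in for a benchmark body: run `num_tasks` jobs on the shared pool
// and sum their results.
inline int run_fake_benchmark(int num_tasks)
{
  auto& pool = shared_benchmark_pool();
  std::vector<std::future<int>> results;
  for (int i = 0; i < num_tasks; ++i) {
    results.push_back(pool.submit([i] { return i * i; }));
  }
  int total = 0;
  for (auto& r : results) { total += r.get(); }
  return total;
}
```

Because every call to `shared_benchmark_pool()` returns the same object, the worker threads are created once and persist across benchmark bodies, so a profiler would see the same threads throughout instead of a fresh set per benchmark.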
@devavret raised the question of whether we guarantee the relative ordering of row groups across multiple input files in the parquet reader. That is, if you...
The benchmark was manually creating and using a pinned-pool rmm allocator which is now redundant, since cuIO itself does this by default. This PR removes it.
In some situations in the Parquet reader (particularly with tables containing many columns or deeply nested columns) we burn a decent amount of time doing `cudaMemset()` operations on...
Dictionary support for this particular flavor of kernel was being compiled in. Harmless, but caused an unneeded increase in shared memory usage. This PR disables it.