nvdbaranec

Results 24 issues of nvdbaranec

Inside the `allocate_nesting_info` function, we allocate `PageNestingInfo` and `PageNestingDecodeInfo` structs and initialize them. However, the logic for traversing the schema in the file can sometimes leave the 0th element...

bug
Needs Triage
cuIO

A small refactoring to de-duplicate some code in the parquet decoder needs to be done as a follow-up. https://github.com/rapidsai/cudf/pull/11867#discussion_r1022137500

feature request
1 - On Deck
good first issue
libcudf
cuIO
improvement

In the parquet reader there are two similar-sounding but distinct pieces of terminology: - Nested columns. This is the same as in the cudf sense. Anything involving structs or lists...

0 - Backlog
proposal
code quality
libcudf
cuIO

This PR implements a basket of optimizations for the parquet reader to bring non-chunked reads close to par following the merge of the sub-rowgroup reader. The primary culprit for the...

code quality
libcudf
cuIO
Performance
non-breaking

Addresses: https://github.com/rapidsai/cudf/issues/12700 Adds multithreaded benchmarks for the parquet reader. Separate benchmarks for the chunked and non-chunked readers. In both cases, the primary cases are 2, 4 and 8 threads running...

libcudf
CMake
cuIO
Performance
helps: Spark
improvement
non-breaking

I have a benchmarking use case where it would be nice to be able to use a single thread pool across multiple benchmarks for ease of viewing in nsys. Imagine...

feature request

From @devavret, the question came up as to whether we guarantee the relative ordering of row groups across multiple input files in the parquet reader. That is, if you...

feature request
libcudf
cuIO
improvement

The benchmark was manually creating and using a pinned-pool rmm allocator which is now redundant, since cuIO itself does this by default. This PR removes it...

libcudf

In some situations in the Parquet reader (particularly with tables containing many columns or deeply nested columns) we burn a decent amount of time doing `cudaMemset()` operations on...

feature request
Performance

Dictionary support for this particular flavor of kernel was being compiled in. Harmless, but caused an unneeded increase in shared memory usage. This PR disables it...

libcudf
cuIO
improvement
non-breaking