nvdbaranec issues

Results: 24 issues of nvdbaranec

[BUG] For certain parquet list schemas, the root PageNestingInfo struct can end up uninitialized.

Inside the `allocate_nesting_info` function, we allocate `PageNestingInfo` and `PageNestingDecodeInfo` structs and initialize them. However, the logic for traversing the schema in the file can sometimes leave the 0th element...

bug

Needs Triage

cuIO

[FEA] Follow up on refactoring possibility from parquet chunked reader PR

A small refactoring to de-duplicate some code in the parquet decoder still needs to be done as a follow-up. See https://github.com/rapidsai/cudf/pull/11867#discussion_r1022137500

feature request

1 - On Deck

good first issue

libcudf

cuIO

improvement

[FEA] Parquet reader code cleanup, re: nested columns vs columns with lists.

In the parquet reader there are two similar-sounding but distinct pieces of terminology:

- Nested columns. This is the same as in the cudf sense. Anything involving structs or lists...

0 - Backlog

proposal

code quality

libcudf

cuIO

Performance optimizations for parquet sub-rowgroup reader.

This PR implements a basket of optimizations for the parquet reader to bring non-chunked reads close to par following the merge of the sub-rowgroup reader. The primary culprit for the...

code quality

libcudf

cuIO

Performance

non-breaking

Add multithreaded parquet reader benchmarks.

Addresses https://github.com/rapidsai/cudf/issues/12700. Adds multithreaded benchmarks for the parquet reader, with separate benchmarks for the chunked and non-chunked readers. In both cases, the primary cases are 2, 4 and 8 threads running...

libcudf

CMake

cuIO

Performance

helps: Spark

improvement

non-breaking

[FEA] Allow cudf::thread_pool to restrict the number of threads available.

I have a benchmarking use case where it would be nice to be able to use a single thread pool across multiple benchmarks for ease of viewing in nsys. Imagine...

feature request

[FEA] Explicitly guarantee row group ordering in the parquet reader.

From @devavret, the question came up as to whether we guarantee the relative ordering of row groups across multiple input files in the parquet reader. That is, if you...
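To make the question concrete, here is a minimal sketch of one possible guarantee, assuming it would be "file-major": all row groups of the first input file in their stored order, then all row groups of the second file, and so on. The issue text is truncated, so this ordering is an assumption rather than what the issue settles on, and `file_major_row_group_order` is a hypothetical helper, not a libcudf API.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (not part of libcudf): given the number of row groups
// in each input file, list (file index, row group index) pairs in the order
// a reader with a file-major ordering guarantee would emit them.
std::vector<std::pair<std::size_t, std::size_t>> file_major_row_group_order(
  std::vector<std::size_t> const& row_groups_per_file)
{
  std::vector<std::pair<std::size_t, std::size_t>> order;
  for (std::size_t file = 0; file < row_groups_per_file.size(); ++file) {
    // Within one file, row groups keep their stored order.
    for (std::size_t rg = 0; rg < row_groups_per_file[file]; ++rg) {
      order.emplace_back(file, rg);
    }
  }
  return order;
}
```

Under this assumed guarantee, the rows of the output table would line up with the concatenation of the per-file tables, which is what callers passing multiple files might rely on.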

feature request

libcudf

cuIO

improvement

Remove benchmark-specific use of pinned-pooled memory in Parquet multithreaded benchmark.

The benchmark was manually creating and using a pinned-pool rmm allocator, which is now redundant since cuIO itself does this by default. This PR removes it.

libcudf

[FEA] Potential optimization: Batched memset.

In some situations in the Parquet reader (particularly with tables containing many columns or deeply nested columns) we burn a decent amount of time doing `cudaMemset()` operations on...
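The batching idea can be sketched on the host. The following is a plain C++ emulation, not libcudf code and not a CUDA kernel: it treats all buffers as one flat range of bytes, builds a prefix sum of buffer sizes, and maps each global byte index back to its buffer with a binary search. On a GPU, each loop iteration would presumably correspond to one thread of a single kernel launch, replacing many separate `cudaMemset()` calls. The names `buffer_span` and `batched_memset` are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical descriptor for one buffer to fill (not a libcudf type).
struct buffer_span {
  std::uint8_t* data;
  std::size_t size;  // size in bytes
};

// Host-side emulation of a batched memset over many buffers.
void batched_memset(std::vector<buffer_span> const& buffers, std::uint8_t value)
{
  // offsets[i] is the first global byte index that belongs to buffer i;
  // offsets.back() is the total number of bytes to write.
  std::vector<std::size_t> offsets(buffers.size() + 1, 0);
  for (std::size_t i = 0; i < buffers.size(); ++i) {
    offsets[i + 1] = offsets[i] + buffers[i].size;
  }

  std::size_t const total = offsets.back();
  for (std::size_t idx = 0; idx < total; ++idx) {
    // Find the buffer that owns global byte idx: the last offset <= idx.
    // Zero-sized buffers are skipped naturally by upper_bound.
    auto const it = std::upper_bound(offsets.begin(), offsets.end(), idx);
    auto const buf = static_cast<std::size_t>(it - offsets.begin()) - 1;
    buffers[buf].data[idx - offsets[buf]] = value;
  }
}
```

A real implementation would likely write wider words per thread and handle alignment; the sketch only shows the index mapping that lets one launch cover every buffer.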

feature request

Performance

Disable dict support for split-page kernel in the parquet reader.

Dictionary support for this particular flavor of kernel was being compiled in. This was harmless, but caused an unneeded increase in shared memory usage. This PR disables it.

libcudf

cuIO

improvement

non-breaking