nvdbaranec

Results 24 issues of nvdbaranec

Removes support for the skip_rows / num_rows options in the parquet reader. Users retain control of what gets read via row groups.

libcudf
cuDF (Python)
cuIO
improvement
breaking
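
To illustrate how a caller might move from a `skip_rows` / `num_rows` request to a row group selection, here is a minimal sketch. The helper `row_groups_for_range` is hypothetical and not part of libcudf; it assumes the caller already knows the row count of each row group (for example, from the file footer). Because row groups are coarser than a row range, the caller would still need to trim leading and trailing rows from the result.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not a libcudf API): given the row count of each row
// group in file order, return the indices of the row groups that overlap the
// half-open row range [skip_rows, skip_rows + num_rows).
inline std::vector<std::size_t> row_groups_for_range(
    const std::vector<std::int64_t>& row_group_sizes,
    std::int64_t skip_rows,
    std::int64_t num_rows)
{
    assert(skip_rows >= 0 && num_rows >= 0);

    std::vector<std::size_t> selected;
    if (num_rows == 0) { return selected; }  // an empty range selects nothing

    const std::int64_t range_begin = skip_rows;
    const std::int64_t range_end   = skip_rows + num_rows;

    std::int64_t group_begin = 0;
    for (std::size_t i = 0; i < row_group_sizes.size(); ++i) {
        const std::int64_t group_end = group_begin + row_group_sizes[i];
        // Two half-open intervals overlap when each starts before the other ends.
        if (group_begin < range_end && group_end > range_begin) {
            selected.push_back(i);
        }
        group_begin = group_end;
    }
    return selected;
}
```

For three row groups of 100 rows each, a request for 100 rows starting at row 150 touches the second and third row groups (indices 1 and 2).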

This PR (https://github.com/rapidsai/cudf/pull/6318) adds a new field to the `table_metadata` struct, `schema_info`, which contains the column names for the entire hierarchy of returned columns, not just the root columns. This...

feature request
libcudf
cuDF (Python)
cuIO
inactive-30d

The casting code in the plugin is currently a collection of expensive string manipulation functionality (regex, etc.). It exists to handle more exotic edge cases coming from CSV and JSON...

performance

For parquet files that contain very large schemas with strings (either large numbers of columns, or large numbers of nested columns) we pay a very heavy price postprocessing the string...

feature request
libcudf
cuIO
Performance
improvement

There's only a very basic row group selection test in the C++ gtests. It would probably be useful to have a more thorough set of tests.

feature request
0 - Backlog
tests
libcudf
cuIO

This PR adds a new interface to cuIO which controls where host memory allocations come from. Addresses https://github.com/rapidsai/cudf/issues/14314. It adds two core functions:
```
void set_current_host_memory_resource(cudf::host_resource_ref mr);
cudf::host_resource_ref get_current_host_memory_resource();
```
...

libcudf
cuIO
improvement
non-breaking
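
The getter/setter pair described in this PR follows a common "current resource" pattern. The sketch below shows that pattern in isolation and is not the cuDF implementation: `host_resource`, `malloc_resource`, `counting_resource`, `set_current_host_resource`, and `get_current_host_resource` are all hypothetical stand-ins, and raw pointers are used here in place of `cudf::host_resource_ref`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <mutex>

// Hypothetical stand-in for a host memory resource (not cudf::host_resource_ref).
class host_resource {
 public:
  virtual ~host_resource() = default;
  virtual void* allocate(std::size_t bytes) = 0;
  virtual void deallocate(void* ptr, std::size_t bytes) noexcept = 0;
};

// Default resource: plain malloc/free.
class malloc_resource : public host_resource {
 public:
  void* allocate(std::size_t bytes) override { return std::malloc(bytes); }
  void deallocate(void* ptr, std::size_t) noexcept override { std::free(ptr); }
};

// Example user-supplied resource that counts how many allocations it serves.
class counting_resource final : public malloc_resource {
 public:
  int allocation_count = 0;
  void* allocate(std::size_t bytes) override {
    ++allocation_count;
    return malloc_resource::allocate(bytes);
  }
};

namespace detail {
inline std::mutex& resource_mutex() {
  static std::mutex m;
  return m;
}
// Process-wide pointer to the current resource, initialized to the default.
inline host_resource*& current_resource() {
  static malloc_resource default_resource;
  static host_resource* current = &default_resource;
  return current;
}
}  // namespace detail

// Install `mr` as the resource used for subsequent host allocations.
// The caller keeps ownership and must keep `mr` alive while it is installed.
inline void set_current_host_resource(host_resource* mr) {
  assert(mr != nullptr);
  std::lock_guard<std::mutex> lock(detail::resource_mutex());
  detail::current_resource() = mr;
}

// Return the resource that host allocations should currently come from.
inline host_resource* get_current_host_resource() {
  std::lock_guard<std::mutex> lock(detail::resource_mutex());
  return detail::current_resource();
}
```

In this sketch, code that needs host memory would call `get_current_host_resource()->allocate(n)` rather than allocating directly, so a user can redirect those allocations (for example, to a pinned-memory pool) by installing a different resource once at startup.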

As part of the drive towards implementing the micro-kernel parquet decoding strategy, we would like to start centralizing the core parquet decoding loop into a generic templated implementation that can...

feature request
Needs Triage
cuIO
Performance
tech debt

The PARQUET_READER_NVBENCH crashes (segfault) at exit on some machines. It doesn't seem to happen consistently for everyone, but it tends to be reproducible once it starts happening. To reproduce, run...

bug
0 - Backlog
libcudf
cuIO

If you have a schema that contains a list-of-struct, selecting a subset of the inner columns doesn't work. Example `list` If the schema for this column was ``` A (list)...

bug
0 - Backlog
libcudf
cuIO

Our external interface to the parquet reader allows the user to specify `skip_rows` / `num_rows` parameters when calling it. Internally, we use the same values. But it is a very...

feature request
0 - Backlog
libcudf
cuIO