nvdbaranec issues

Results: 24 issues of nvdbaranec
Remove support for skip_rows / num_rows options in the parquet reader.

Removes support for skip_rows / num_rows options in the parquet reader. Users retain control of what gets read via row groups.
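To illustrate how a caller might express an old skip_rows / num_rows request in terms of row groups, here is a minimal sketch. It assumes the caller already knows the row count of each row group (for example from the file footer). The helper `row_groups_for_range` is hypothetical and is not part of libcudf.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper, not a libcudf API: given the number of rows in each row
// group, return the indices of the row groups that overlap the half-open row
// range [skip_rows, skip_rows + num_rows).
std::vector<int> row_groups_for_range(std::vector<std::size_t> const& rows_per_group,
                                      std::size_t skip_rows,
                                      std::size_t num_rows)
{
  std::vector<int> selected;
  std::size_t const range_end = skip_rows + num_rows;
  std::size_t group_begin     = 0;
  for (std::size_t i = 0; i < rows_per_group.size(); ++i) {
    std::size_t const group_end = group_begin + rows_per_group[i];
    // The overlap of the two ranges is [overlap_begin, overlap_end);
    // it is non-empty only when overlap_begin < overlap_end.
    std::size_t const overlap_begin = std::max(group_begin, skip_rows);
    std::size_t const overlap_end   = std::min(group_end, range_end);
    if (overlap_begin < overlap_end) { selected.push_back(static_cast<int>(i)); }
    group_begin = group_end;
  }
  return selected;
}
```

Selecting whole row groups can return more rows than the original range asked for, so a caller that needs exact bounds would still have to trim the result after the read.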

libcudf

cuDF (Python)

cuIO

improvement

breaking

[FEA] Resolve what to do with semi-redundant fields in cuIO table_metadata

This PR (https://github.com/rapidsai/cudf/pull/6318) adds a new field to the `table_metadata` struct, `schema_info`, which contains the column names for the entire hierarchy of returned columns, not just the root columns. This...
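As a rough illustration of what "column names for the entire hierarchy" could look like, the sketch below models a nested name tree and flattens it into dotted paths. The type `name_node` and both functions are hypothetical stand-ins chosen for this example; they are not the actual `schema_info` definition.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for one entry in a hierarchical name list:
// a column name plus the names of its nested child columns.
struct name_node {
  std::string name;
  std::vector<name_node> children;
};

// Depth-first walk that records the dotted path of every column in the tree.
void collect_paths(name_node const& node,
                   std::string const& prefix,
                   std::vector<std::string>& out)
{
  std::string const path = prefix.empty() ? node.name : prefix + "." + node.name;
  out.push_back(path);
  for (auto const& child : node.children) {
    collect_paths(child, path, out);
  }
}

// Returns the paths of all columns, root and nested, in depth-first order.
std::vector<std::string> all_column_paths(std::vector<name_node> const& roots)
{
  std::vector<std::string> out;
  for (auto const& root : roots) {
    collect_paths(root, "", out);
  }
  return out;
}
```

In this model, a list of root names only is simply the `name` of each top-level node, which is one way to see why a separate root-only field can look semi-redundant next to the full hierarchy.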

feature request

libcudf

cuDF (Python)

cuIO

inactive-30d

[FEA] Improve casting performance.

The casting code in the plugin is currently a collection of expensive string manipulation functionality (regex, etc). It exists to handle more exotic edge cases coming from CSV and JSON...

performance

[FEA] Performance issue with the Parquet reader for very large schemas (especially when containing strings)

For parquet files that contain very large schemas with strings (either large numbers of columns, or large numbers of nested columns) we pay a very heavy price postprocessing the string...

feature request

libcudf

cuIO

Performance

improvement

[FEA] The C++ tests for parquet don't test row group selection very well.

There's only a very basic row group selection test in the C++ gtests. It would probably be useful to have a more thorough set of tests.

feature request

0 - Backlog

tests

libcudf

cuIO

Proposal: Add general purpose host memory allocator reference to cuIO with a demo of pooled-pinned allocation.

This PR adds a new interface to cuIO which controls where host memory allocations come from. Addresses https://github.com/rapidsai/cudf/issues/14314. It adds two core functions:

```
void set_current_host_memory_resource(cudf::host_resource_ref mr);
cudf::host_resource_ref get_current_host_memory_resource();
```
...
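The get/set pair described above follows a common "current resource" pattern. The sketch below shows that pattern in plain C++ under several assumptions: `host_resource`, `malloc_resource`, `counting_resource`, and the two free functions are hypothetical names invented for this example, the real `cudf::host_resource_ref` is a different type, and nothing here performs pinned or pooled allocation. Unlike the signature quoted in the PR description, the setter in this sketch returns the previous resource so a caller can restore it.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical host allocator interface (not cudf::host_resource_ref).
struct host_resource {
  virtual ~host_resource() = default;
  virtual void* allocate(std::size_t bytes) = 0;
  virtual void deallocate(void* ptr, std::size_t bytes) = 0;
};

// Default resource backed by malloc/free.
struct malloc_resource final : host_resource {
  void* allocate(std::size_t bytes) override { return std::malloc(bytes); }
  void deallocate(void* ptr, std::size_t) override { std::free(ptr); }
};

// Resource that counts allocations and forwards them to an upstream resource.
struct counting_resource final : host_resource {
  explicit counting_resource(host_resource& upstream) : upstream_(upstream) {}
  void* allocate(std::size_t bytes) override
  {
    ++allocations;
    return upstream_.allocate(bytes);
  }
  void deallocate(void* ptr, std::size_t bytes) override { upstream_.deallocate(ptr, bytes); }
  std::size_t allocations = 0;

 private:
  host_resource& upstream_;
};

// Storage for the process-wide current resource (not thread-safe in this sketch).
inline host_resource*& current_slot()
{
  static malloc_resource default_resource;
  static host_resource* current = &default_resource;
  return current;
}

inline host_resource& get_current_host_resource() { return *current_slot(); }

// Installs a new resource and returns the one that was current before the call.
inline host_resource& set_current_host_resource(host_resource& mr)
{
  host_resource& previous = *current_slot();
  current_slot()          = &mr;
  return previous;
}
```

Code that allocates through `get_current_host_resource()` picks up whichever resource was installed last, which is what would let a pooled or pinned allocator be swapped in without changing the call sites.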

libcudf

cuIO

improvement

non-breaking

[FEA] Implement a templated parquet decoding kernel suitable for reuse in micro-kernel optimization approach.

As part of the drive towards implementing the micro-kernel parquet decoding strategy, we would like to start centralizing the core parquet decoding loop into a generic templated implementation that can...

feature request

Needs Triage

cuIO

Performance

tech debt

[BUG] Crash running parquet reader benchmarks.

The PARQUET_READER_NVBENCH crashes (segfault) at exit on some machines. It doesn't seem to happen consistently for everyone, but it tends to be reproducible once it starts happening. To reproduce, run...

bug

0 - Backlog

libcudf

cuIO

[BUG] Parquet column selection by name with schemas including list<struct<X, Y>> does not work.

If you have a schema that contains a list-of-struct, selecting a subset of the inner columns doesn't work. Example `list` If the schema for this column was ``` A (list)...

bug

0 - Backlog

libcudf

cuIO

[FEA] Parquet reader: replace skip_rows / num_rows with start_row / end_row

Our external interface to the parquet reader allows the user to specify `skip_rows` / `num_rows` parameters when calling it. Internally, we use the same values. But it is a very...
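For reference, the two parameterizations named in the title can be related by a small conversion. The sketch below assumes a half-open range [start_row, end_row) and clamps both bounds to the total row count; those conventions, the `row_bounds` struct, and the `to_row_bounds` function are assumptions made for this example and are not taken from the issue.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical pair of bounds describing the half-open row range
// [start_row, end_row).
struct row_bounds {
  std::size_t start_row;
  std::size_t end_row;
};

// Converts a skip_rows / num_rows request into start_row / end_row,
// clamping both bounds to the number of rows available.
row_bounds to_row_bounds(std::size_t skip_rows, std::size_t num_rows, std::size_t total_rows)
{
  std::size_t const start     = std::min(skip_rows, total_rows);
  std::size_t const remaining = total_rows - start;  // rows available from start
  std::size_t const end       = start + std::min(num_rows, remaining);
  return row_bounds{start, end};
}
```

Clamping `num_rows` against the remaining rows before adding avoids an unsigned overflow when a caller passes a very large `num_rows` to mean "read to the end".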

feature request

0 - Backlog

libcudf

cuIO