nvdbaranec
Removes support for the `skip_rows` / `num_rows` options in the parquet reader. Users retain control of what gets read via row groups.
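For a caller that previously relied on `skip_rows` / `num_rows`, one possible way to translate a row range into a row group selection is sketched below. This is a minimal illustration, not part of cudf: `row_groups_for_range` is a hypothetical helper, and it assumes the caller already knows the row count of each row group (for example from the file footer metadata).

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not a cudf API): returns the indices of the row groups
// that overlap the half-open row range [skip_rows, skip_rows + num_rows).
// row_group_sizes[i] is assumed to hold the number of rows in row group i.
std::vector<std::size_t> row_groups_for_range(
  std::vector<std::size_t> const& row_group_sizes,
  std::size_t skip_rows,
  std::size_t num_rows)
{
  std::vector<std::size_t> selected;
  if (num_rows == 0) { return selected; }

  std::size_t const range_end = skip_rows + num_rows;
  std::size_t group_begin     = 0;

  for (std::size_t i = 0; i < row_group_sizes.size(); ++i) {
    std::size_t const group_end = group_begin + row_group_sizes[i];

    // Keep non-empty row groups whose row span intersects the requested range.
    bool const overlaps = group_begin < range_end && group_end > skip_rows;
    if (row_group_sizes[i] > 0 && overlaps) { selected.push_back(i); }

    group_begin = group_end;
    // Every later row group starts at or past the end of the range.
    if (group_begin >= range_end) { break; }
  }
  return selected;
}
```

Row groups are coarser than rows, so a selection like this can include extra rows at the start of the first group and the end of the last one; the caller would still need to slice the resulting table to get an exact row range.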
This PR (https://github.com/rapidsai/cudf/pull/6318) adds a new field to the `table_metadata` struct, `schema_info`, which contains the column names for the entire hierarchy of returned columns, not just the root columns. This...
The casting code in the plugin is currently a collection of expensive string manipulation functionality (regex, etc). It exists to handle more exotic edge cases coming from CSV and JSON...
For parquet files that contain very large schemas with strings (either large numbers of columns, or large numbers of nested columns) we pay a very heavy price postprocessing the string...
There's only a very basic row group selection test in the C++ gtests. It would probably be useful to have a more thorough set of tests.
Addresses https://github.com/rapidsai/cudf/issues/14314. This PR adds a new interface to cuIO which controls where host memory allocations come from. It adds two core functions:
```
void set_current_host_memory_resource(cudf::host_resource_ref mr);
cudf::host_resource_ref get_current_host_memory_resource();
```
...
As part of the drive towards implementing the micro-kernel parquet decoding strategy, we would like to start centralizing the core parquet decoding loop into a generic templated implementation that can...
The PARQUET_READER_NVBENCH crashes (segfault) at exit on some machines. It doesn't seem to happen consistently for everyone, but it tends to be reproducible once it starts happening. To reproduce, run...
If you have a schema that contains a list-of-struct, selecting a subset of the inner columns doesn't work. Example `list` If the schema for this column was ``` A (list)...
Our external interface to the parquet reader allows the user to specify `skip_rows` / `num_rows` parameters when calling it. Internally, we use the same values. But it is a very...