Vukasin Milovanovic
## Description Added a builder to enable complex initialization of `data_profile` objects. The builder slightly expands the API to make some common uses easier: - Setting distribution no longer requires...
## Description Depends on #11479 Adds an API to create a single random column, so users don't need to create a table even when a single column is required. The...
cuIO ORC and Parquet benchmarks generate larger files than previously (e.g. GDS blog data). Some observations: - Only the cases where both cardinality and run length are set lead to...
A data profile is often configured with several setter calls in a row. A fluent API makes such code less verbose. A similar approach is already used to create xyz_reader/writer_options.
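To illustrate the fluent-setter idea above, here is a minimal sketch in Python. The class `DataProfileSketch` and its fields are hypothetical stand-ins chosen for illustration; they are not the libcudf `data_profile` API. The only point shown is that each setter returns the object itself, so several settings can be chained in one expression.

```python
from dataclasses import dataclass


@dataclass
class DataProfileSketch:
    """Hypothetical stand-in for a data profile; not the libcudf class."""

    cardinality: int = 0
    avg_run_length: int = 1
    null_probability: float = 0.0

    # Each setter returns self, which is what makes chaining possible.
    def set_cardinality(self, value: int) -> "DataProfileSketch":
        self.cardinality = value
        return self

    def set_avg_run_length(self, value: int) -> "DataProfileSketch":
        self.avg_run_length = value
        return self

    def set_null_probability(self, value: float) -> "DataProfileSketch":
        self.null_probability = value
        return self


# Non-fluent style: one statement per setting.
verbose = DataProfileSketch()
verbose.set_cardinality(1000)
verbose.set_avg_run_length(4)
verbose.set_null_probability(0.1)

# Fluent style: the same configuration as a single chained expression.
fluent = (
    DataProfileSketch()
    .set_cardinality(1000)
    .set_avg_run_length(4)
    .set_null_probability(0.1)
)
```

Both objects end up with identical settings; the fluent form only removes the repeated variable name.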
This parameter is already a part of the API, but it is not used internally. From experiments with Pandas, it looks like this parameter is only relevant when determining if...
Currently, cuIO benchmarks cover a Cartesian product of IO options and column types, which leads to a large number of cases and long execution times. We can separate the IO...
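The growth in case count can be sketched with a small Python example. The option names and column types below are hypothetical placeholders, not the actual cuIO benchmark axes, and treating the two axes as independent sweeps is only one possible reading of the proposal, which is truncated here.

```python
from itertools import product

# Hypothetical benchmark axes, chosen only for illustration.
io_options = {
    "compression": ["none", "snappy"],
    "statistics": [False, True],
}
column_types = ["integral", "float", "string", "list"]

# Every combination of IO option values.
io_combinations = list(product(*io_options.values()))

# Cartesian product: every IO combination paired with every column type.
combined_cases = list(product(io_combinations, column_types))

# Independent sweeps: vary IO options and column types separately.
separate_case_count = len(io_combinations) + len(column_types)

print(len(combined_cases), separate_case_count)
```

With these placeholder axes, the combined sweep has 16 cases while the two independent sweeps total 8; the gap widens as either axis grows.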
Add a set of benchmarks with varying stripe/rowgroup sizes to each affected component: - [ ] ORC reader - [ ] ORC writer - [ ] Parquet reader - [...
Future GDS optimizations will be implemented in KvikIO, so we should use it by default as soon as it matches the internal implementation in performance. Tasks: - [ ] Compare the...
libcudf does not have functions to read the metadata of a file without also reading (a portion of) the data. Exposing an efficient way to get information like column names...
## Description Issue https://github.com/rapidsai/cudf/issues/14965 ## Checklist - [x] I am familiar with the [Contributing Guidelines](https://github.com/rapidsai/cudf/blob/HEAD/CONTRIBUTING.md). - [ ] New or existing tests cover these changes. - [ ] The documentation...