Vukasin Milovanovic issues

Results: 27 issues of Vukasin Milovanovic

Add fluent API builder to `data_profile`

## Description Added a builder to enable complex initialization of `data_profile` objects. The builder slightly expands the API to make some common uses easier: - Setting distribution no longer requires...

feature request

tests

libcudf

non-breaking

Add `create_random_column` function to the data generator

## Description Depends on #11479. Adds an API to create a single random column, so users don't need to create a table when only a single column is required. The...

feature request

libcudf

non-breaking

[BUG] cuIO benchmarks generate larger files than expected

cuIO ORC and Parquet benchmarks generate larger files than they did previously (e.g. GDS blog data). Some observations: - Only the cases where both cardinality and run length are set lead to...

bug

tests

cuIO

inactive-30d

[FEA] Make random data generator API fluent

A data profile is often configured with several setter calls in a row. A fluent API would make such code less verbose. A similar approach is already used to create xyz_reader/writer_options.

feature request

tests

cuIO
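The fluent style requested in the issue above can be illustrated with a minimal sketch. Everything here is hypothetical: `toy_profile`, `toy_profile_builder`, and their fields are stand-ins invented for illustration and are not the libcudf `data_profile` API. The only idea shown is that each setter returns a reference to the builder, so calls can be chained in a single expression.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for illustration only; NOT the libcudf data_profile API.
enum class distribution_kind { uniform, normal, geometric };

struct toy_profile {
  distribution_kind dist      = distribution_kind::uniform;
  std::int64_t lower          = 0;
  std::int64_t upper          = 100;
  std::int32_t cardinality    = 0;
  std::int32_t avg_run_length = 1;
};

class toy_profile_builder {
 public:
  // Each setter returns *this so that calls can be chained.
  toy_profile_builder& distribution(distribution_kind kind, std::int64_t lower, std::int64_t upper)
  {
    assert(lower <= upper);  // reject an empty value range
    profile_.dist  = kind;
    profile_.lower = lower;
    profile_.upper = upper;
    return *this;
  }

  toy_profile_builder& cardinality(std::int32_t value)
  {
    profile_.cardinality = value;
    return *this;
  }

  toy_profile_builder& avg_run_length(std::int32_t value)
  {
    profile_.avg_run_length = value;
    return *this;
  }

  // Returns a copy of the profile accumulated so far.
  toy_profile build() const { return profile_; }

 private:
  toy_profile profile_{};
};

// Usage: one chained expression instead of several separate setter statements.
inline toy_profile make_example_profile()
{
  return toy_profile_builder{}
    .distribution(distribution_kind::normal, -5, 5)
    .cardinality(1000)
    .avg_run_length(4)
    .build();
}
```

Returning the builder by reference keeps the chain free of copies; the profile is copied only once, in `build()`.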

[FEA] Implement skipinitialspace read_csv parameter

This parameter is already a part of the API, but it is not used internally. From experiments with Pandas, it looks like this parameter is only relevant when determining if...

feature request

libcudf

cuIO

Separate cuIO IO benchmarks from column type benchmarks

Currently, cuIO benchmarks cover a Cartesian product of IO options and column types, which leads to a large number of cases and long execution times. We can separate the IO...

feature request

cuIO
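The case-count argument in the issue above can be made concrete with a small sketch. It assumes one possible reading of the proposal: one suite varies IO options with a fixed column type, and a second suite varies column types with fixed IO options. The function names and the example counts are hypothetical and are not taken from the cuIO benchmarks.

```cpp
#include <cassert>
#include <cstddef>

// Cases when every IO option is paired with every column type (Cartesian product).
inline std::size_t combined_cases(std::size_t io_options, std::size_t column_types)
{
  assert(io_options > 0 && column_types > 0);
  return io_options * column_types;
}

// Cases when the two dimensions are benchmarked in separate suites
// (assumed split: each suite varies one dimension and fixes the other).
inline std::size_t separated_cases(std::size_t io_options, std::size_t column_types)
{
  assert(io_options > 0 && column_types > 0);
  return io_options + column_types;
}
```

With, say, 6 IO options and 8 column types (illustrative numbers), the combined suite has 48 cases while the separated suites have 14 in total, so the case count grows additively rather than multiplicatively.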

[FEA] Expand ORC and Parquet benchmarks to cover different stripe/rowgroup sizes

Add a set of benchmarks with varying stripe/rowgroup sizes to each affected component: - [ ] ORC reader - [ ] ORC writer - [ ] Parquet reader - [...

feature request

good first issue

tests

libcudf

cuIO

Performance

[FEA] Use KvikIO by default

Future GDS optimizations will be implemented in KvikIO, so we should use it by default as soon as it matches the internal implementation in performance. Tasks: - [ ] Compare the...

feature request

cuIO

Performance

improvement

[FEA] `read_orc_metadata` and `read_parquet_metadata` in libcudf

libcudf does not have functions to read the metadata of a file without also reading (a portion of) the data. Exposing an efficient way to get information like column names...

feature request

cuIO

Eliminate duplicate allocation of nested string columns

## Description Issue https://github.com/rapidsai/cudf/issues/14965

bug

libcudf

cuIO

non-breaking