Vukasin Milovanovic comments




Adds the end-to-end JSON parser implementation

@gpucibot merge

[BUG] Loads a Hive Map columns as list when read with read_orc.

cuDF does not have a map type, so such columns are read as a list of structs of key-value pairs. AFAICT the behavior here is expected.

[BUG] Loads a Hive Map columns as list when read with read_orc.

That's how the structs are formatted; '0' and '1' are auto-generated field names in the struct column. CC @galipremsagar in case I'm missing something related to formatting.
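Below is a minimal sketch of that layout, for illustration only. It models a single map row on the host in plain C++17 rather than through cuDF, and it assumes string keys and `int` values. The names `key_value_struct`, `map_row`, and `as_list_of_structs` are hypothetical. Each key-value entry becomes one struct, whose members stand in for the auto-generated fields "0" (key) and "1" (value), and the whole row becomes a list of those structs.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of one struct element of the resulting list column.
// The member names mirror the auto-generated struct field names "0" and "1".
struct key_value_struct {
  std::string field_0;  // field "0": the map key
  int field_1;          // field "1": the map value
};

// One map row, stored as key-value entries in their original order.
using map_row = std::vector<std::pair<std::string, int>>;

// Each map entry becomes one struct, and the row becomes a list of those structs.
std::vector<key_value_struct> as_list_of_structs(map_row const& row)
{
  std::vector<key_value_struct> out;
  out.reserve(row.size());
  for (auto const& [key, value] : row) {
    out.push_back({key, value});
  }
  return out;
}
```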

[QST] Should byte_array_view in parquet reader/writer change

I got some ideas, but they depend on a few points I'm not sure about yet: 1. Are comparison semantics of `byte_array_view` similar/equivalent to `string_view`? 2. Is `byte_array_view` a `device_span`?...

[QST] Should byte_array_view in parquet reader/writer change

Aiming to avoid code duplication:
- `ordered_device_span`: public `device_span` + comparison impl
- `string_view`: derived from `ordered_device_span` + min/max impl
- `byte_array_view`: derived from `ordered_device_span` + min/max impl

~Probably requires CRTP...
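Below is a rough host-only sketch of that hierarchy. It assumes CRTP is used so that the shared ordering can be written once, against the derived type. `host_span`, `ordered_span`, `string_like_view`, and `byte_array_like_view` are hypothetical stand-ins, not the libcudf `device_span`, `string_view`, or Parquet `byte_array_view`. The byte-wise unsigned lexicographic ordering is also an assumption about the intended comparison semantics. The min/max members from the proposal are left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical host-only stand-in for a device_span-like base: pointer + size.
template <typename T>
class host_span {
 public:
  constexpr host_span(T const* data, std::size_t size) : data_{data}, size_{size} {}
  constexpr T const* data() const { return data_; }
  constexpr std::size_t size() const { return size_; }

 private:
  T const* data_;
  std::size_t size_;
};

// CRTP base: the ordering is written once and expressed in terms of Derived,
// so each view type only compares with views of its own type.
// T is assumed to be a byte-sized type (char, std::uint8_t).
template <typename Derived, typename T>
class ordered_span : public host_span<T> {
 public:
  constexpr ordered_span(T const* data, std::size_t size) : host_span<T>{data, size} {}

  // Three-way lexicographic compare, treating every element as unsigned char.
  int compare(Derived const& rhs) const
  {
    std::size_t const n = std::min(this->size(), rhs.size());
    for (std::size_t i = 0; i < n; ++i) {
      auto const l = static_cast<unsigned char>(this->data()[i]);
      auto const r = static_cast<unsigned char>(rhs.data()[i]);
      if (l != r) { return l < r ? -1 : 1; }
    }
    if (this->size() == rhs.size()) { return 0; }
    return this->size() < rhs.size() ? -1 : 1;  // a proper prefix sorts first
  }

  // Hidden friends, found through ADL on the derived view types.
  friend bool operator<(Derived const& a, Derived const& b) { return a.compare(b) < 0; }
  friend bool operator==(Derived const& a, Derived const& b) { return a.compare(b) == 0; }
};

class string_like_view : public ordered_span<string_like_view, char> {
  using base = ordered_span<string_like_view, char>;

 public:
  string_like_view(char const* data, std::size_t size) : base{data, size} {}
};

class byte_array_like_view : public ordered_span<byte_array_like_view, std::uint8_t> {
  using base = ordered_span<byte_array_like_view, std::uint8_t>;

 public:
  byte_array_like_view(std::uint8_t const* data, std::size_t size) : base{data, size} {}
};
```

Because each `operator<` only accepts two views of the same derived type, comparing a `string_like_view` with a `byte_array_like_view` does not compile. A plain shared base with base-typed operators would allow that mix unless extra work were done.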

[QST] Should byte_array_view in parquet reader/writer change

Didn't know about the potential libcu++ involvement. In that case, my vote is to publicly derive from `device_span` to remove duplicated data access members (FWIW). Much less exciting solution :D
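The simpler option, plain public derivation with no CRTP, can be sketched as follows. `host_span` is a hypothetical host-only stand-in for `device_span`, and `byte_array_view_sketch` is an illustrative name. The sketch only shows that `data()`, `size()`, `empty()`, and `operator[]` are inherited rather than redeclared on the view.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical host-only stand-in for a device_span-like type (not libcudf's).
template <typename T>
class host_span {
 public:
  constexpr host_span(T* data, std::size_t size) : data_{data}, size_{size} {}
  constexpr T* data() const { return data_; }
  constexpr std::size_t size() const { return size_; }
  constexpr bool empty() const { return size_ == 0; }
  constexpr T& operator[](std::size_t i) const { return data_[i]; }

 private:
  T* data_;
  std::size_t size_;
};

// Public derivation: every data-access member comes from the base, so the view
// only needs to add what is specific to it (for example, comparison operators).
class byte_array_view_sketch : public host_span<std::uint8_t const> {
 public:
  using host_span<std::uint8_t const>::host_span;  // reuse the base constructor
};
```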

Separate cuIO IO benchmarks from column type benchmarks

Note: any PRs that change benchmarks are encouraged/required to migrate the benchmarks to NVBench.

[FEA] Expand ORC and Parquet benchmarks to cover different stripe/rowgroup sizes

Note: any PRs that change benchmarks are encouraged/required to migrate the benchmarks to NVBench.

[FEA] [JSON reader] to support column prune

> This is a blocker for Spark to be able to use the JSON reader. Because we do not know all of the columns, the user just gives the ones...

[FEA] [JSON reader] to support column prune

I'm asking because this would be a short-term solution: column pruning would need to be reworked when we add nested type support. Does this feature request include pruning of nested columns...
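To make the request concrete, here is a sketch of top-level column pruning written as a post-filter over an already-parsed table. It assumes a toy representation in which `table` and `prune_columns` are hypothetical names and all values are held as strings. A real reader would prune during parsing so that unrequested columns are never materialized. This sketch deliberately does not address the nested-column question raised above.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical parsed table: column name -> column values (strings for simplicity).
using table = std::map<std::string, std::vector<std::string>>;

// Keep only the top-level columns the user asked for.
table prune_columns(table const& parsed, std::set<std::string> const& requested)
{
  table out;
  for (auto const& [name, values] : parsed) {
    if (requested.count(name) != 0) { out.emplace(name, values); }
  }
  return out;
}
```

In this sketch, requested names that are absent from the data are skipped silently. Whether a reader should instead return them as all-null columns is a separate design choice.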