Vukasin Milovanovic

Results 139 comments of Vukasin Milovanovic

From some work in the JSON reader, it looks like we can remove `column_names`, as the same names can be accessed in `schema_info`. It should just be a matter of...

> Have a way to read the raw value without any parsing, so the resulting column will include the quotes

In this case, is it okay to keep the quotes...

@revans2 the `keep_quotes` option is now merged. Can we close this issue? We can always reopen if the implementation is not sufficient.
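For readers unfamiliar with the request, here is a rough illustration of what keeping the quotes means for a string column. It is a plain-Python sketch using the standard `json` module, not the libcudf implementation; the helper `read_string_field` and its `keep_quotes` flag are hypothetical names made up for this example, and re-serializing with `json.dumps` only approximates reading the raw, unparsed value.

```python
import json


def read_string_field(line, key, keep_quotes=False):
    """Return one field from a single JSON Lines record.

    Hypothetical helper for illustration only. With keep_quotes=True,
    string values are returned with their surrounding double quotes,
    approximated here by re-serializing the parsed value.
    """
    value = json.loads(line)[key]
    if keep_quotes and isinstance(value, str):
        # Re-serialize so the result carries the quotes again.
        return json.dumps(value)
    return value


record = '{"name": "abc", "count": 3}'
unquoted = read_string_field(record, "name")
quoted = read_string_field(record, "name", keep_quotes=True)
print(repr(unquoted), repr(quoted))
```

Non-string values such as `count` are returned unchanged in this sketch, since they carry no quotes in the input.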

@etseidl Thank you for the PR! Please include measured impact on libcudf benchmarks (Parquet reader and writer). IIRC, dictionary code is a bit touchy w.r.t. bit width. Please let me...

There's one case where the performance with the new PR tanks: `ParquetWrite/integral_file_output/29/1048575/1/0/0/manual_time 508 ms 506 ms 1 bytes_per_second=1.97023G/s encoded_file_size=720.342M peak_memory_usage=5.94825G` to `ParquetWrite/integral_file_output/29/1048575/1/0/0/manual_time 1752 ms 507 ms 2 bytes_per_second=584.397M/s encoded_file_size=702.663M peak_memory_usage=5.94825G`...
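To put the two benchmark lines above in perspective, the following sketch computes the slowdown and the file-size change from the quoted figures. The numbers are copied from the comment; the variable names are only illustrative, and the first time column is assumed to be the manual (wall) time in milliseconds.

```python
# Figures copied from the two benchmark lines quoted above.
time_before_ms = 508.0    # manual time before the PR
time_after_ms = 1752.0    # manual time with the PR
size_before_m = 720.342   # encoded_file_size before, in the units reported
size_after_m = 702.663    # encoded_file_size with the PR

# How many times slower the write became.
slowdown = time_after_ms / time_before_ms

# Fractional reduction in encoded file size.
size_reduction = (size_before_m - size_after_m) / size_before_m

print(f"slowdown: {slowdown:.2f}x")
print(f"file size reduction: {size_reduction:.2%}")
```

Under these assumptions, the write takes roughly 3.4 times as long in exchange for a file that is about 2.5% smaller.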

Great, now the writer perf looks good across the board! Are you sure this would not impact the reader benchmark results? Its input has changed - the dictionary width is...

Benchmark results look great. Both reading and writing are slightly faster, and the created files are slightly smaller :)