Shruti Shivakumar issues

Results: 14 issues of Shruti Shivakumar

API for JSON unquoted whitespace normalization

## Description This work is a follow-up to PR #14931, which provided a proof of concept for using an FST to normalize unquoted whitespace. This PR implements the pre-processing FST in...

feature request

2 - In Progress

libcudf

CMake

cuDF (Java)

non-breaking
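
As a rough illustration of what normalizing unquoted whitespace with a finite-state transducer can look like, the following is a minimal host-side sketch. It assumes that "normalization" means dropping spaces and tabs that occur outside quoted strings while leaving newlines in place; the PR's actual FST runs on the GPU and its exact rules are not shown in the truncated description. The names `ws_state` and `strip_unquoted_whitespace` are hypothetical and are not part of libcudf.

```cpp
#include <string>
#include <string_view>

// Hypothetical three-state transducer: outside a string, inside a string,
// and immediately after a backslash inside a string.
enum class ws_state { outside, in_string, escape };

// Assumption: "normalizing" means removing spaces and tabs outside quotes.
// Newlines are kept so that JSON Lines record boundaries survive.
inline std::string strip_unquoted_whitespace(std::string_view in)
{
  std::string out;
  out.reserve(in.size());
  ws_state s = ws_state::outside;
  for (char c : in) {
    switch (s) {
      case ws_state::outside:
        if (c == ' ' || c == '\t') continue;  // drop unquoted whitespace
        if (c == '"') s = ws_state::in_string;
        break;
      case ws_state::in_string:
        if (c == '\\') s = ws_state::escape;       // next character is escaped
        else if (c == '"') s = ws_state::outside;  // closing quote
        break;
      case ws_state::escape:
        s = ws_state::in_string;  // escaped character never ends the string
        break;
    }
    out.push_back(c);  // every character that is not dropped is copied through
  }
  return out;
}
```

Each input character causes one state transition and emits zero or one output character, which is the property that lets this kind of pre-processing be expressed as an FST.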

Optimizing multi-source byte range reading in JSON reader

## Description This piece of work seeks to achieve two goals - (i) reducing repeated reading of byte range chunks in the JSON reader, and (ii) enabling multi-source byte range...

libcudf

Performance

improvement

non-breaking

Reading multi-line JSON in string columns using runtime configurable delimiter

## Description Addresses #15277 Given a JSON lines buffer with records separated by a delimiter passed at runtime, the idea is to modify the JSON tokenization FST to consider the...

feature request

libcudf

cuIO

non-breaking

Fix multi-source reading in JSON byte range reader

## Description This PR fixes the number of bytes read and corrects the offsets for the delimiters added to the buffer when reading across multiple sources. ## Checklist - [X]...

bug

libcudf

cuIO

non-breaking

Reducing runtime of JSON reader options benchmark

## Description This PR cleans up the JSON reader options benchmark by reducing the number of runtime configurations from 162 to 20. Reasoning behind the splitting of the benchmark -...

libcudf

Performance

improvement

non-breaking

JSON tree algorithms refactor I: CSR data structure for column tree

## Description Part of #15903. 1. Introduces the Compressed Sparse Row (CSR) format to store the adjacency information of the column tree. 2. Analogous to `reduce_to_column_tree`, `reduce_to_column_tree_csr` reduces node tree...

libcudf

CMake

cuIO

improvement

non-breaking
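
For readers unfamiliar with the Compressed Sparse Row layout mentioned above, here is a minimal host-side sketch of storing a tree's parent-to-children adjacency in CSR form. It is only an illustration under stated assumptions: the tree is given as a parent-index array with `-1` marking the root, and `csr_tree` and `build_csr` are hypothetical names, not the types or functions introduced by the PR (which operates on device memory).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container: the children of node i are
// col_indices[row_offsets[i]] .. col_indices[row_offsets[i + 1] - 1].
struct csr_tree {
  std::vector<int> row_offsets;  // size is num_nodes + 1
  std::vector<int> col_indices;  // size is the number of edges
};

// Build the CSR adjacency from a parent-index array (parent[i] == -1 for the root).
inline csr_tree build_csr(std::vector<int> const& parent)
{
  auto const n = parent.size();
  csr_tree t;
  t.row_offsets.assign(n + 1, 0);

  // Count the children of each node, shifted by one slot for the prefix sum.
  for (std::size_t i = 0; i < n; ++i) {
    if (parent[i] >= 0) {
      assert(static_cast<std::size_t>(parent[i]) < n);
      ++t.row_offsets[parent[i] + 1];
    }
  }

  // A running sum turns the per-node counts into row offsets.
  for (std::size_t i = 0; i < n; ++i) { t.row_offsets[i + 1] += t.row_offsets[i]; }

  // Scatter each child into the next free slot of its parent's row.
  t.col_indices.resize(t.row_offsets[n]);
  std::vector<int> cursor(t.row_offsets.begin(), t.row_offsets.end() - 1);
  for (std::size_t i = 0; i < n; ++i) {
    if (parent[i] >= 0) { t.col_indices[cursor[parent[i]]++] = static_cast<int>(i); }
  }
  return t;
}
```

The count, prefix-sum, and scatter steps are written as plain loops here; each of them corresponds to a standard data-parallel primitive, which is one reason CSR is a common choice for adjacency data on GPUs.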

[WIP] JSON host tree algorithms

## Description Coming soon. ## Checklist - [ ] I am familiar with the [Contributing Guidelines](https://github.com/rapidsai/cudf/blob/HEAD/CONTRIBUTING.md). - [ ] New or existing tests cover these changes. - [ ] The...

libcudf

CMake

cuIO

improvement

non-breaking

Fix bug in recovering invalid lines in JSONL inputs

## Description Addresses #16999 ## Checklist - [X] I am familiar with the [Contributing Guidelines](https://github.com/rapidsai/cudf/blob/HEAD/CONTRIBUTING.md). - [X] New or existing tests cover these changes. - [ ] The documentation is...

bug

libcudf

cuIO

non-breaking

JSON tokenizer memory optimizations

## Description Both the full pushdown automaton that tokenizes the input JSON string and the bracket-brace FST overestimate the total buffer size required for the translated output and indices....

libcudf

cuIO

Performance

improvement

non-breaking

[BUG] Limit size of buffer read by batched multi-source JSON lines reader to be at most `INT_MAX` bytes

**Describe the bug** With the implementation of the reallocate-and-retry logic when the initial buffer size estimate fails for byte range reading (PR #16687), the total buffer size read per batch...

bug