Karthikeyan comments

Results 62 comments of Karthikeyan

[BUG] double free or memory corruption when parsing some JSON

Created a fix for the memory errors: #15798. The long-term fix should be rewriting the device tree creation algorithm.

[BUG] double free or memory corruption when parsing some JSON

Discussed the bug with @shrshi offline, along with the need to refactor `make_device_json_column()` in [json_column.cu#L477](https://github.com/rapidsai/cudf/blob/5819de376d091c619807070a43994bbecfb2cd6c/cpp/src/io/json/json_column.cu#L477). The complexity of `make_device_json_column` increased as more features were added, and the current algorithm does not...

JSON tree algorithms refactor I: CSR data structure for column tree

/ok to test

JSON reader validation of values

/merge

[QST] Does the read_json() method support GPU acceleration?

This is readable with `orient="records", lines=False`. The following code works.

```python
In [4]: df = cudf.read_json(StringIO(json_data), orient="records", lines=False, engine="cudf")

In [5]: df
Out[5]:
   id Col_01  Col_02
0   1   test      77
1...
```

[QST] Does the read_json() method support GPU acceleration?

_cudf.read_json_ uses the `cudf` engine for JSON Lines only. It doesn't use the cudf engine automatically for other cases. https://github.com/rapidsai/cudf/blob/20ed009003944be776e28c26301354be287726f9/python/cudf/cudf/io/json.py#L60-L61 Right now, the libcudf nested JSON reader will support `orient="records"` and `orient="values"` with...
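A minimal sketch of the engine selection described above. `resolve_engine` is a hypothetical helper written for illustration, not the code at the linked lines, and falling back to `"pandas"` for non-JSON-Lines input is an assumption.

```python
def resolve_engine(engine: str = "auto", lines: bool = False) -> str:
    """Hypothetical helper: pick a reader backend the way the comment describes.

    Assumption: with engine="auto", only JSON Lines input goes to the cudf
    engine, and everything else falls back to pandas.
    """
    if engine == "auto":
        return "cudf" if lines else "pandas"
    # An explicit engine request is passed through unchanged.
    return engine


auto_for_json_lines = resolve_engine(lines=True)
auto_for_records = resolve_engine(lines=False)
explicit_cudf = resolve_engine(engine="cudf", lines=False)
```

Under this sketch, non-JSON-Lines input only reaches the GPU reader when `engine="cudf"` is passed explicitly.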

[BUG] Add support for `force_ascii=False` when writing to JSON with cuDF engine

This feature is easy to implement. It skips the UTF-8/UTF-16 encoding. We need to add the option and skip the `escape_strings_fn` call at `cudf/cpp/src/io/json/write_json.cu:548`. It's a good first issue.
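To illustrate what skipping the escaping step changes in the output, here is a small sketch using Python's standard `json` module as a stand-in. It is an analogy only, not the cuDF writer; `to_json_string` is a hypothetical wrapper, and `ensure_ascii` plays the role that `force_ascii` would play.

```python
import json


def to_json_string(text: str, force_ascii: bool = True) -> str:
    """Hypothetical wrapper: serialize one string, optionally escaping non-ASCII."""
    # ensure_ascii=True writes non-ASCII characters as \uXXXX escapes
    # (UTF-16 surrogate pairs for characters outside the BMP).
    return json.dumps(text, ensure_ascii=force_ascii)


escaped = to_json_string("é😀")                 # escapes applied
raw = to_json_string("é😀", force_ascii=False)  # characters written as-is

print(escaped)
print(raw)
```

Both forms decode to the same string, so skipping the escape only changes the bytes written, not the data.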

JSON tokenizer memory optimizations

Debugged further: this issue is unrelated to this PR. The allocation size goes negative because a value above 2**31 stored in an `int32_t` becomes negative. `json_in.size() = 2167967896` is more than 2 GiB (2.019 GiB)....
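The wraparound can be reproduced with plain integer arithmetic. This is a sketch: `wrap_int32` is a hypothetical helper that mimics narrowing a size to a signed 32-bit integer, and only the size value comes from the comment above.

```python
INT32_MAX = 2**31 - 1


def wrap_int32(value: int) -> int:
    """Reinterpret an integer as a signed 32-bit value (two's complement)."""
    value &= 0xFFFFFFFF  # keep the low 32 bits
    return value - 2**32 if value > INT32_MAX else value


json_size = 2167967896  # json_in.size() reported above

print(json_size > INT32_MAX)       # True: the size does not fit in int32_t
print(wrap_int32(json_size))       # -2126999400
print(round(json_size / 2**30, 3)) # 2.019 (GiB)
```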

JSON tokenizer memory optimizations

> The changes look good to me. I am just concerned about the potential performance downside we get from repeatedly running the FSTs and redundantly computing some values, such as...

[FEA] Improve `GpuJsonToStructs` performance

After discussion with @ttnghia, here are the improvements planned for the different sections:
- Section 1: @karthikeyann and @shrshi are working on validation and memory usage reduction here. https://github.com/rapidsai/cudf/pull/16996 https://github.com/rapidsai/cudf/pull/16978 TBD:...