Karthikeyan

Results 62 comments of Karthikeyan

Created a fix for the memory errors: #15798. The long-term fix should be to rewrite the device tree creation algorithm.

Discussed the bug and the need to refactor `make_device_json_column()` in [json_column.cu#L477](https://github.com/rapidsai/cudf/blob/5819de376d091c619807070a43994bbecfb2cd6c/cpp/src/io/json/json_column.cu#L477) offline with @shrshi. The complexity of `make_device_json_column` has increased as more features were added, and the current algorithm does not...

This is readable with `orient="records", lines=False`. The following code works.
```python
In [4]: df = cudf.read_json(StringIO(json_data), orient="records", lines=False, engine="cudf")

In [5]: df
Out[5]:
   id Col_01 Col_02
0   1   test     77
1...
```

_cudf.read_json_ uses the `cudf` engine for JSON Lines only; it doesn't use the `cudf` engine automatically for other cases. https://github.com/rapidsai/cudf/blob/20ed009003944be776e28c26301354be287726f9/python/cudf/cudf/io/json.py#L60-L61 Right now, the libcudf nested JSON reader will support `orient="records"` and `orient="values"` with...
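To make the difference between the two input layouts concrete, here is a minimal sketch using only the Python standard library. It does not call cudf at all, and the sample rows are hypothetical; it only shows what a JSON Lines input and an `orient="records", lines=False` input look like for the same data.

```python
import json

# Hypothetical sample rows, not taken from the original issue.
rows = [
    {"id": 1, "Col_01": "test", "Col_02": 77},
    {"id": 2, "Col_01": "demo", "Col_02": 88},
]

# JSON Lines layout (lines=True): one JSON object per line.
json_lines = "\n".join(json.dumps(r) for r in rows)

# Records layout (orient="records", lines=False): one JSON array of objects.
json_records = json.dumps(rows)

# Both layouts carry the same rows; only the framing differs.
parsed_lines = [json.loads(line) for line in json_lines.splitlines()]
parsed_records = json.loads(json_records)

print(json_lines)
print(json_records)
print(parsed_lines == parsed_records)  # True
```

A reader that only auto-selects one engine for the first layout needs the engine named explicitly for the second, which is what the `engine="cudf"` argument in the comment above does.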

This feature is easy to implement: it skips the UTF-8/UTF-16 encoding. We need to add the option and skip the `escape_strings_fn` call at `cudf/cpp/src/io/json/write_json.cu:548`. It's a good first issue.
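As an analogy only (this is Python's standard `json` module, not the libcudf writer, and the sample string is made up), the effect of skipping the escaping step can be sketched with `ensure_ascii`:

```python
import json

# Hypothetical sample string with a non-ASCII and a non-BMP character.
text = "café 😀"

# Default: non-ASCII characters are escaped as \uXXXX,
# and the emoji becomes a UTF-16 surrogate pair.
escaped = json.dumps(text)

# With escaping skipped, the characters are written through unchanged.
raw = json.dumps(text, ensure_ascii=False)

print(escaped)  # "caf\u00e9 \ud83d\ude00"
print(raw)      # "café 😀"

# Both forms decode back to the same string.
print(json.loads(escaped) == json.loads(raw) == text)  # True
```

Whether the libcudf option would behave exactly like `ensure_ascii=False` is an assumption; the comment above only says the `escape_strings_fn` call would be skipped.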

Debugged further: this issue is unrelated to this PR. The allocation size goes negative because a value of 2**31 or more stored in an `int32_t` wraps to a negative number. `json_in.size() = 2167967896` is more than 2 GiB (2.019 GiB)....
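The arithmetic behind that wraparound can be checked with a short sketch. It uses Python's `ctypes.c_int32` to mimic a 32-bit signed integer; the variable names are illustrative and not taken from the libcudf code.

```python
import ctypes

size = 2167967896       # json_in.size() reported above
INT32_MAX = 2**31 - 1   # largest value an int32_t can hold

# The size exceeds the int32_t range.
print(size > INT32_MAX)  # True

# Storing it in a 32-bit signed integer wraps modulo 2**32.
as_int32 = ctypes.c_int32(size).value
print(as_int32)  # -2126999400

# Size expressed in GiB.
gib = size / 2**30
print(round(gib, 3))  # 2.019
```

Any allocation sized from the wrapped value would be requested with a negative byte count, which matches the symptom described.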

> The changes look good to me. I am just concerned about the potential performance downside we get from repeatedly running the FSTs and redundantly computing some values, such as...

After discussion with @ttnghia, here are the improvements planned for different sections:
- Section 1: @karthikeyann and @shrshi are working on validation and memory usage reduction here. https://github.com/rapidsai/cudf/pull/16996 https://github.com/rapidsai/cudf/pull/16978 TBD:...