Refactor JsonRecordReader and fix incorrect parsing of JSON objects for multi-line JSON

Open ashuka24 opened this issue 3 years ago • 0 comments

This PR is a continuation of https://github.com/greenplum-db/pxf/pull/858.

This commit refactors the JsonRecordReader to internally use the LineRecordReader when handling multi-line JSON files. This is done to avoid incorrectly parsing JSON objects, especially those that contain special characters.
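
To illustrate why special characters matter here, the following is a minimal sketch, not the actual JsonRecordReader code. It assumes the input arrives as lines (as a line-based reader would deliver them) and reassembles top-level JSON objects by tracking brace depth while ignoring braces and escaped quotes inside string values. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of PXF.
class MultiLineJsonSketch {

    /** Reassembles complete top-level JSON objects from consecutive lines. */
    static List<String> readObjects(List<String> lines) {
        List<String> objects = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int depth = 0;            // nesting depth of '{' ... '}'
        boolean inString = false; // currently inside a JSON string literal
        boolean escaped = false;  // previous character was a backslash

        for (String line : lines) {
            for (int i = 0; i < line.length(); i++) {
                char c = line.charAt(i);
                if (depth > 0 || c == '{') {
                    current.append(c); // keep only text that belongs to an object
                }
                if (inString) {
                    // Braces and escaped quotes inside strings must not affect depth.
                    if (escaped) {
                        escaped = false;
                    } else if (c == '\\') {
                        escaped = true;
                    } else if (c == '"') {
                        inString = false;
                    }
                } else if (c == '"') {
                    inString = true;
                } else if (c == '{') {
                    depth++;
                } else if (c == '}' && depth > 0) {
                    depth--;
                    if (depth == 0) {
                        objects.add(current.toString());
                        current.setLength(0);
                    }
                }
            }
            if (depth > 0) {
                current.append('\n'); // object continues on the next line
            }
        }
        return objects;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "{\"a\": \"x}y\",",
                " \"b\": \"q\\\"{\"}",
                "{\"c\": 1}");
        for (String obj : readObjects(lines)) {
            System.out.println(obj);
        }
    }
}
```

A reader that counted braces without tracking string state would split the first object at the `}` inside `"x}y"`, which is the kind of misparse this change is meant to avoid.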

It also adds a new table parameter, "USE_PARALLEL_READ", which lets users toggle between the default HdfsDataFragmenter (true) and the HdfsFileFragmenter (false).
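
The toggle described above could be sketched roughly as follows. This is an assumption-laden illustration, not the PXF implementation: the `FragmenterChoiceSketch` class, the `chooseFragmenter` method, and the plain `Map` of table options are all hypothetical. Only the parameter name, its default of true, and the two fragmenter names come from this PR description.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not part of PXF.
class FragmenterChoiceSketch {

    static final String OPTION = "USE_PARALLEL_READ";

    /** Returns the fragmenter name implied by the table options. */
    static String chooseFragmenter(Map<String, String> tableOptions) {
        String raw = tableOptions.get(OPTION);
        // Assumption: an absent option means the default (true).
        boolean parallel = (raw == null) || Boolean.parseBoolean(raw.trim());
        return parallel ? "HdfsDataFragmenter" : "HdfsFileFragmenter";
    }

    public static void main(String[] args) {
        Map<String, String> options = new HashMap<>();
        System.out.println(chooseFragmenter(options));
        options.put(OPTION, "false");
        System.out.println(chooseFragmenter(options));
    }
}
```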

ashuka24 • Oct 11 '22 18:10