JSON reader validation of values

Open karthikeyann opened this issue 1 year ago • 1 comments

Description

Addresses part of https://github.com/rapidsai/cudf/issues/15222. This change adds a validation stage to the JSON reader at the token level. If any validation fails in a row, the entire row is made null.

  • [x] validation functor - implement Spark validation rules. (@revans2 implemented all validation rules)
  • [x] move output iterator to thrust. (already merged by https://github.com/NVIDIA/cccl/pull/2282)
  • [x] Fix failing tests and infer data type for Float.
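
To illustrate the row-nulling behavior described above, here is a minimal CPU-side sketch in Python. It is not the cudf implementation and does not reproduce the Spark validation rules: the token tuple layout, the `is_valid_token` rule set, and the `null_invalid_rows` helper are all hypothetical names chosen for illustration. The only point is that a single failed token invalidates its whole row.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical token representation: (row index, token kind, raw text).
Token = Tuple[int, str, str]

# Illustrative rule only: a strict JSON number (no leading zeros, no bare '.').
# The real reader applies Spark-compatible rules, which are not reproduced here.
_NUMBER = re.compile(r"-?(0|[1-9]\d*)(\.\d+)?([eE][+-]?\d+)?")


def is_valid_token(kind: str, text: str) -> bool:
    """Return True if a single value token passes this sketch's checks."""
    if kind == "number":
        return _NUMBER.fullmatch(text) is not None
    if kind == "literal":
        return text in ("true", "false", "null")
    return True  # other token kinds are not checked in this sketch


def null_invalid_rows(rows: List[str], tokens: List[Token]) -> List[Optional[str]]:
    """Replace every row that owns at least one invalid token with None."""
    bad_rows = {row for row, kind, text in tokens if not is_valid_token(kind, text)}
    return [None if i in bad_rows else raw for i, raw in enumerate(rows)]


rows = ['{"a": 1}', '{"a": 01}', '{"a": true}']
tokens = [(0, "number", "1"), (1, "number", "01"), (2, "literal", "true")]
result = null_invalid_rows(rows, tokens)
```

In this sketch the second row is nulled because `01` fails the number check, while the other rows pass through unchanged.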

Checklist

  • [x] I am familiar with the Contributing Guidelines.
  • [x] New or existing tests cover these changes.
  • [x] The documentation is up to date with these changes.

karthikeyann avatar Jun 11 '24 08:06 karthikeyann

I am seeing two test failures around NBSP in a quoted string. I need to do some more debugging to see if it is my code changes or yours that are causing the problem.

revans2 avatar Aug 26 '24 22:08 revans2

Pushed my first review round. Will come back later. Thanks for working on this :)

ttnghia avatar Sep 04 '24 05:09 ttnghia

/merge

karthikeyann avatar Sep 11 '24 14:09 karthikeyann