[BUG] JsonToStructs and ScanJson should return null for non-numeric, non-boolean non-quoted strings
Describe the bug
from_json and the JSON scan first try to parse an unquoted value as a number/boolean, and if that works out the result is returned as a string. This is also related to validation. If a single column is an invalid unquoted string, then the entire row needs to be invalidated.
In this case we are looking at unquoted values. In JSON, only true and false are allowed for boolean values. They are case sensitive, so TRUE and FALSE are not valid. Numbers have to look like the desired number type or they are not valid. For example, 1.0 is not a valid int, as in https://github.com/NVIDIA/spark-rapids/issues/10469. Note that 1,000 is invalid in all cases for numbers, unless it is in a quoted string and is being read as a decimal value https://github.com/NVIDIA/spark-rapids/issues/10470.
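To make the rules above concrete, here is a minimal Python sketch of how an unquoted token might be checked against a target type. The helper name `is_valid_unquoted`, the type labels, and the regular expressions are all assumptions for illustration. The patterns follow the strict JSON number grammar, which may not match every detail of what Spark's parser accepts.

```python
import re

# Strict JSON number grammar (assumption: Spark's parser options may differ).
JSON_NUMBER = re.compile(r"-?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?")
# Integer targets: digits only, so 1.0 is rejected.
JSON_INT = re.compile(r"-?(?:0|[1-9][0-9]*)")


def is_valid_unquoted(token, target):
    """Hypothetical helper: is this unquoted token valid for the target type?"""
    if target == "boolean":
        # Case sensitive: TRUE and FALSE are not valid JSON.
        return token in ("true", "false")
    if target == "int":
        return JSON_INT.fullmatch(token) is not None
    if target == "double":
        return JSON_NUMBER.fullmatch(token) is not None
    raise ValueError("unsupported target type: " + target)
```

With this sketch, `TRUE` fails the boolean check, `1.0` fails the int check, and `1,000` fails every numeric check, because the comma is not part of the number grammar.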
Things get a little complicated because this is different for GetJsonObject or JsonTuple, where everything that is valid is returned as a string. Note that I said "is valid". TRUE is not a valid unquoted value, and it too would result in the entire line for GetJsonObject or JsonTuple being returned as null.
I think to make this work we are either going to need some help from CUDF to get better validation, or we are going to need complicated post-processing, made possible by enabling CUDF to return quoted strings. I think the latter is going to give us the most flexibility, and then we can come back to CUDF and figure out how to make it work more efficiently.
This is mostly fixed, but if we try to read the data as a string, then it is not validated; it is just returned as a string.
I see this as a subset of https://github.com/rapidsai/cudf/issues/15222
We can probably still fix it in our code for non-nested data, but it means we will have to run a regular expression over all of the returned string output. Ultimately we really should have CUDF do the validation everywhere if we want it to be right.
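As a rough illustration of what that regular-expression pass could look like for non-nested data, here is a Python sketch. It assumes the parser hands back raw values with the quotes still attached to quoted strings, so quoted and unquoted values can be told apart. The function name `validate_string_column` and the pattern are hypothetical, and this is not the plugin's or CUDF's actual implementation. It only nulls individual values; invalidating the whole row would have to be layered on top.

```python
import re

# Unquoted values that strict JSON allows: true, false, null, or a number.
# Assumption: Spark's parser options may accept or reject more than this.
VALID_UNQUOTED = re.compile(
    r"true|false|null|-?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?"
)


def validate_string_column(raw_values):
    """Hypothetical post-processing pass over raw string output."""
    out = []
    for raw in raw_values:
        if raw is None or raw == "null":
            # Missing values and JSON null both become null.
            out.append(None)
        elif len(raw) >= 2 and raw[0] == '"' and raw[-1] == '"':
            # Quoted string: strip the quotes (no unescaping in this sketch).
            out.append(raw[1:-1])
        elif VALID_UNQUOTED.fullmatch(raw):
            # Valid unquoted number/boolean: returned as a string.
            out.append(raw)
        else:
            # Invalid unquoted value such as TRUE or 1,000.
            out.append(None)
    return out
```

The per-value regex match is the cost being discussed here: it touches every returned string, which is why pushing the validation into CUDF is the better long-term option.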
This should be addressed as a part of CUDF validation.