spark-rapids
[BUG] JsonToStructs and ScanJson do not normalize numeric output when read as a string
Describe the bug
This is almost identical to https://github.com/NVIDIA/spark-rapids/issues/10218, but it applies to from_json and to reading JSON Lines formatted files.
Numbers like 1.00000 and -0 are not normalized to match what Apache Spark would do.
Another odd example of this is +INF and -INF. Even if allowNonNumericNumbers is disabled, +INF and -INF are valid floats and are normalized to "Infinity" and "-Infinity" respectively, and the quotes come out in the string itself. This is also true for unquoted Infinity, -Infinity, and NaN.
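To make the expected normalization concrete, here is a minimal Python sketch that models it for a single unquoted number token read as a string. It is not Spark or spark-rapids code: `normalize_number_token` is a hypothetical helper, and the outputs for `1.00000` (`1.0`) and `-0` (`0`) are assumptions about what Spark produces, since the text above only says these values are normalized. Python's float formatting also differs from Java's for very large or very small magnitudes, so this only illustrates the simple cases named above.

```python
import re

# Non-numeric tokens named in this issue, mapped to the normalized text.
_NON_NUMERIC = {
    "+INF": "Infinity",
    "-INF": "-Infinity",
    "Infinity": "Infinity",
    "-Infinity": "-Infinity",
    "NaN": "NaN",
}


def normalize_number_token(token: str) -> str:
    """Hypothetical model of the normalization for one unquoted token."""
    if token in _NON_NUMERIC:
        # As described above, the quotes are part of the resulting string.
        return '"' + _NON_NUMERIC[token] + '"'
    if re.fullmatch(r"-?\d+", token):
        # Assumption: integer tokens round-trip through an integer,
        # so -0 comes out as 0.
        return str(int(token))
    # Assumption: everything else round-trips through a double,
    # so 1.00000 comes out as 1.0. Raises ValueError for non-numbers.
    return repr(float(token))


if __name__ == "__main__":
    for tok in ["1.00000", "-0", "+INF", "-INF", "NaN"]:
        print(tok, "->", normalize_number_token(tok))
```

A GPU implementation that returns the raw token text unchanged would disagree with this model on every one of those inputs, which is the mismatch this issue describes.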
Technically, in Spark 4.0 this was reverted (at least for scan by default):
https://issues.apache.org/jira/browse/SPARK-48148
https://github.com/apache/spark/pull/46408
This functionality was put behind the config spark.sql.json.enableExactStringParsing, which is on by default.
It appears to work for scan, but not for get_json_object. It also no longer removes whitespace or normalizes single quotes, which will make matching this behavior a lot more interesting.