spark-rapids
[BUG] Issues found by Spark UT Framework on RapidsJsonSuite
Describe the bug
The Spark UT Framework enabled RapidsJsonSuite (https://github.com/NVIDIA/spark-rapids/pull/10743), with the following test cases explicitly excluded:
- [x] Casting long as timestamp: not a bug once the correct timezone is set
- [x] Write timestamps correctly with timestampFormat option and timeZone option: not a bug once the correct timezone is set
- [x] SPARK-23723: json in UTF-16 with BOM #10875
- [x] SPARK-23723: multi-line json in UTF-32BE with BOM
- [x] SPARK-23723: Use user's encoding in reading of multi-line json in UTF-16LE
- [x] SPARK-23723: Unsupported encoding name
- [x] SPARK-23723: checking that the encoding option is case agnostic
- [x] SPARK-23723: specified encoding is not matched to actual encoding
- [x] SPARK-23724: lineSep should be set if encoding if different from UTF-8
- [x] SPARK-31716: inferring should handle malformed input #10875
- [x] SPARK-24190: restrictions for JSONOptions in read #10875
- [x] exception mode for parsing date/timestamp string: not a bug once the correct timezone is set
- [x] #10902
GpuJsonScan issues found after https://github.com/NVIDIA/spark-rapids/pull/11141 enabled spark.rapids.sql.format.json.read.enabled=true:
- [ ] SPARK-32810: JSON data source should be able to read files with escaped glob metacharacter in the paths
- [ ] SPARK-18352: Parse normal multi-line JSON files (uncompressed)
- [ ] SPARK-18352: Parse normal multi-line JSON files (compressed)
- [ ] Applying schemas
- [ ] Loading a JSON dataset from a text file with SQL
- [ ] Loading a JSON dataset from a text file
These excluded test cases need further investigation. Note: other test cases in this suite may pass only because they fall back to the CPU.
Steps/Code to reproduce bug
- Compile everything with
mvn -Dbuildver=330 install -DskipTests
- Pick a test case name from the list above
- Go to RapidsTestSettings, find the line that starts with ".exclude" and contains the test case name, and comment it out
- Run the suite; you should see one failed test case, e.g.:
mvn -nsu -Dbuildver=330 -pl tests -Dsuites="org.apache.spark.sql.rapids.suites.RapidsXXXSuite" test
(replace RapidsXXXSuite with the suite name from the issue title). Always double-check that your suite name matches the one in the source code, as the name may contain typos.
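The "comment out the .exclude line" step can be scripted. The Python helper below is a minimal sketch and is not part of spark-rapids: the function names and script usage are hypothetical, and it assumes each .exclude(...) call sits on a single line with the test name in double quotes. It prefixes a Scala line comment (//) to the matching line and leaves everything else untouched.

```python
import sys
from pathlib import Path


def comment_out_exclude(source: str, test_name: str) -> tuple[str, int]:
    """Comment out the .exclude line(s) for one test case (hypothetical helper).

    A line matches when, after leading whitespace, it starts with ".exclude"
    and contains the test name wrapped in double quotes. Matching the quoted
    name avoids hitting longer test names that share the same prefix.
    Returns the rewritten source and the number of lines changed.
    """
    quoted = f'"{test_name}"'
    out = []
    changed = 0
    for line in source.splitlines(keepends=True):
        stripped = line.lstrip()
        if stripped.startswith(".exclude") and quoted in stripped:
            indent = line[: len(line) - len(stripped)]
            # Keep the original indentation and add a Scala line comment.
            out.append(f"{indent}// {stripped}")
            changed += 1
        else:
            out.append(line)
    return "".join(out), changed


def main(path_arg: str, test_name: str) -> int:
    path = Path(path_arg)
    new_source, changed = comment_out_exclude(path.read_text(encoding="utf-8"), test_name)
    if changed == 0:
        print(f"no .exclude line found for {test_name!r} in {path}", file=sys.stderr)
        return 1
    path.write_text(new_source, encoding="utf-8")
    print(f"commented out {changed} line(s) in {path}")
    return 0


# Only act as a CLI when given exactly: <settings file> <test name>.
if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(main(sys.argv[1], sys.argv[2]))
```

Example: python3 comment_out_exclude.py path/to/RapidsTestSettings.scala "Applying schemas", then run the mvn test command for the suite. Matching the quoted name rather than a bare substring keeps "Loading a JSON dataset from a text file" from also matching "Loading a JSON dataset from a text file with SQL". Already commented lines start with "//", so running the script twice does not double-comment them.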
Expected behavior
The suite passes without excluding any test case.