WIP: Spark 4: Fix miscellaneous tests including logic, repart, hive_delimited.
(Partially) fixes #11031.
This PR addresses tests that fail on Spark 4.0 in the following files:
- integration_tests/src/main/python/datasourcev2_read_test.py
- integration_tests/src/main/python/expand_exec_test.py
- integration_tests/src/main/python/get_json_test.py
- integration_tests/src/main/python/hive_delimited_text_test.py
- integration_tests/src/main/python/logic_test.py
- integration_tests/src/main/python/repart_test.py
- integration_tests/src/main/python/time_window_test.py
This is still a work in progress; a couple of other tests remain to be addressed.
Build
Build
That last failure was an interesting one to track down.
On Spark < 3.3, time interval calculations involve multiplication/division aggregation operations. In ANSI mode, these tend to fall back to the CPU because of #5114. With ANSI enabled, the test is therefore guaranteed to fail on those versions, because part of the plan runs off the GPU.
On Spark >= 3.3, the same calculations appear to use modulo operations instead, which don't seem to be susceptible to the ANSI-mode failures.
I've added a skip for this test when ANSI is enabled on Spark < 3.3. The skip can be rolled back once #5114 is addressed.
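For anyone reading along, the skip condition can be sketched roughly as below. This is only an illustration of the logic, not the code in this PR: `parse_major_minor` and `should_skip_under_ansi` are hypothetical names, and the real test would rely on the version helpers and pytest markers already used in the integration tests.

```python
# Hypothetical sketch of the skip condition described above.
# Neither function exists in the repository; names are for illustration only.

def parse_major_minor(version: str) -> tuple:
    """Return (major, minor) from a version string such as "3.2.4"."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def should_skip_under_ansi(spark_version: str, ansi_enabled: bool) -> bool:
    """Skip only when ANSI mode is on AND Spark is older than 3.3.

    On those versions the interval arithmetic falls off the GPU (see #5114),
    so the test cannot pass. Spark >= 3.3 is assumed to be unaffected.
    """
    return ansi_enabled and parse_major_minor(spark_version) < (3, 3)


if __name__ == "__main__":
    for version in ("3.2.4", "3.3.0", "4.0.0"):
        for ansi in (False, True):
            print(version, ansi, should_skip_under_ansi(version, ansi))
```

Comparing `(major, minor)` tuples rather than raw strings avoids ordering mistakes such as "3.10" sorting before "3.3".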
Build
Build
@NVnavkumar, I was wondering if you might take another look at this one.
Build
There seems to be an error on Spark 3.3, where the expected exception isn't thrown. It's taking a bit of time to reproduce. I'll update here once I have something.
I think I've addressed the Databricks failure. I'll kick off another build and ask the reviewers for another round.
Build
@NVnavkumar, I've fixed the last nit. Does this look agreeable?
Thank you for reviewing, @NVnavkumar. This change has now been merged.