Fix ANSI mode failures in subquery_test.py [databricks]
Fixes #11029.
Some tests in subquery_test.py fail when run with ANSI mode enabled, because certain array columns are accessed with invalid indices. These tests predate the availability of ANSI mode in Spark.
This commit modifies the tests so that the generated data is appropriately sized for the query.
There is no loss of test coverage; failure cases for invalid index values in array columns are already tested as part of array_test::test_array_item_ansi_fail_invalid_index.
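The behavior at issue can be modeled in a few lines of plain Python (a sketch of the semantics, not Spark's actual implementation): under ANSI mode, an out-of-range index into an array column raises an error (Spark's `INVALID_ARRAY_INDEX`), whereas legacy mode quietly returns NULL.

```python
def array_index(arr, idx, ansi_enabled):
    """Model of Spark SQL's 0-based arr[idx] lookup.

    Legacy mode returns None (SQL NULL) for an out-of-range index;
    ANSI mode raises instead, mirroring Spark's INVALID_ARRAY_INDEX error.
    """
    if idx < 0 or idx >= len(arr):
        if ansi_enabled:
            raise IndexError(
                f"index {idx} out of bounds for array of size {len(arr)}")
        return None
    return arr[idx]
```

Sizing the generated arrays so every queried index is in range keeps both modes on the non-error path, which is all these subquery tests need.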
Edit: An additional (albeit minimal) test for ANSI on/off has been added to cover the removed cases.
Build
Build
Odd: the test runs locally but fails in CI. Investigating...
Edit: Here's the complaint in the failure:
2024-07-01T22:49:45.4593611Z [2024-07-01T22:48:19.060Z] 2024-07-01 22:33:46 INFO Running test 'src/main/python/subquery_test.py::test_scalar_subquery_array[Null-True][DATAGEN_SEED=1719870311, TZ=UTC, IGNORE_ORDER({'local': True})]'
Not sure why the complaint is only about the new test, and not about all the other tests that do the same thing. I have changed the test not to use a keyword as a table identifier.
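For illustration (the names and the tiny keyword set below are hypothetical, not the actual test code): an identifier like `table` collides with a reserved SQL keyword and would need backtick quoting in Spark SQL, while a neutral name avoids the issue entirely.

```python
# Small illustrative subset of reserved SQL keywords; Spark's real list is
# much longer and depends on spark.sql.ansi.enforceReservedKeywords.
RESERVED = {"table", "select", "from", "where"}

def safe_identifier(name):
    """Return a form of `name` that is safe to splice into a SQL query,
    backtick-quoting it (Spark SQL style) if it is a reserved keyword."""
    return f"`{name}`" if name.lower() in RESERVED else name
```

Renaming the identifier in the test sidesteps the quoting question altogether.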
Build
Build
Build
I'm investigating the test failures. It seems that the generated input isn't causing an exception in ANSI mode on Databricks.
For the record, this isn't specific to Databricks, or to any particular Spark version. I've been able to reproduce the failure with:
DATAGEN_SEED=1720033671 PYSP_TEST_spark_sql_ansi_enabled=true SPARK_HOME=/home/mithunr/workspace/dev/spark/bin/spark-3.5.0-bin-hadoop3 TESTS=subquery_test.py::test_scalar_subquery_array_ansi_mode_failures ./integration_tests/run_pyspark_from_build.sh
This is an odd failure. I'm still investigating.
This has all been a bit silly. It turns out test_scalar_subquery_array_ansi_mode_failures isn't deterministic: depending on the generated data, the subquery can succeed, in which case neither the CPU nor the GPU run fails.
I've modified the test so that the exception fires deterministically.
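The idea behind the fix can be sketched as follows (a plain-Python model under assumed names, not the actual test): cap the generated array lengths strictly below the queried index, so that every row is out of range and the ANSI error is guaranteed to fire rather than depending on what the data generator happens to produce.

```python
import random

def gen_short_arrays(num_rows, max_len, seed):
    """Generate rows of int arrays whose length never exceeds max_len."""
    rng = random.Random(seed)
    return [[rng.randint(0, 100) for _ in range(rng.randint(0, max_len))]
            for _ in range(num_rows)]

def scalar_subquery_min_at(rows, idx):
    """Model of SELECT MIN(arr[idx]) under ANSI mode: any out-of-range
    access raises, so querying idx >= max_len fails deterministically."""
    vals = []
    for arr in rows:
        if idx < 0 or idx >= len(arr):
            raise IndexError("INVALID_ARRAY_INDEX")
        vals.append(arr[idx])
    return min(vals) if vals else None
```

With `max_len` below the queried index, the exception no longer depends on DATAGEN_SEED, so CPU and GPU can be compared on the error path every run.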
Build
Thank you for reviewing, @jlowe. I've merged this change.