Fix test failures in parquet_write_test.py

Open razajafri opened this issue 1 year ago • 2 comments

FAILED ../../../../integration_tests/src/main/python/parquet_write_test.py::test_hive_timestamp_value
FAILED ../../../../integration_tests/src/main/python/parquet_write_test.py::test_non_empty_ctas
FAILED ../../../../integration_tests/src/main/python/parquet_write_test.py::test_parquet_write_fails_legacy_datetime
FAILED ../../../../integration_tests/src/main/python/parquet_write_test.py::test_parquet_write_roundtrip_datetime_with_legacy_rebase
FAILED ../../../../integration_tests/src/main/python/parquet_write_test.py::test_ts_write_fails_datetime_exception

razajafri avatar Jun 08 '24 05:06 razajafri

Some of these tests fail because they set a conf key that the error below reports as removed (not merely deprecated) in version 4.0.0:

E               pyspark.errors.exceptions.captured.AnalysisException: The SQL config 'spark.sql.legacy.parquet.datetimeRebaseModeInWrite' was removed in the version 4.0.0. Use 'spark.sql.parquet.datetimeRebaseModeInWrite' instead.

The failing tests are:

  1. test_hive_timestamp_value
  2. test_parquet_write_roundtrip_datetime_with_legacy_rebase
  3. test_parquet_write_fails_legacy_datetime
  4. test_ts_write_fails_datetime_exception

These should be trivial to fix.
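
One possible shape for the fix, as a minimal sketch rather than the change actually made in the repository: pick the conf key from the running Spark version. The helper name `rebase_write_conf_key` and the version parsing are hypothetical. The only fact taken from the error output is that the `spark.sql.legacy.parquet.*RebaseModeInWrite` names are rejected from 4.0.0 on; that older versions should keep using the legacy name is an assumption.

```python
import re

# Prefixes taken from the error messages in this issue.
LEGACY_PREFIX = "spark.sql.legacy.parquet."
CURRENT_PREFIX = "spark.sql.parquet."


def parse_major(spark_version: str) -> int:
    """Return the leading major version number, e.g. 4 for '4.0.0-SNAPSHOT'."""
    match = re.match(r"(\d+)", spark_version)
    if match is None:
        raise ValueError(f"unrecognized Spark version: {spark_version!r}")
    return int(match.group(1))


def rebase_write_conf_key(spark_version: str, kind: str = "datetime") -> str:
    """Hypothetical helper: choose the rebase-mode-in-write conf key.

    kind is 'datetime' or 'int96', the two variants named in the errors.
    Assumption: versions below 4.0.0 keep using the legacy key.
    """
    if kind not in ("datetime", "int96"):
        raise ValueError(f"unsupported kind: {kind!r}")
    prefix = CURRENT_PREFIX if parse_major(spark_version) >= 4 else LEGACY_PREFIX
    return f"{prefix}{kind}RebaseModeInWrite"
```

A test could then build its conf dict as `{rebase_write_conf_key(spark.version): 'LEGACY'}` instead of hard-coding the legacy name, where `spark` stands for the active session.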

mythrocks avatar Jun 12 '24 22:06 mythrocks

test_non_empty_ctas fails because it creates a table whose location is a non-empty directory:

E               pyspark.errors.exceptions.captured.AnalysisException: CREATE-TABLE-AS-SELECT cannot create table with location to a non-empty directory file:///tmp/pyspark_tests/pop-os-main-3128167-2117921863/CTAS/ctas. To allow overwriting the existing non-empty directory, set 'spark.sql.legacy.allowNonEmptyLocationInCTAS' to true.

I'll need to look at the tests more closely.
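
For reference while investigating, the error message itself points at two directions: opt in to the legacy behaviour through the conf it names, or make sure the target location is empty before the CTAS runs. The sketch below only illustrates both under assumptions; `ALLOW_NON_EMPTY_CTAS_CONF`, `is_non_empty_dir`, and `reset_dir` are hypothetical names, and which option matches the intent of the test still needs checking (a test named `test_non_empty_ctas` presumably wants the directory to stay non-empty).

```python
import shutil
from pathlib import Path

# Conf named in the AnalysisException; passing it to the test is an assumption.
ALLOW_NON_EMPTY_CTAS_CONF = {
    "spark.sql.legacy.allowNonEmptyLocationInCTAS": "true",
}


def is_non_empty_dir(path: str) -> bool:
    """True if path is an existing directory with at least one entry."""
    p = Path(path)
    return p.is_dir() and any(p.iterdir())


def reset_dir(path: str) -> None:
    """Hypothetical alternative: recreate path as an empty directory."""
    p = Path(path)
    if p.exists():
        shutil.rmtree(p)
    p.mkdir(parents=True)
```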

mythrocks avatar Jun 12 '24 22:06 mythrocks

Unassigning myself. I don't think I'll get to this in the next couple of weeks.

mythrocks avatar Jul 24 '24 22:07 mythrocks

With ANSI mode turned off, the following tests still fail, for the reasons shown in parentheses:

test_hive_timestamp_value (The SQL config 'spark.sql.legacy.parquet.datetimeRebaseModeInWrite' was removed in the version 4.0.0. Use 'spark.sql.parquet.datetimeRebaseModeInWrite' instead.)
test_non_empty_ctas (CREATE-TABLE-AS-SELECT cannot create table with location to a non-empty directory file:///tmp/pyspark_tests/a07cb15-lcedt-gw1-206777-597991485/CTAS/ctas. To allow overwriting the existing non-empty directory, set 'spark.sql.legacy.allowNonEmptyLocationInCTAS' to true.)
test_parquet_write_fails_legacy_datetime (The SQL config 'spark.sql.legacy.parquet.datetimeRebaseModeInWrite' was removed in the version 4.0.0. Use 'spark.sql.parquet.datetimeRebaseModeInWrite' instead.)
test_parquet_write_roundtrip_datetime_with_legacy_rebase (The SQL config 'spark.sql.legacy.parquet.int96RebaseModeInWrite' was removed in the version 4.0.0. Use 'spark.sql.parquet.int96RebaseModeInWrite' instead.)
test_ts_write_fails_datetime_exception (The SQL config 'spark.sql.legacy.parquet.datetimeRebaseModeInWrite' was removed in the version 4.0.0. Use 'spark.sql.parquet.datetimeRebaseModeInWrite' instead.)

razajafri avatar Aug 14 '24 22:08 razajafri