datafusion-comet Spark SQL test failures in native_iceberg

Describe the bug

This issue is to track Spark SQL test failures in native_iceberg_compat mode.

core1

2025-03-17T18:45:32.4547575Z [info] - SPARK-44185: relative LOCATION with Append SaveMode shall check the qualified path *** FAILED *** (290 milliseconds)
2025-03-17T18:46:18.2321668Z [info] - add a nested column at the end of the leaf struct column *** FAILED *** (205 milliseconds)
2025-03-17T18:46:18.4251766Z [info] - add a nested column in the middle of the leaf struct column *** FAILED *** (186 milliseconds)
2025-03-17T18:46:18.6388827Z [info] - add a nested column at the end of the middle struct column *** FAILED *** (213 milliseconds)
2025-03-17T18:46:18.8659932Z [info] - add a nested column in the middle of the middle struct column *** FAILED *** (224 milliseconds)
2025-03-17T18:46:19.1230175Z [info] - hide a nested column at the end of the leaf struct column *** FAILED *** (251 milliseconds)
2025-03-17T18:46:19.3753704Z [info] - hide a nested column in the middle of the leaf struct column *** FAILED *** (250 milliseconds)
2025-03-17T18:46:19.6467631Z [info] - hide a nested column at the end of the middle struct column *** FAILED *** (269 milliseconds)
2025-03-17T18:46:19.8920982Z [info] - hide a nested column in the middle of the middle struct column *** FAILED *** (236 milliseconds)
2025-03-17T18:49:42.7623123Z [info] - SPARK-44334: Status of execution w/ error and w/o jobs shall be FAILED not COMPLETED (0 milliseconds)
2025-03-17T18:54:31.4491764Z [info] - SPARK-36182: can't read TimestampLTZ as TimestampNTZ *** FAILED *** (92 milliseconds)
2025-03-17T18:54:31.6017852Z [info] - SPARK-36182: read TimestampNTZ as TimestampLTZ *** FAILED *** (150 milliseconds)
2025-03-17T18:54:35.0992821Z [info] - SPARK-10005 Schema merging for nested struct *** FAILED *** (194 milliseconds)
2025-03-17T18:54:35.7665789Z [info] - SPARK-10301 requested schema clipping - requested schema contains physical schema *** FAILED *** (97 milliseconds)
2025-03-17T18:54:36.0964621Z [info] - SPARK-10301 requested schema clipping - schemas overlap but don't contain each other *** FAILED *** (107 milliseconds)
2025-03-17T18:54:36.3870161Z [info] - SPARK-10301 requested schema clipping - out of order *** FAILED *** (149 milliseconds)
2025-03-17T18:55:17.8419088Z [info] *** 15 TESTS FAILED ***

core3

2025-03-17T18:55:35.0503526Z [info] - row index generation - vectorized reader, filtered, small pages, small row groups, small splits *** FAILED *** (3 seconds, 397 milliseconds)
2025-03-17T18:55:35.2843564Z [info] - row index generation - vectorized reader, filtered, small pages, small row groups *** FAILED *** (230 milliseconds)
2025-03-17T18:55:48.2463130Z [info] - row index generation - vectorized reader, filtered, small row groups, small splits *** FAILED *** (4 seconds, 579 milliseconds)
2025-03-17T18:55:48.4971115Z [info] - row index generation - vectorized reader, filtered, small row groups *** FAILED *** (245 milliseconds)
2025-03-17T18:57:00.5751334Z [info] - row index generation - parquet-mr reader, filtered, small pages, small row groups, small splits *** FAILED *** (3 seconds, 657 milliseconds)
2025-03-17T18:57:00.8519000Z [info] - row index generation - parquet-mr reader, filtered, small pages, small row groups *** FAILED *** (271 milliseconds)
2025-03-17T18:57:13.4412427Z [info] - row index generation - parquet-mr reader, filtered, small row groups, small splits *** FAILED *** (4 seconds, 54 milliseconds)
2025-03-17T18:57:13.6855645Z [info] - row index generation - parquet-mr reader, filtered, small row groups *** FAILED *** (238 milliseconds)
2025-03-17T18:57:26.6695011Z [info] - invalid row index column type - vectorized reader *** FAILED *** (118 milliseconds)
2025-03-17T19:10:02.6603053Z [info] *** 9 TESTS FAILED ***

hive1

2025-03-17T18:42:54.3447649Z [info] - SPARK-25206: wrong records are returned by filter pushdown when Hive metastore schema and parquet schema are in different letter cases *** FAILED *** (4 seconds, 123 milliseconds)
2025-03-17T18:55:16.6694268Z [info] - SPARK-34990: Write and read an encrypted parquet *** FAILED *** (273 milliseconds)
2025-03-17T18:55:16.8861010Z [info] - SPARK-37117: Can't read files in Parquet encryption external key material mode *** FAILED *** (213 milliseconds)
2025-03-17T18:55:17.0853363Z [info] - SPARK-42114: Test of uniform parquet encryption *** FAILED *** (192 milliseconds)
2025-03-17T19:01:52.8977652Z [info] - SPARK-5775 read struct from partitioned_parquet_with_key_and_complextypes *** FAILED *** (72 milliseconds)
2025-03-17T19:01:53.0397371Z [info] - SPARK-5775 read struct from partitioned_parquet_with_complextypes *** FAILED *** (40 milliseconds)
2025-03-17T19:16:45.8867978Z [info] - test all data types *** FAILED *** (32 seconds, 185 milliseconds)
2025-03-17T19:17:54.4172632Z [info] - SPARK-5775 read struct from partitioned_parquet_with_key_and_complextypes *** FAILED *** (151 milliseconds)
2025-03-17T19:17:54.7035979Z [info] - SPARK-5775 read struct from partitioned_parquet_with_complextypes *** FAILED *** (103 milliseconds)
2025-03-17T19:18:34.9716557Z [info] *** 9 TESTS FAILED ***

Mar 17 '25 19:03 andygrove

@parthchandra @mbutrovich fyi

Mar 17 '25 19:03 andygrove

I'm looking into the core3 row index generation errors. At least one of them is failing with an NPE in Comet code:

Caused by: java.lang.NullPointerException
	at org.apache.comet.parquet.NativeBatchReader.nextBatch(NativeBatchReader.java:427)
	at org.apache.comet.parquet.NativeBatchReader.nextKeyValue(NativeBatchReader.java:373)

Relevant code:

    for (int i = 0; i < columnReaders.length; i++) {
      AbstractColumnReader reader = columnReaders[i];
      long startNs = System.nanoTime();
      // TODO: read from native reader
      reader.readBatch(batchSize);

columnReaders[i] appears to be null, presumably for the _metadata.row_index column.

Perhaps this code is missing a check for missingColumns[i]?
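
A minimal sketch of the kind of guard being suggested, not the actual Comet fix. `ColumnReaderSketch` and `MissingColumnSketch` are hypothetical stand-ins for `AbstractColumnReader` and the reader loop, and `missingColumns` is assumed to be a `boolean[]` parallel to `columnReaders`. Whether a missing column such as `_metadata.row_index` should simply be skipped, or populated some other way, is the open question here; the sketch only shows how to avoid the null dereference.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Comet's AbstractColumnReader; only what the sketch needs.
interface ColumnReaderSketch {
  void readBatch(int batchSize);
}

final class MissingColumnSketch {
  /**
   * Reads one batch from every available reader and returns how many readers were used.
   * Assumption: a null slot, or missingColumns[i] == true, marks a column with no backing reader.
   */
  static int readBatch(ColumnReaderSketch[] columnReaders, boolean[] missingColumns, int batchSize) {
    int readCount = 0;
    for (int i = 0; i < columnReaders.length; i++) {
      ColumnReaderSketch reader = columnReaders[i];
      if (reader == null || missingColumns[i]) {
        continue; // skip instead of dereferencing a null reader
      }
      reader.readBatch(batchSize);
      readCount++;
    }
    return readCount;
  }

  public static void main(String[] args) {
    List<Integer> seen = new ArrayList<>();
    // The middle slot models a column that has no reader.
    ColumnReaderSketch[] readers = { n -> seen.add(n), null, n -> seen.add(n) };
    boolean[] missing = { false, true, false };
    System.out.println(readBatch(readers, missing, 8192) + " readers used");
  }
}
```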

Mar 21 '25 17:03 andygrove

@parthchandra I made some notes above on the core3 failures. Do you know how we should handle missing columns in this case?

Mar 21 '25 17:03 andygrove

Let me look into this.

Mar 21 '25 23:03 parthchandra

catalyst: Passed: Total 6925, Failed 0, Errors 0, Passed 6925, Ignored 5, Canceled 1
core 1: Failed: Total 8686, Failed 47, Errors 0, Passed 8639, Ignored 277, Canceled 3
core 2: Failed: Total 2045, Failed 106, Errors 0, Passed 1939, Ignored 360
core 3: Failed: Total 1394, Failed 24, Errors 0, Passed 1370, Ignored 119, Canceled 15
hive 1: Failed: Total 2144, Failed 9, Errors 0, Passed 2135, Ignored 38, Canceled 4
hive 2: Error: Total 19, Failed 0, Errors 1, Passed 18, Ignored 1, Canceled 4
hive 3: Passed: Total 1044, Failed 0, Errors 0, Passed 1044, Ignored 13, Canceled 4

Counts from https://github.com/apache/datafusion-comet/pull/1541 today. Will track here as we keep updating.

Apr 01 '25 16:04 mbutrovich

Failure count:

Core 1: Tests: succeeded 9113, failed 25, canceled 6, ignored 292, pending 0
Core 2: Tests: succeeded 2636, failed 19, canceled 0, ignored 387, pending 0
Hive 1: Tests: succeeded 2144, failed  4, canceled 4, ignored  40, pending 0

Total failures: 48 (down from 176)
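
For anyone re-deriving the total from the per-suite summary lines, a small sketch of the tally follows. `FailureTally` and `sumFailures` are hypothetical names used for illustration only; the pattern is assumed to be sufficient for the two summary formats that appear in this thread ("failed 25" and "Failed 17").

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FailureTally {
  // Matches "failed 25" and "Failed 17", but not a "Failed:" status prefix
  // (the colon prevents the whitespace-then-digits part from matching).
  private static final Pattern FAILED =
      Pattern.compile("failed\\s+(\\d+)", Pattern.CASE_INSENSITIVE);

  /** Sums the first "failed N" count found on each summary line. */
  static int sumFailures(List<String> summaryLines) {
    int total = 0;
    for (String line : summaryLines) {
      Matcher m = FAILED.matcher(line);
      if (m.find()) {
        total += Integer.parseInt(m.group(1));
      }
    }
    return total;
  }

  public static void main(String[] args) {
    List<String> lines = List.of(
        "Core 1: Tests: succeeded 9113, failed 25, canceled 6, ignored 292, pending 0",
        "Core 2: Tests: succeeded 2636, failed 19, canceled 0, ignored 387, pending 0",
        "Hive 1: Tests: succeeded 2144, failed  4, canceled 4, ignored  40, pending 0");
    System.out.println("Total failures: " + sumFailures(lines)); // prints "Total failures: 48"
  }
}
```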

May 01 '25 17:05 parthchandra

Failure count:

core-1: Failed: Total 9216, Failed 17, Errors 0, Passed 9199, Ignored 214, Canceled 6
core-2: Failed: Total 2655, Failed 19, Errors 0, Passed 2636, Ignored 387
hive-1: Failed: Total 2174, Failed 3, Errors 0, Passed 2171, Ignored 14, Canceled 4

Total failures: 39 (down from 48)

core-1

- SPARK-36182: can't read TimestampLTZ as TimestampNTZ *** FAILED *** (103 milliseconds)
- SPARK-26677: negated null-safe equality comparison should not filter matched row groups *** FAILED *** (175 milliseconds)
- SPARK-34212 Parquet should read decimals correctly *** FAILED *** (398 milliseconds)
- row group skipping doesn't overflow when reading into larger type *** FAILED *** (94 milliseconds)
- schema mismatch failure error message for parquet vectorized reader *** FAILED *** (291 milliseconds)
- SPARK-45604: schema mismatch failure error on timestamp_ntz to array<timestamp_ntz> *** FAILED *** (223 milliseconds)
- Spark native readers should respect spark.sql.caseSensitive - parquet *** FAILED *** (440 milliseconds)
- SPARK-31116: Select nested schema with case insensitive mode *** FAILED *** (695 milliseconds)
- vectorized reader: missing all struct fields *** FAILED *** (178 milliseconds)
- SPARK-35640: read binary as timestamp should throw schema incompatible error *** FAILED *** (106 milliseconds)
- SPARK-35640: int as long should throw schema incompatible error *** FAILED *** (95 milliseconds)
- Parquet reads infer fields using field ids correctly *** FAILED *** (141 milliseconds)
- absence of field ids *** FAILED *** (155 milliseconds)
- SPARK-38094: absence of field ids: reading nested schema *** FAILED *** (176 milliseconds)
- SPARK-39557 INSERT INTO statements with tables with array defaults *** FAILED *** (164 milliseconds)
- SPARK-39557 INSERT INTO statements with tables with struct defaults *** FAILED *** (162 milliseconds)
- SPARK-39557 INSERT INTO statements with tables with map defaults *** FAILED *** (154 milliseconds)

core-2

- Filters should be pushed down for vectorized Parquet reader at row group level *** FAILED *** (297 milliseconds)
- filter pushdown - StringPredicate *** FAILED *** (299 milliseconds)
- SPARK-34562: Bloom filter push down *** FAILED *** (345 milliseconds)
- Spark vectorized reader - without partition data column - select a single complex field from a map entry and in clause *** FAILED *** (290 milliseconds)
- Spark vectorized reader - with partition data column - select a single complex field from a map entry and in clause *** FAILED *** (269 milliseconds)
- Non-vectorized reader - without partition data column - select a single complex field from a map entry and in clause *** FAILED *** (263 milliseconds)
- Non-vectorized reader - with partition data column - select a single complex field from a map entry and in clause *** FAILED *** (322 milliseconds)
- Spark vectorized reader - without partition data column - select nested field from a complex map key using map_keys *** FAILED *** (315 milliseconds)
- Spark vectorized reader - with partition data column - select nested field from a complex map key using map_keys *** FAILED *** (278 milliseconds)
- Non-vectorized reader - without partition data column - select nested field from a complex map key using map_keys *** FAILED *** (261 milliseconds)
- Non-vectorized reader - with partition data column - select nested field from a complex map key using map_keys *** FAILED *** (312 milliseconds)
- Spark vectorized reader - without partition data column - select nested field from a complex map value using map_values *** FAILED *** (268 milliseconds)
- Spark vectorized reader - with partition data column - select nested field from a complex map value using map_values *** FAILED *** (262 milliseconds)
- Non-vectorized reader - without partition data column - select nested field from a complex map value using map_values *** FAILED *** (264 milliseconds)
- Non-vectorized reader - with partition data column - select nested field from a complex map value using map_values *** FAILED *** (308 milliseconds)
- Spark vectorized reader - without partition data column - SPARK-40033: Schema pruning support through element_at *** FAILED *** (554 milliseconds)
- Spark vectorized reader - with partition data column - SPARK-40033: Schema pruning support through element_at *** FAILED *** (510 milliseconds)
- Non-vectorized reader - without partition data column - SPARK-40033: Schema pruning support through element_at *** FAILED *** (525 milliseconds)
- Non-vectorized reader - with partition data column - SPARK-40033: Schema pruning support through element_at *** FAILED *** (488 milliseconds)

hive-1

- SPARK-34990: Write and read an encrypted parquet *** FAILED *** (318 milliseconds)
- SPARK-37117: Can't read files in Parquet encryption external key material mode *** FAILED *** (211 milliseconds)
- SPARK-42114: Test of uniform parquet encryption *** FAILED *** (195 milliseconds)

May 21 '25 13:05 andygrove