[BUG][Spark][Presto] Delta lake integration with presto #1994
Bug
Hi, I'm experiencing the same issue when reading a Delta table from Presto. The path in the error points to a checkpoint file (.checkpoint.parquet), but the exception says it is not a valid Parquet file.
The error happens only when reading from Presto, but it works perfectly fine when reading from Apache Spark.
Here's the complete exception stack trace for reference:
java.lang.RuntimeException: s3a://xxxxx-/delta/xxxxx/xxxxxxx/_delta_log/00000000000000000180.checkpoint.parquet is not a Parquet file. Expected magic number at tail, but found [21, 0, 21, 14]
at shadedelta.org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:556)
at shadedelta.org.apache.parquet.hadoop.ParquetFileReader.
This makes me think that either:
- the checkpoint file was written in a version-specific format that the Presto Delta plugin can't recognize, or
- the file is corrupted (although Spark reads it without issues).
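To help narrow this down, here is a minimal Python sketch that checks the trailing magic bytes of a local copy of the checkpoint file. It relies on the Parquet format ending every file with the 4-byte marker `PAR1` (bytes 80, 65, 82, 49), which is what the reader in the stack trace is looking for. The function names and the example path are hypothetical, and the sketch assumes the file has already been downloaded from S3 to local disk.

```python
import os

# Parquet files end with this 4-byte marker; the exception above
# reports [21, 0, 21, 14] in its place.
PARQUET_MAGIC = b"PAR1"


def read_tail_bytes(path, length=4):
    """Return the last `length` bytes of the file (fewer if the file is shorter)."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        size = f.tell()
        # Clamp at 0 so very short files do not cause a negative seek.
        f.seek(max(size - length, 0))
        return f.read(length)


def has_parquet_tail_magic(path):
    """True if the file ends with the Parquet magic marker."""
    return read_tail_bytes(path) == PARQUET_MAGIC


# Hypothetical usage on a downloaded copy of the checkpoint:
#   tail = read_tail_bytes("00000000000000000180.checkpoint.parquet")
#   print(list(tail), has_parquet_tail_magic("00000000000000000180.checkpoint.parquet"))
```

If the local copy ends with `PAR1`, the file itself is probably intact and the problem is more likely in how it is being read from Presto. If it ends with `[21, 0, 21, 14]`, the stored object really does lack the expected footer marker.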
Let me know if there's any fix or workaround, or if you'd like me to upload the checkpoint file somewhere for analysis.
Thanks!
Environment information
- Delta Lake version: 2.2.0
- Spark version: 3.3.1
- Scala version: 2.12