[BUG][Spark][Presto] Delta lake integration with presto #1994

Open • qijinkui opened this issue 5 months ago • 1 comment

Bug

Hi, I'm experiencing the same issue when reading a Delta table from Presto. The file path reported is a checkpoint file (.checkpoint.parquet), but the stack trace suggests it's not recognized as a valid Parquet file.

The error happens only when reading from Presto, but it works perfectly fine when reading from Apache Spark.

Here's the complete exception stack trace for reference:

java.lang.RuntimeException: s3a://xxxxx-/delta/xxxxx/xxxxxxx/_delta_log/00000000000000000180.checkpoint.parquet is not a Parquet file. Expected magic number at tail, but found [21, 0, 21, 14]
    at shadedelta.org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:556)
    at shadedelta.org.apache.parquet.hadoop.ParquetFileReader.<init>(ParquetFileReader.java:776)
    at shadedelta.org.apache.parquet.hadoop.ParquetFileReader.open(ParquetFileReader.java:657)
    at shadedelta.org.apache.parquet.hadoop.ParquetReader.initReader(ParquetReader.java:152)
    at shadedelta.org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:135)
    at shadedelta.com.github.mjakubowski84.parquet4s.ParquetIterableImpl$$anon$3.hasNext(ParquetReader.scala:144)
    at io.delta.standalone.internal.actions.CustomParquetIterator.hasNext(MemoryOptimizedLogReplay.scala:132)
    at io.delta.standalone.internal.actions.MemoryOptimizedLogReplay$$anon$1.$anonfun$ensureNextIterIsReady$3(MemoryOptimizedLogReplay.scala:81)
    ...
    at com.facebook.presto.delta.DeltaClient.loadDeltaTableLog(DeltaClient.java:151)
    at com.facebook.presto.delta.DeltaClient.getTable(DeltaClient.java:79)
    at com.facebook.presto.delta.DeltaMetadata.getTableHandle(DeltaMetadata.java:220)
    ...
    at com.facebook.presto.execution.SqlQueryExecution.<init>(SqlQueryExecution.java:207)
    at java.lang.Thread.run(Thread.java:750)

If I remove the .checkpoint.parquet file, the query works fine again in Presto.

This makes me think that either:

  • the checkpoint file was written in a version-specific format that the Presto Delta plugin can't recognize, or
  • the file is corrupted (although Spark can read it without issues).
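A quick local check may help separate these two possibilities. Every Parquet file starts and ends with the 4-byte magic `PAR1`, and the 4 bytes just before the trailing magic hold the footer length as a little-endian integer. The sketch below is a hypothetical helper (`ParquetTailCheck` is not part of Delta, Presto, or parquet-mr). It assumes the checkpoint has first been copied from S3 to a local path, and it only inspects this byte layout without parsing the footer itself.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/**
 * Hypothetical diagnostic, not part of any Delta/Presto/Parquet library.
 * Checks the layout every Parquet file should have:
 *   "PAR1" | pages ... | footer | 4-byte little-endian footer length | "PAR1"
 */
final class ParquetTailCheck {
    private static final byte[] MAGIC = "PAR1".getBytes(StandardCharsets.US_ASCII);

    static String check(String path) throws IOException {
        try (RandomAccessFile f = new RandomAccessFile(path, "r")) {
            long size = f.length();
            // Smallest possible layout: head magic + footer length + tail magic.
            if (size < 12) {
                return "too small to be Parquet: " + size + " bytes";
            }
            byte[] head = new byte[4];
            f.seek(0);
            f.readFully(head);

            // Last 8 bytes: footer length (4 bytes) followed by the tail magic (4 bytes).
            byte[] tail = new byte[8];
            f.seek(size - 8);
            f.readFully(tail);
            byte[] tailMagic = Arrays.copyOfRange(tail, 4, 8);

            if (!Arrays.equals(head, MAGIC)) {
                return "bad head magic: " + Arrays.toString(head);
            }
            if (!Arrays.equals(tailMagic, MAGIC)) {
                // The condition reported above as "Expected magic number at tail".
                return "bad tail magic: " + Arrays.toString(tailMagic);
            }
            int footerLen = ByteBuffer.wrap(tail, 0, 4).order(ByteOrder.LITTLE_ENDIAN).getInt();
            // The footer must fit between the head magic and the last 8 bytes.
            if (footerLen <= 0 || footerLen > size - 12) {
                return "implausible footer length " + footerLen + " for " + size + " bytes";
            }
            return "OK: " + size + " bytes, footer " + footerLen + " bytes";
        }
    }

    public static void main(String[] args) throws IOException {
        for (String path : args) {
            System.out.println(path + " -> " + check(path));
        }
    }
}
```

For example, `java ParquetTailCheck.java 00000000000000000180.checkpoint.parquet` (Java 11+ single-file launch). A tail other than `PAR1` in the local copy would support the corruption/truncation hypothesis, while an `OK` result would suggest the stored object's bytes are intact.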

Let me know if there's any fix or workaround, or if you'd like me to upload the checkpoint file somewhere for analysis.

Thanks!

Environment information

  • Delta Lake version: 2.2.0
  • Spark version: 3.3.1
  • Scala version: 2.12

qijinkui, May 16 '25 15:05