rewrite_table_path throws "AlreadyExistsException: Location already exists" on rewriting positional deletes
Apache Iceberg version
1.10.0 (latest release)
Query engine
Spark
Please describe the bug 🐞
I'm running Iceberg 1.10.0-amzn-0, the AWS-specific build that ships with emr-7.12.0 and Spark 3.5.6.
I suspect this isn't an AWS-specific bug, but if my suspicion is wrong, feel free to close this issue.
For tables without positional deletes the procedure finishes successfully.
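(For reference, I'm distinguishing the two cases with a query along these lines against the delete_files metadata table, where content = 1 marks position deletes; this is just a sketch, with the table name from above:)

SELECT content, count(*) AS files
FROM glue.source_name.table_name.delete_files
GROUP BY content;
-- content = 1: position deletes, content = 2: equality deletes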
For tables with positional deletes, however, when I run the rewrite_table_path procedure like so:
CALL glue.system.rewrite_table_path(
  table => 'glue.source_name.table_name',
  source_prefix => 's3://bucket/iceberg/source_name/table_name',
  target_prefix => 'target_prefix_test/iceberg/source_name/table_name',
  staging_location => 's3://bucket/iceberg_rewrite_test/source_name/table_name'
)
I run into this error:
org.apache.iceberg.exceptions.AlreadyExistsException: Location already exists: s3://bucket/iceberg_rewrite_test/source_name/table_name/data/timestamp_month=2025-08/00000-190-b955afb8-8873-4d92-8546-c7be85bbccda-00002-deletes.parquet
at org.apache.iceberg.aws.s3.S3OutputFile.create(S3OutputFile.java:107)
at org.apache.iceberg.parquet.ParquetIO$ParquetOutputFile.create(ParquetIO.java:149)
at org.apache.iceberg.shaded.org.apache.parquet.hadoop.ParquetFileWriter.<init>(ParquetFileWriter.java:473)
at org.apache.iceberg.shaded.org.apache.parquet.hadoop.ParquetFileWriter.<init>(ParquetFileWriter.java:431)
at org.apache.iceberg.parquet.ParquetWriter.ensureWriterInitialized(ParquetWriter.java:114)
at org.apache.iceberg.parquet.ParquetWriter.flushRowGroup(ParquetWriter.java:214)
at org.apache.iceberg.parquet.ParquetWriter.close(ParquetWriter.java:258)
at org.apache.iceberg.deletes.PositionDeleteWriter.close(PositionDeleteWriter.java:92)
at org.apache.iceberg.RewriteTablePathUtil.rewritePositionDeleteFile(RewriteTablePathUtil.java:633)
at org.apache.iceberg.spark.actions.RewriteTablePathSparkAction.lambda$rewritePositionDelete$a4760a1f$1(RewriteTablePathSparkAction.java:673)
at org.apache.spark.sql.Dataset.$anonfun$foreach$2(Dataset.scala:3553)
at org.apache.spark.sql.Dataset.$anonfun$foreach$2$adapted(Dataset.scala:3553)
at scala.collection.Iterator.foreach(Iterator.scala:943)
at scala.collection.Iterator.foreach$(Iterator.scala:943)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
at org.apache.spark.rdd.RDD.$anonfun$foreach$2(RDD.scala:1047)
at org.apache.spark.rdd.RDD.$anonfun$foreach$2$adapted(RDD.scala:1047)
at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2545)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:174)
at org.apache.spark.scheduler.Task.run(Task.scala:152)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:632)
at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:96)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:635)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:840)
The specific file it fails on appears to be random. This looks like a race condition in which the same positional delete file is rewritten by multiple tasks; see the query sketched below.
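As far as I understand, rewrite_table_path rewrites the files referenced by every retained snapshot, so my guess is that a delete file referenced from more than one snapshot gets scheduled for rewriting more than once. A query along these lines against the all_delete_files metadata table (a sketch, same table as above) should show whether such shared delete files exist:

SELECT file_path, count(*) AS refs
FROM glue.source_name.table_name.all_delete_files
GROUP BY file_path
HAVING count(*) > 1;
-- any row here is a delete file referenced by more than one snapshot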
I will attempt to create a tiny table for which this problem occurs, to make it reproducible; a sketch of what I plan to try follows.
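This is only a starting point, assuming a format-version 2 table with merge-on-read deletes; the catalog, schema, and bucket names below are placeholders:

CREATE TABLE glue.tmp.rtp_repro (id BIGINT, ts TIMESTAMP)
USING iceberg
PARTITIONED BY (months(ts))
TBLPROPERTIES (
  'format-version' = '2',
  'write.delete.mode' = 'merge-on-read'
);

INSERT INTO glue.tmp.rtp_repro VALUES
  (1, TIMESTAMP '2025-08-01 00:00:00'),
  (2, TIMESTAMP '2025-08-02 00:00:00');

-- generates a positional delete file
DELETE FROM glue.tmp.rtp_repro WHERE id = 1;

-- extra commits so that several snapshots reference the same delete file
INSERT INTO glue.tmp.rtp_repro VALUES (3, TIMESTAMP '2025-08-03 00:00:00');
INSERT INTO glue.tmp.rtp_repro VALUES (4, TIMESTAMP '2025-08-04 00:00:00');

CALL glue.system.rewrite_table_path(
  table => 'glue.tmp.rtp_repro',
  source_prefix => 's3://bucket/iceberg/tmp/rtp_repro',
  target_prefix => 's3://other-bucket/iceberg/tmp/rtp_repro',
  staging_location => 's3://bucket/iceberg_rewrite_test/tmp/rtp_repro'
);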
Willingness to contribute
- [ ] I can contribute a fix for this bug independently
- [x] I would be willing to contribute a fix for this bug with guidance from the Iceberg community
- [ ] I cannot contribute a fix for this bug at this time