
RewriteDataFiles: Cannot commit, found new position delete for replaced data file

Open chenwyi2 opened this issue 2 years ago • 3 comments

I use Spark to run RewriteDataFiles, and I have a Flink job that writes to this table every 5 minutes. If the RewriteDataFiles action takes more than 5 minutes, the commit fails with "ValidationException: Cannot commit, found new position delete for replaced data file: GenericDataFile{content=data, file_path=qbfs://online01/warehouse/prod_censor_datalake.db/document/dataxxx". Is there a good solution? For now I must keep RewriteDataFiles under 5 minutes, otherwise the commit fails.

chenwyi2 avatar Jun 08 '22 13:06 chenwyi2

I am already using the `use-starting-sequence-number` option; my Iceberg version is 0.13.1.

chenwyi2 avatar Jun 08 '22 13:06 chenwyi2
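
For reference, the option mentioned above is set on the Spark rewrite action roughly like this. This is a sketch, not a complete program: it assumes the Iceberg Spark runtime is on the classpath and that `spark` and `table` are an existing `SparkSession` and a loaded Iceberg `Table`.

```java
import org.apache.iceberg.actions.RewriteDataFiles;
import org.apache.iceberg.spark.actions.SparkActions;

// Compact the table; with use-starting-sequence-number, conflict
// validation uses the sequence number at the start of the rewrite
// rather than the latest snapshot.
RewriteDataFiles.Result result =
    SparkActions.get(spark)
        .rewriteDataFiles(table)
        .option("use-starting-sequence-number", "true")
        .execute();
```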

I am seeing the same issue. It is fine that we cannot commit when a new position delete is found, but could we have a strategy to auto-retry?

jingli430 avatar Jun 22 '22 05:06 jingli430
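
A bounded-retry wrapper of the kind suggested above could be sketched as follows. This is a generic illustration, not Iceberg code: `fakeRewrite` is a hypothetical stand-in for the real compaction call, and the conflict is assumed to surface as an exception the caller can classify as retryable.

```java
import java.util.function.Supplier;

public class RewriteRetry {
    // Stand-in for the commit-conflict exception raised by the rewrite.
    static class CommitConflict extends RuntimeException {
        CommitConflict(String msg) { super(msg); }
    }

    // Run `action`, retrying up to maxAttempts times on commit conflicts;
    // any other exception propagates immediately.
    static <T> T withRetries(Supplier<T> action, int maxAttempts) {
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.get();
            } catch (CommitConflict conflict) {
                last = conflict; // conflict is retryable: try again
            }
        }
        throw last; // attempts exhausted
    }

    // Demo: a fake rewrite that hits the conflict twice, then succeeds.
    static int calls = 0;

    static String fakeRewrite() {
        calls++;
        if (calls < 3) {
            throw new CommitConflict("found new position delete for replaced data file");
        }
        return "committed";
    }

    public static void main(String[] args) {
        System.out.println(withRetries(RewriteRetry::fakeRewrite, 5));
    }
}
```

Note that retrying only helps if the table quiesces between attempts; as the next comment points out, under continuous CDC ingestion every attempt can hit a fresh conflict.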

We use CDC mode, so the data is always changing. Even with an auto-retry strategy, RewriteDataFiles would always fail.

chenwyi2 avatar Sep 22 '22 08:09 chenwyi2

My Iceberg version is 1.0.0, but the same problem occurs.

humengyu2012 avatar Nov 29 '22 20:11 humengyu2012

Is there any recent progress on this issue?

humengyu2012 avatar Feb 14 '23 10:02 humengyu2012

Same here. It is a big issue because we cannot ingest and optimize at the same time. Is there any idea or workaround to solve it?

anistal avatar Mar 23 '23 20:03 anistal