
Concurrent writes can fail on first write to table

kyzyl opened this issue 6 months ago · 2 comments

I noticed a slightly obscure bug/behavior. As of 0.2, doing a concurrent write to a freshly created catalog fails with:

ERROR:  duplicate key value violates unique constraint "ducklake_data_file_pkey"
DETAIL:  Key (data_file_id)=(0) already exists.
CONTEXT:  COPY ducklake_data_file, line 1

Note that this is a different error from the one you get for a normal write conflict ("failed to serialize"). The error indicates that the unique constraint on the file id is being violated, implying that the conflicting transactions are getting past conflict detection (even though they shouldn't) and then hitting a primary key collision with the one transaction that did make it through. If there are any rows at all in ducklake_data_file when the concurrent write occurs, it still fails, but with the regular transaction-conflict error rather than the PK error.

This only occurs under the following conditions:

  1. Do concurrent writes to an empty catalog (i.e. the state ducklake creates after initial ATTACH)
  2. The writes must conflict and fail (e.g. ducklake_max_retry_count is set to 0)
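
For concreteness, here's a minimal repro sketch in Python along those lines. The Postgres DSN, DATA_PATH, and table name are placeholders for your environment, and it assumes ducklake_max_retry_count is exposed as a regular DuckDB setting:

```python
# Hypothetical repro: two threads race to perform the first write
# against a freshly attached DuckLake catalog.
import threading
import duckdb

CATALOG = "ducklake:postgres:dbname=ducklake host=localhost"  # placeholder DSN

def make_conn():
    con = duckdb.connect()  # each writer uses its own DuckDB instance
    con.execute("INSTALL ducklake; INSTALL postgres; LOAD ducklake;")
    con.execute(f"ATTACH '{CATALOG}' AS lake (DATA_PATH 'data/')")
    # condition 2: disable retries so the conflict surfaces
    # (assumes the option is a plain DuckDB setting)
    con.execute("SET ducklake_max_retry_count = 0")
    return con

# Setup: creating the table writes catalog metadata, but
# ducklake_data_file stays empty until the first INSERT (condition 1).
make_conn().execute("CREATE TABLE lake.t (i INTEGER)")

results = {}

def writer(idx):
    try:
        make_conn().execute("INSERT INTO lake.t VALUES (1)")
        results[idx] = "ok"
    except Exception as e:
        results[idx] = str(e)

threads = [threading.Thread(target=writer, args=(i,)) for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# If the writes overlap, one succeeds and the other fails with the
# ducklake_data_file_pkey error instead of the usual conflict error.
print(results)
```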

So it's not a very serious bug: you can only hit it once, when the catalog is empty, the first write is concurrent, and that write fails with a conflict. Even then, you wouldn't see it unless you disabled retrying or the transactions conflicted for some other reason.

Anyhow, maybe this is somehow expected behavior? I'm not sure how it works under the hood, but it seems a bit odd.

kyzyl avatar Jun 23 '25 04:06 kyzyl

Thanks for the report!

This seems like expected behavior to me. Data file ids are assigned sequentially, so concurrent writers can hit primary key conflicts there. The retry then re-assigns the file identifiers using the next available ids.
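
As a toy illustration of that mechanism (a conceptual sketch, not DuckLake's actual code): both writers snapshot an empty ducklake_data_file, both compute the same next id, and whichever commits second trips the primary key.

```python
# Toy model of sequential id assignment, not DuckLake's actual code.
data_file = {}  # stands in for ducklake_data_file, keyed by data_file_id

def next_id(snapshot):
    # a writer picks the id after the highest one it can see
    return max(snapshot, default=-1) + 1

snapshot = dict(data_file)   # txn A and txn B start from the same empty state
id_a = next_id(snapshot)     # -> 0
id_b = next_id(snapshot)     # -> 0, the same id

data_file[id_a] = "A.parquet"    # txn A commits first
if id_b in data_file:            # txn B's commit now violates the PK
    print(f"duplicate key value: data_file_id={id_b}")
```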

Mytherin avatar Jun 23 '25 12:06 Mytherin

Yeah, the mechanism makes sense. I guess the question is: why do we only get a PK conflict on data_file_id on the first write to ducklake_data_file? Most concurrent writes will be attempting to use one of those sequentially assigned ids, so why does the first batch of concurrent writes fail with the PK error, while every conflict after that fails with a serialization error? Naively I would expect that if I ran two identical concurrent writes, once with zero rows in ducklake_data_file and once with one row, the conflicts would fail with the same error both times.

It just smelled like an edge case to me, but maybe not. Automated concurrency tests like mine are likely the only ones to see this anyhow.
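
For reference, here's a sketch of the kind of test I mean (hypothetical helper names, placeholder DSN and table; whether the two writes actually overlap depends on timing):

```python
# Run the same two-writer race twice: once against an empty
# ducklake_data_file and once after a row exists, then compare errors.
from concurrent.futures import ThreadPoolExecutor
import duckdb

CATALOG = "ducklake:postgres:dbname=ducklake host=localhost"  # placeholder

def make_conn():
    con = duckdb.connect()
    con.execute("INSTALL ducklake; INSTALL postgres; LOAD ducklake;")
    con.execute(f"ATTACH '{CATALOG}' AS lake (DATA_PATH 'data/')")
    con.execute("SET ducklake_max_retry_count = 0")  # no retries
    return con

def attempt(_):
    try:
        make_conn().execute("INSERT INTO lake.t VALUES (1)")
        return "ok"
    except Exception as e:
        return str(e)

def race():
    with ThreadPoolExecutor(max_workers=2) as pool:
        return list(pool.map(attempt, range(2)))

make_conn().execute("CREATE TABLE IF NOT EXISTS lake.t (i INTEGER)")
# Per the report above: the first race hits the PK error, while the
# second (one row now present) hits the serialization error.
print("first write (empty ducklake_data_file):", race())
print("second write (one row present):", race())
```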

kyzyl avatar Jun 23 '25 22:06 kyzyl