Feature Status: Idempotency IDs

Open PierreZ opened this issue 3 months ago • 7 comments

We heavily rely on Idempotency IDs (automatic-idempotency.html) to handle automatic transaction retries introduced in FDB 7.3.0+ (see this analysis). However, this feature is not listed in feature-status.md.

I found two related issues (#11446 on idempotency ID implementation and #10312 on automatic cleanup) but the overall feature status remains unclear.

The documentation mentions important caveats: it only prevents commit_unknown_result (not transaction_timed_out or cluster_version_changed), and it's not recommended for Multi-version client users. The Multi-version client limitation is particularly concerning as it's commonly used in production environments. Given these limitations, could you clarify:

  • Is this feature production-ready, and in which FDB versions?
  • What is its long-term status?
  • Are there other issues or PRs related to this feature that I should be aware of?

I'm happy to help with any remaining work if needed, though I'd need some guidance to get started as this would be my first Flow/C++ oriented PR.

PierreZ avatar Sep 02 '25 12:09 PierreZ

I believe this is still an experimental feature. @atn34 probably can comment on the work left?

jzhou77 avatar Sep 02 '25 17:09 jzhou77

Looking back over my notes, I think there are two problems that haven't been filed as issues:

  • The test for the automatic idempotency cleaner could fail because an idempotency id was automatically cleaned up, but this was misinterpreted as "cleaned too much".
  • A transaction that succeeded through the automatic idempotency codepath would not get a commit version in the client.

There are also the caveats which I believe are already documented around cluster_version_changed and transaction_timed_out, both of which could be fixed but would take significant effort.

atn34 avatar Sep 02 '25 17:09 atn34

Thanks @atn34!

Could you create the issues with some context in them? In the meantime, I will set up my dev environment.

PierreZ avatar Sep 03 '25 07:09 PierreZ

The test for the automatic idempotency cleaner could fail because an idempotency id was automatically cleaned up, but this was misinterpreted as "cleaned too much"

@atn34 Did PR 10552 fix the problem you are referring to? We do not see flaky tests with AutomaticIdempotency.toml.

dlambrig avatar Dec 02 '25 20:12 dlambrig

No. The problem I'm referring to in more detail:

  • The test asserts that "the cleaner hasn't cleaned too much", using the oldest extant idempotency id as a proxy for how far the cleaner has cleaned
  • There are two codepaths which can delete an idempotency id
    • The cleaner
    • The client after it receives the commit acknowledgement when automatic idempotency is enabled
  • The test fails if deleting an idempotency id through the "automatic idempotency" codepath triggers the "cleaner cleaned too far" assertion

The fix is to use the oldest extant idempotency id that didn't have automatic idempotency enabled as our proxy for how far the cleaner has cleaned. To track that, we need to update some metadata stored in kv pairs in that test.

This test failure is indeed very rare.
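
To make the false alarm concrete, here is a toy model in Python. It is only a sketch of the reasoning above, not the actual Flow test: `IdRecord`, `cleaned_too_much`, the version numbers, and the `cutoff` parameter are all made up for illustration, and the real test may express the assertion differently.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class IdRecord:
    """Hypothetical test-side metadata for one idempotency id."""
    version: int      # commit version at which the id was written
    automatic: bool   # True if automatic idempotency was enabled for it


def cleaned_too_much(written: Iterable[IdRecord],
                     extant: Iterable[IdRecord],
                     cutoff: int,
                     ignore_automatic: bool) -> bool:
    """Return True if the oldest surviving id suggests the cleaner over-cleaned.

    The cleaner is assumed to be allowed to delete ids with version < cutoff.
    With ignore_automatic=True, ids that the client may delete on its own
    are excluded from both sides of the comparison.
    """
    def eligible(r: IdRecord) -> bool:
        return not (ignore_automatic and r.automatic)

    # Ids the cleaner was not allowed to touch.
    must_survive = [r.version for r in written
                    if r.version >= cutoff and eligible(r)]
    if not must_survive:
        return False  # nothing to check against

    # Proxy for cleaner progress: the oldest id still present.
    survivors = [r.version for r in extant if eligible(r)]
    if not survivors:
        return True
    return min(survivors) > min(must_survive)


written = [IdRecord(10, True), IdRecord(20, True), IdRecord(100, False)]
# The client deleted ids 10 and 20 after its commits were acknowledged;
# the cleaner itself deleted nothing.
extant = [IdRecord(100, False)]

false_alarm = cleaned_too_much(written, extant, cutoff=15, ignore_automatic=False)
after_fix = cleaned_too_much(written, extant, cutoff=15, ignore_automatic=True)
```

In this model `false_alarm` is `True` because id 20 is missing even though the cleaner never touched it, while `after_fix` is `False` because only ids without automatic idempotency are compared.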

atn34 avatar Dec 02 '25 21:12 atn34

The test for the automatic idempotency cleaner could fail because an idempotency id was automatically cleaned up, but this was misinterpreted as "cleaned too much"

issue 12581

dlambrig avatar Dec 03 '25 01:12 dlambrig

A transaction that succeeded through the automatic idempotency codepath would not get a commit version in the client.

issue 12582

Could you create the issues with some context in them? In the meantime, I will set up my dev environment.

fyi @PierreZ

dlambrig avatar Dec 03 '25 01:12 dlambrig