nodejs-firestore icon indicating copy to clipboard operation
nodejs-firestore copied to clipboard

`runTransaction()`: `maxAttempts` not honored when `> MAX_RETRY_ATTEMPTS`

Open maganap opened this issue 10 months ago • 2 comments

Issue

A call like:

await db.runTransaction(f, { maxAttempts: DB_TRANSACTION_MAX_ATTEMPTS })

is not honoring maxAttempts when maxAttempts > MAX_RETRY_ATTEMPTS (from ExponentialBackoff class). So, setting maxAttempts: 20 retries only 10 times (which is the current defined value of MAX_RETRY_ATTEMPTS).

What is happening is that maybeBackoff(): https://github.com/googleapis/nodejs-firestore/blob/8118d37bf27b49d5e695bf18a9550a37ecc34ca2/dev/src/transaction.ts#L617

is throwing with just an error message when retries hit MAX_RETRY_ATTEMPTS: https://github.com/googleapis/nodejs-firestore/blob/8118d37bf27b49d5e695bf18a9550a37ecc34ca2/dev/src/backoff.ts#L264

so isRetryableTransactionError(err) is NOT getting a GoogleError error, thus returning false and breaking the loop before maxAttempts is reached: https://github.com/googleapis/nodejs-firestore/blob/8118d37bf27b49d5e695bf18a9550a37ecc34ca2/dev/src/transaction.ts#L623

Environment details

(Using on Cloud Run Function, gen2)

  • OS: Ubuntu 22.04
  • Node.js version: 20
  • npm version: 10
  • @google-cloud/firestore version: 7.10.0

Sorry I can't provide a MRE. It just happens when our Firestore database is under heavy load, it's not something I can reproduce at will.

maganap avatar Feb 25 '25 11:02 maganap

Thank you for the detailed description!

At first glance, this does appear to be a bug. Effectively, transactions are capped to 10 retry attempts. This limit does not exist in the Java SDK.

The limit was originally imposed as part of https://github.com/firebase/firebase-admin-node/issues/478. However, as a side effect, it also applies to transactions.

Before transaction options were introduced, transactions were hardcoded to maximum 5 retries, so the limit of 10 had no impact. Transaction options did not take into consideration that a secondary retry limit also existed.

Due to the backoff being shared by different components, a fix will require disentangling this a little bit. If we remove this limit from backoff, then we have to make sure components impose their own max.

I will add this to our backlog of bugs to fix. Since this is not a trivial fix, and this being a fairly benign bug, we will wait to fix this until someone on our team has capacity to fix this.

I will bring this to the attention of the rest of the Firestore SDK team. Expect to see updates here when we take on this work.

tom-andersen avatar Feb 26 '25 16:02 tom-andersen

We would very much appreciate this fix as well!

oconnorjoseph avatar Jun 09 '25 21:06 oconnorjoseph