fdb-record-layer
fdb-record-layer copied to clipboard
Allow cursors to stop rather than throw an error on transaction timeouts or retriable errors
For some use cases, it would be useful to have an ExecuteState
option that would allow cursors to stop when they hit a transaction timeout (at least) or perhaps any retriable FDB exception rather than rethrowing the error. For example, in the case where one is doing an inconsistent read, it would make sense to do something where the transaction is allowed to just read until it hits the five second transaction time limit, then restart the transaction from there.
I would propose adding a new option to the ExecuteProperties
à la failOnScanLimitReached
. If that option is set, then the idea is that cursors should catch any transaction time outs (or perhaps any retriable errors in general) and return a RecordCursorResult
with a new NoNextReason
, something like TRANSACTION_TIMEOUT
or RETRIABLE_ERROR
, which indicates that the scan terminated early. (This NoNextReason
would be an out-of-band exception just like, say, the TIME_LIMIT_REACHED
exception.) The user could then resume execution from where the last cursor left off.
I think adding support for this new termination condition could be done by only editing a few different places, chiefly the KeyValueCursor
and maybe some of the other cursors that talk directly to FDB rather than going through the KeyValueCursor
. This is more-or-less the same set of places that might use the CursorScanLimiter
anyway, I think.
There are some possible draw backs to this, mainly that if there's some kind of problem where FDB is down or has an incompatible version, it can result in retriable errors, which may just mean that the user would want to limit the number of retries that are done on a RETRIABLE_ERROR
. However, plumbing through the error may be difficult with the current RecordCursorResult
API (as the NoNextReason
is just an enum), so the best we may be able to get is logging the error.
A few things came up during a discussion about this with some members of the team:
- There are probably a few other places that we'd need to consider adding error checking. For example, the
MapPipelinedCursor
could hit atransaction_too_old
error when resolving a record, so it's possible that any cursor that would ever do database work in any of its callbacks could hit this - It's possible that much the same thing could be achieved with an an "error translating cursor" that took a base cursor and had an
onNext
method that caught atransaction_too_old
(or whatever) and translated the cursor element into aRecordCursorResult
with the properNoNextReason
. This could also limit the surface area of this change by limiting the error catching to a single place. - Another place where this would be useful would be the
AutoContinuingCursor
, which we may want to let catch these errors and restart a cursor if it sees them. This could also be combined with an error catching cursor, with theAutoContinuingCursor
wrapping its produced cursors in the error-catching cursor