Lock remains on a record when application crashes during transaction
If the application crashes during a transaction using the latest SDK, the transaction lock remains on the record even after restarting the Couchbase server. As a result, every subsequent update to that record throws a TransactionFailedException with CausingErrorClass = FailWriteWriteConflict.
VERSIONS:
SDK versions:
Couchbase.Transactions 3.8.0
CouchbaseNetClient 3.8.0
Tested server version
Community Edition 8.0.0 build 3777 -> problem persists
Community Edition 7.6.2 build 3721 -> problem persists
The unit tests are in the attached zip. The steps to reproduce are also in the attached zip, in error_readme.md.
I noticed you are using Couchbase.Transactions. We moved away from this, towards integrating the transactions into the sdk through the Cluster. I'll verify all this today, but there may be a bug in the Couchbase.Transactions code here, one that doesn't appear in the newer transactions code in Couchbase.
What should be happening is the crash leaves the changes to the doc staged, and then when any other transaction comes across this, it checks to see if the transaction could still be active (not timed out). If so, you get the WriteWriteConflict error. As soon as the transaction has expired (15 sec by default), the document can be written to in another transaction.
I'll check this out later today and see what's happening, and make sure my assumptions about this are correct. But at first glance this appears to be something worth (a) ensuring works properly in the new transaction code (tests for this pass, but just to be sure...), and (b) backporting a fix to Couchbase.Transactions, as this is pretty serious.
@rszik - actually I'd forgotten that we do things differently in transactions now. When we discover that a document we are trying to modify in a transaction was involved in another transaction which hasn't been completed or cleaned up, we no longer look at the expiration time (and state) of that transaction and proceed if it has expired. This is what we call a "lost" transaction, and there is a lost transaction loop which runs in the background, cleaning these up. In the old Couchbase.Transactions code, that lost transaction cleanup loop runs in the background only as long as the transactions instance exists. So if the instance is created, the transaction is run, and then the instance is destroyed, the loop never has time to clean up. The transactional xattrs remain, blocking that document from participating in another transaction each time.
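To make the behavior above concrete, here is a minimal in-memory sketch. It is only a model: DocStore, StagedTxn, StageAndCrash, CanWrite and CleanupPass are hypothetical names invented for illustration and are not SDK types. It assumes, as described above, that a writer is blocked for as long as the staged metadata exists, and that only a cleanup pass removes the metadata of an expired transaction.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the transactional xattrs left on a document.
public sealed record StagedTxn(string TxnId, DateTimeOffset ExpiresAt);

// Hypothetical in-memory model of the behavior; not the SDK's real types.
public sealed class DocStore
{
    private readonly Dictionary<string, StagedTxn> _staged = new();

    // A transaction stages a write and then "crashes": the metadata stays behind.
    public void StageAndCrash(string key, StagedTxn txn) => _staged[key] = txn;

    // Another transaction may write only if no staged metadata remains.
    // Expiry alone is deliberately not checked here.
    public bool CanWrite(string key) => !_staged.ContainsKey(key);

    // One pass of the background loop: remove metadata of expired ("lost")
    // transactions and return how many documents were unblocked.
    public int CleanupPass(DateTimeOffset now)
    {
        var lost = new List<string>();
        foreach (var (key, txn) in _staged)
        {
            if (txn.ExpiresAt <= now) lost.Add(key);
        }
        foreach (var key in lost) _staged.Remove(key);
        return lost.Count;
    }
}
```

In this model, a document staged by a crashed transaction stays blocked after its expiry time until CleanupPass runs. That mirrors the report: if the process that owns the cleanup loop exits before a pass happens, nothing ever removes the metadata.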
First thing - use the integrated Transactions, which are available from the Cluster object. Then, as long as the Cluster is alive, the background tasks are running to deal with this situation. For all work, I'd suggest moving to this. It is pretty painless, as you do something like this now:
await cluster.Transactions.RunAsync(async ctx =>
{
    var getResult = await ctx.GetAsync(collection, key);
    await ctx.ReplaceAsync(getResult, new { One = "three" });
    // Uncomment to simulate a crash in the middle of the transaction:
    //Environment.Exit(0);
});
All our recent work has gone into this code and not into Couchbase.Transactions. Check out the documentation for all the details, but this is the way you want to run transactions now: they are integrated into the main SDK.
When you exit (simulating a crash rather than any sort of exception in the transactions lambda), that's how you get lost transactions, and they are cleaned up only by the background lost transactions cleanup threads. The TransactionsCleanupConfig in the TransactionsConfig (which is now part of the ClusterOptions) has a cleanupWindow parameter, which controls how long it takes for that loop to make a pass through all the transactions. You can set that to a pretty short window (at the expense of CPU cycles) to reap these lost transactions more quickly. But in practice this should be a fairly rare occurrence, so the default (which is 1 minute) is probably reasonable.
Thanks @davidkelly, I'll check this solution and notify you about the result.