Hangfire.PostgreSql Could not place a lock on the resource: Lock timeout

Open urossavelj opened this issue 5 years ago • 47 comments

It seems that you already fixed this issue in version 1.4.0, but we're getting a lot of errors like this every day. We're not sure what is causing this. Could it be connected with the Npgsql package, or maybe something else?

Hangfire.PostgreSql.PostgreSqlDistributedLockException: Could not place a lock on the resource 'HangFire:recurring-jobs:lock': Lock timeout.
   at Hangfire.PostgreSql.PostgreSqlDistributedLock.PostgreSqlDistributedLock_Init_Transaction(String resource, TimeSpan timeout, IDbConnection connection, PostgreSqlStorageOptions options)
   at Hangfire.PostgreSql.PostgreSqlDistributedLock..ctor(String resource, TimeSpan timeout, IDbConnection connection, PostgreSqlStorageOptions options)
   at Hangfire.PostgreSql.PostgreSqlConnection.AcquireDistributedLock(String resource, TimeSpan timeout)
   at Hangfire.Server.RecurringJobScheduler.UseConnectionDistributedLock(JobStorage storage, Func`2 action)
   at Hangfire.Server.RecurringJobScheduler.EnqueueNextRecurringJobs(BackgroundProcessContext context)
   at Hangfire.Server.RecurringJobScheduler.Execute(BackgroundProcessContext context)
   at Hangfire.Server.BackgroundProcessDispatcherBuilder.ExecuteProcess(Guid executionId, Object state)
   at Hangfire.Processing.BackgroundExecution.Run(Action`2 callback, Object state)

We have the latest version of the Hangfire.PostgreSql package (1.6.0) and Npgsql.EntityFrameworkCore.PostgreSQL version 2.1.2.

Is there any known compatibility problem between those two?

If you need any more specific information I'll try to add it.

Aug 06 '19 14:08 urossavelj

Hello,

To avoid pushing the same recurring job more than once, a lock is created with a timeout. When another lock with the same resource name exists, the timeout expires and you receive this error. Please check the source code for more information: RecurringJobScheduler.cs

I think it is because the PostgreSqlDistributedLockException should extend from DistributedLockTimeoutException so that lock exceptions are handled.
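To illustrate the inheritance point above, here is a minimal self-contained sketch. All type names in it are hypothetical stand-ins, not the real Hangfire or Hangfire.PostgreSql classes, and the scheduler loop is only an assumption about how a caller might treat lock timeouts as routine. It shows that a catch clause for a base timeout exception only handles a provider exception if that exception derives from it.

```csharp
using System;

// Hypothetical stand-in for a core "lock timeout" exception that a scheduler handles quietly.
public class LockTimeoutException : Exception
{
    public LockTimeoutException(string resource)
        : base($"Timeout expired while acquiring a lock on '{resource}'.") { }
}

// Hypothetical provider exception that does NOT derive from the core type.
public class UnrelatedProviderLockException : Exception
{
    public UnrelatedProviderLockException(string message) : base(message) { }
}

// Hypothetical provider exception that DOES derive from the core type.
public class DerivedProviderLockException : LockTimeoutException
{
    public DerivedProviderLockException(string resource) : base(resource) { }
}

public static class LockHandlingDemo
{
    // Mimics a scheduler tick that treats lock timeouts as routine and everything else as a failure.
    public static string RunOnce(Action acquireLock)
    {
        try
        {
            acquireLock();
            return "acquired";
        }
        catch (LockTimeoutException)
        {
            return "skipped"; // another holder has the lock; try again on the next tick
        }
        catch (Exception)
        {
            return "error"; // not recognised as a lock timeout, so it surfaces as an error
        }
    }

    public static void Main()
    {
        // The unrelated type falls through to the generic handler.
        Console.WriteLine(RunOnce(() => throw new UnrelatedProviderLockException("Lock timeout")));
        // The derived type is handled by the lock-timeout handler.
        Console.WriteLine(RunOnce(() => throw new DerivedProviderLockException("recurring-jobs:lock")));
    }
}
```

If the real scheduler works along these lines, deriving the provider exception from the core timeout type would turn the logged errors into silent retries. It would not remove a lock row that was left behind.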

@urossavelj Locks usually remain in the database when the application is not shut down gracefully. If you are debugging locally, make sure to close the application from the CMD window if it is a console application, or from the browser if your server runs in a web application.

Aug 10 '19 08:08 jhonnyelhelou91

So this is expected behaviour and nothing to worry about?

We're also having issue #111. Will that be fixed soon?

Aug 12 '19 12:08 urossavelj

Encountering the same issue, please help

Jan 28 '20 10:01 a-beliaev-altatec

Emptying the lock table in the Hangfire database kinda "fixes" the issue.

Jan 30 '20 11:01 a-beliaev-altatec

The problem is that 'HangFire:recurring-jobs:lock' blocks the addition of jobs to the queue. It's not about any particular job; it's about RecurringJobScheduler itself. Here is a LINQPad example: after a couple of minutes I get this lock, and jobs are not queued anymore until the default lock timeout of 10 minutes has passed. hangfire test.zip

It looks like, in my case, the lock is not released because of a timeout, which I do not know how to handle:

8 02:05:10.5794 | Hangfire.Server.RecurringJobScheduler | 35 | TRACE | Recurring job 'CreateSites' is being updated. RecurringJob: (Queue:default;Cron:* * * * *;TimeZoneId:UTC;Job:{"Type":"***","Method":"CreateSites","ParameterTypes":"[]","Arguments":"[]"};CreatedAt:2019-09-13T18:50:38.6914661Z;V:2;Running:yes;LastJobId:833806;LastExecution:2020-03-27T23:04:09.8281272Z;NextExecution:2020-03-27T23:05:00.0000000Z), Changes: (LastExecution:2020-03-27T23:05:10.4896455Z;NextExecution:2020-03-27T23:06:00.0000000Z;LastJobId:833810), NextExecution: (27.03.2020 23:06:00) 
2020-03-28 02:05:10.6947 | Hangfire.Server.RecurringJobScheduler | 35 | WARN | Recurring job 'CreateSites' can not be scheduled due to an exception. 
System.Transactions.TransactionAbortedException: The transaction has aborted. ---> System.TimeoutException: Transaction Timeout
   --- End of inner exception stack trace ---
   at System.Transactions.TransactionStateAborted.BeginCommit(InternalTransaction tx, Boolean asyncCommit, AsyncCallback asyncCallback, Object asyncState)
   at System.Transactions.CommittableTransaction.Commit()
   at System.Transactions.TransactionScope.InternalDispose()
   at System.Transactions.TransactionScope.Dispose()
   at Hangfire.PostgreSql.PostgreSqlWriteOnlyTransaction.Commit()
   at Hangfire.Server.RecurringJobScheduler.EnqueueBackgroundJob(BackgroundProcessContext context, IStorageConnection connection, String recurringJobId, DateTime now)
   at Hangfire.Server.RecurringJobScheduler.TryEnqueueBackgroundJob(BackgroundProcessContext context, IStorageConnection connection, String recurringJobId, DateTime now)

Mar 27 '20 22:03 xklonx

@davidroth Can you tell me why TransactionScope usage is hardcoded in PostgreSqlWriteOnlyTransaction, even though the Entity Framework developers (as I understand it) advise using native transactions? From https://docs.microsoft.com/en-us/ef/ef6/saving/transactions: "It will throw exceptions if you issue any DDL and have not enabled distributed transactions through the MSDTC Service. We recommend using the approach outlined in the previous sections instead where possible."

And is it OK that the default timeout used there, TransactionSynchronisationTimeout, is so small (500ms)? Is there no way now to avoid using this when I don't want to use it? The main reason is that I want to avoid MSDTC usage. Maybe it would be better, by default, to use the old (1.6.3) native Npgsql transaction logic when EnableTransactionScopeEnlistment is false?

Mar 27 '20 23:03 xklonx

@xklonx If EnableTransactionScopeEnlistment is set to false (which is the default for backwards compat reasons), the transaction scope is created using TransactionScopeOption.RequiresNew.

This means that a new transaction is used and therefore there is no enlisting in an existing ambient transaction (if any).
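As a minimal sketch of what RequiresNew does, the following uses only System.Transactions and no database. The class and method names are made up for illustration. It compares the local identifier of an ambient transaction with the one seen inside a nested RequiresNew scope; the two are expected to differ, because the inner scope starts its own transaction instead of joining the outer one.

```csharp
using System;
using System.Transactions;

public static class RequiresNewDemo
{
    // Returns true when the nested scope runs in a different transaction than the ambient one.
    public static bool NestedScopeGetsOwnTransaction()
    {
        using (var outer = new TransactionScope())
        {
            string outerId = Transaction.Current.TransactionInformation.LocalIdentifier;

            // RequiresNew always creates a fresh transaction for this scope.
            using (var inner = new TransactionScope(TransactionScopeOption.RequiresNew))
            {
                string innerId = Transaction.Current.TransactionInformation.LocalIdentifier;

                // Neither scope calls Complete(), so both roll back on dispose.
                // Nothing is enlisted here, so the rollback has no effect.
                return outerId != innerId;
            }
        }
    }

    public static void Main()
    {
        Console.WriteLine(NestedScopeGetsOwnTransaction());
    }
}
```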

https://docs.microsoft.com/en-us/ef/ef6/saving/transactions It will throw exceptions if you issue any DDL and have not enabled distributed transactions through the MSDTC Service.

I don't know what the implementation of PostgreSqlWriteOnlyTransaction has to do with EF6.

And is it OK that the default timeout used there, TransactionSynchronisationTimeout, is so small (500ms)? Is there no way now to avoid using this when I don't want to use it? The main reason is that I want to avoid MSDTC usage.

The timeout is configurable via TransactionSynchronisationTimeout. The timeout was already there before the TransactionScope was added. See https://github.com/frankhommers/Hangfire.PostgreSql/blob/master/src/Hangfire.PostgreSql/PostgreSqlStorageOptions.cs#L38

I did not come up with the 500ms.
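For readers who want to raise the timeout themselves, a configuration sketch follows. It assumes the UsePostgreSqlStorage overload that takes a connection string and a PostgreSqlStorageOptions instance, which may differ between package versions, so check the version you have installed. The 30-second value and the HangfireStorageSetup class are illustrative only and are not recommended by anyone in this thread.

```csharp
using System;
using Hangfire;
using Hangfire.PostgreSql;

public static class HangfireStorageSetup
{
    // Hypothetical helper: call once at application startup.
    public static void Configure(string connectionString)
    {
        var options = new PostgreSqlStorageOptions
        {
            // Illustrative value; the 500ms default is what the thread identifies as too short.
            TransactionSynchronisationTimeout = TimeSpan.FromSeconds(30),
        };

        // Assumed overload: (connection string, options). Verify against your package version.
        GlobalConfiguration.Configuration.UsePostgreSqlStorage(connectionString, options);
    }
}
```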

The main reason is that I want to avoid MSDTC usage.

Avoiding MSDTC is great. Yet I don't know what your issue is with the current implementation and how this affects your app. Please show me how the current implementation of PostgreSqlWriteOnlyTransaction triggers MSDTC escalation in your application.

Maybe it would be better, by default, to use the old (1.6.3) native Npgsql transaction logic when EnableTransactionScopeEnlistment is false?

IMO no because it would add additional code complexity without any benefit.

Mar 28 '20 09:03 davidroth

This means that a new System.Transactions transaction (not a Postgres transaction) is used even if I don't need it. It uses the new timeout of 500ms and gives me timeout errors (https://github.com/frankhommers/Hangfire.PostgreSql/issues/119#issuecomment-605337122) when RecurringJobScheduler tries to update a job. Because of that, the lock HangFire:recurring-jobs:lock remains and RecurringJobScheduler gets no chance to schedule new recurring jobs (example https://github.com/frankhommers/Hangfire.PostgreSql/issues/119#issuecomment-605337122). Before that change, TransactionSynchronisationTimeout was used only in another class:

if(executionTimer.Elapsed > _options.TransactionSynchronisationTimeout)
					throw new TimeoutException("SetRangeInHash experienced timeout while trying to execute transaction");

which was checked only after the first attempt, no matter how long that attempt took. Now a new system transaction always exists after using (var transaction = new TransactionScope(scopeOption, transactionOptions)). You can check this in LINQPad with:

var transactionOptions = new TransactionOptions()
	{
		IsolationLevel = System.Transactions.IsolationLevel.RepeatableRead,
		Timeout = TimeSpan.FromMilliseconds(500)
	};
	using (var transaction = new TransactionScope(TransactionScopeOption.RequiresNew, transactionOptions))
	{
		Transaction.Current.Dump();
	}

The connection is always enlisted by the call _connection.EnlistTransaction(Transaction.Current); here is the code of EnlistTransaction:

public override void EnlistTransaction(Transaction transaction)
#nullable restore
        {
            if (EnlistedTransaction != null)
            {
                if (EnlistedTransaction.Equals(transaction))
                    return;
                try
                {
                    if (EnlistedTransaction.TransactionInformation.Status == System.Transactions.TransactionStatus.Active)
                        throw new InvalidOperationException($"Already enlisted to transaction (localid={EnlistedTransaction.TransactionInformation.LocalIdentifier})");
                }
                catch (ObjectDisposedException)
                {
                    // The MSDTC 2nd phase is asynchronous, so we may end up checking the TransactionInformation on
                    // a disposed transaction. To be extra safe we catch that, and understand that the transaction
                    // has ended - no problem for reenlisting.
                }
            }

            var connector = CheckReadyAndGetConnector();

            EnlistedTransaction = transaction;
            if (transaction == null)
                return;

            // Until #1378 is implemented, we have no recovery, and so no need to enlist as a durable resource manager
            // (or as promotable single phase).

            // Note that even when #1378 is implemented in some way, we should check for mono and go volatile in any case -
            // distributed transactions aren't supported.

            transaction.EnlistVolatile(new VolatileResourceManager(this, transaction), EnlistmentOptions.None);
            Log.Debug($"Enlisted volatile resource manager (localid={transaction.TransactionInformation.LocalIdentifier})", connector.Id);
        }

PostgreSqlWriteOnlyTransaction has nothing to do with EF, but by default it should be based on Npgsql transactions, like it was before, and now it is not. You can check https://stackoverflow.com/questions/1690892/transactionscope-automatically-escalating-to-msdtc-on-some-machines for an example of when TransactionScope escalates to DTC usage. A simple way is just to move your database to another server. I don't understand why everyone who doesn't need TransactionScope must deal with this. It would be a real benefit for users not to deal with those timeout exceptions and DTC escalations. @frankhommers can you tell us what you think about that?

Mar 28 '20 11:03 xklonx

@xklonx

This means that a new System.Transactions transaction (not a Postgres transaction) is used even if I don't need it. It uses the new timeout of 500ms and gives me timeout errors (#119 (comment)) when RecurringJobScheduler tries to update a job. Because of that, the lock HangFire:recurring-jobs:lock remains and RecurringJobScheduler gets no chance to schedule new recurring jobs (example #119 (comment)). Before that change, TransactionSynchronisationTimeout was used only in another class

Well, then we have a timeout issue. This has nothing to do with MSDTC. We could increase the timeout here so that it matches the previous timeout (which was the default NpgsqlConnection timeout).

PostgreSqlWriteOnlyTransaction has nothing to do with EF, but by default it should be based on Npgsql transactions, like it was before, and now it is not.

TransactionScope is only the facade. During enlistment, an NpgsqlTransaction is used underneath.

You can check https://stackoverflow.com/questions/1690892/transactionscope-automatically-escalating-to-msdtc-on-some-machines for an example of when TransactionScope escalates to DTC usage.

I still don't understand why you are talking about MSDTC. The post behind the link is all about SqlConnection/SQL Server/Windows. You can hit MSDTC escalations when using SQL Server on Windows and spanning a transaction scope over multiple active connections. Since this issue is about the Postgres adapter, MSDTC is not a thing here. Sure, distributed transaction escalation also exists in this world, but here it is called "prepared transactions" (https://www.postgresql.org/docs/8.4/sql-prepare-transaction.html).

You still have not demonstrated that a prepared transaction escalation occurs with the current code.

It would be a real benefit for users not to deal with those timeout exceptions and DTC escalations.

If the timeout is a problem, we could increase or remove it. But this has nothing to do with MSDTC.

Mar 28 '20 12:03 davidroth

Now I see that VolatileResourceManager uses _localTx = connection.BeginTransaction(ConvertIsolationLevel(_transaction.IsolationLevel)); and that Hangfire.SqlServer uses TransactionScope too. As Hangfire does all its work in separate threads, it would be hard to intervene and escalate to a distributed transaction. But I still don't understand the decision to use TransactionScope. The timeout should definitely be increased to the one used in Hangfire.SqlServer: TransactionTimeout = TimeSpan.FromMinutes(1);

Mar 28 '20 13:03 xklonx

Yep, the timeout can be increased. That should help with your issue.

Mar 28 '20 14:03 davidroth

MR created: https://github.com/frankhommers/Hangfire.PostgreSql/pull/147

Mar 30 '20 15:03 davidroth

I've merged the PR. It's included in the 1.6.4.2 version. Does it help? Can we close the issue?

Apr 02 '20 13:04 vytautask

For me, yes. It would be interesting to know whether it helped anyone else.

Apr 02 '20 14:04 xklonx

Please note that this issue was created long before TransactionScope support was added. So Hangfire.PostgreSql.PostgreSqlDistributedLockException isn't something that was introduced with TransactionScope support. It was just more likely to hit the exception with the new code because of the 500ms timeout (until MR #147).

Apr 02 '20 14:04 davidroth

I am using 1.6.4.2 and I face the same issue. Is there any way to remove the lock? This is preventing me from queuing new jobs.

Apr 08 '20 14:04 jayasurya-jeyakodi

You can remove it from the lock table. Check your logs to find which error prevented the lock from being released, because it will surely come back.
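A minimal sketch of clearing the lock table with Npgsql follows. It assumes the default schema name "hangfire" and a table named "lock"; adjust both if your installation uses a different SchemaName. The HangfireLockCleanup helper is hypothetical and is not part of Hangfire.PostgreSql.

```csharp
using System;
using Npgsql;

public static class HangfireLockCleanup
{
    // Deletes every row from the lock table and returns the number of rows removed.
    // Assumption: the table is "<schema>"."lock", with "hangfire" as the default schema.
    public static int ClearLocks(string connectionString, string schema = "hangfire")
    {
        // Identifiers cannot be passed as SQL parameters, so quote the schema name manually.
        string quotedSchema = "\"" + schema.Replace("\"", "\"\"") + "\"";
        string sql = "DELETE FROM " + quotedSchema + ".\"lock\"";

        using (var connection = new NpgsqlConnection(connectionString))
        {
            connection.Open();
            using (var command = new NpgsqlCommand(sql, connection))
            {
                return command.ExecuteNonQuery();
            }
        }
    }
}
```

Run something like this only while no Hangfire server is active against the same storage, for example before the server starts. Deleting a lock that is still legitimately held could let two schedulers enqueue the same recurring job.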

Apr 08 '20 23:04 xklonx

Hello! How can I remove all locks on startup? The server can't start while the lock exists. Hangfire.PostgreSql.PostgreSqlDistributedLockException: 'Could not place a lock on the resource 'HangFire:lock:recurring-job:Notifications': Lock timeout.'

Aug 10 '20 05:08 dpisarevskiy

It's above your question:

Hello! How can I remove all locks on startup? The server can't start while the lock exists. Hangfire.PostgreSql.PostgreSqlDistributedLockException: 'Could not place a lock on the resource 'HangFire:lock:recurring-job:Notifications': Lock timeout.'

This:

You can remove it from the lock table. Check your logs to find which error prevented the lock from being released, because it will surely come back.

Nov 11 '20 16:11 frankhommers

It seems fixed now.

Nov 11 '20 16:11 frankhommers

We have the latest versions (1.7.19) and Hangfire.PostgreSQL (1.8.1), and still have the same issues. Does anybody have an idea how we can fix this? It's odd; it suddenly started for us on several independent systems.

Feb 23 '21 14:02 jenswachtel

We have the latest versions (1.7.19) and Hangfire.PostgreSQL (1.8.1), and still have the same issues. Does anybody have an idea how we can fix this? It's odd; it suddenly started for us on several independent systems.

Same, not fixed on our side.

Apr 19 '21 13:04 natalie-o-perret

Downgrade Hangfire to version (1.7.12) and Hangfire.PostgreSql to version (1.7.1). It works with dotnet 5!

May 11 '21 17:05 hadevnet

I am running Hangfire.Postgres v1.8.2 and Hangfire.Core v1.7.22, and this is an issue when using TransactionScope. @frankhommers would you mind re-opening this issue? This is a pretty big problem if we cannot enqueue work into Hangfire and update application state within a transaction.

May 21 '21 20:05 JarrodJ83

@frankhommers I'm seeing this too, we should reopen this issue.

May 28 '21 19:05 alexrosenfeld10

Same issue here.

Jun 10 '21 13:06 cl-msmolcic

Same issue

Jun 29 '21 08:06 akshaybheda

I get this on the latest release:

Hangfire.PostgreSql.PostgreSqlDistributedLock - HangFire:recurring-jobs:lock: Failed to lock with transaction 40001: could not serialize access due to concurrent update

Jun 29 '21 15:06 alexrosenfeld10

Related to the recent changes / releases? @vytautask

Jun 29 '21 15:06 alexrosenfeld10

Related to the recent changes / releases? @vytautask

I do not think so. At least not my changes :)

Let's continue the discussion in #191, as I have high hopes this could be solved via the already proposed mechanism (only a PR is missing 🙄).

Jul 01 '21 10:07 vytautask
