Hangfire.PostgreSql
Hangfire.PostgreSql copied to clipboard
Deadlock when using batches
Hangfire.Core: v1.7.8 Hangfire.PostgreSql: v1.6.3 Hangfire.Pro: v2.2.1 Dotnet Core: v2.2
Possible deadlock in the postgresql storage connector.
I don't see anything when using Trace logging for Hangfire.I think there is something wrong with the locking but don't know how to debug this. To test this i made a empty project with just hangfire and those 2 jobs.
The first enqueue is enqueued by BackgroundJob:
BackgroundJob.Enqueue(() => StartCache.Execute(null));
The StartCache job then inserts a batch job:
BatchJob.StartNew(x => {
foreach(var id in customerIds) {
x.Enqueue(() => CacheCustomer.Execute(id, null));
}
});
The Batchjob is inserted en the jobs are enqueued. It says they are fetched but nothing happens:

On the batches page it says that it's processing 1 job but nothing is happening:

Let me know if you require any information or have some pointers on how to find what the issue is here.
The lock table in the database shows the following (screenshot is taken at 7:45 UTC)

Is this still a problem here? Since you closed it on the Hangfire project itself?
It still is an issue. I closed it on the Hangfire project itself because the issue itself is not a "problem" with the Hangfire project. Currently Hangfire pro is only supported on SqlServer and Redis.pro.
I had some contact with Sergey and he told me that it's the way locking works in de Hangfire.Postgresql that is causing deadlocks. But it will not be an easy task to change this i think. On the other side you have the question how many people would use it, since afaict nobody is using Hangfire.postgresql with hangfire.pro batches (i could be very wrong here..).
Sergey also told me that he is considering contributing to postgresql/mysql but he currently doesn't know when so you don't want to give any promises.
How does it need to be changed according to Sergey then?
Quoting Sergey
Batches in Hangfire.Pro use all the methods available in JobStorageTransaction and JobStorageConnection classes, so all the available methods should be implemented. But there are a lot of non-functional requirements, which I attempted to document here – https://github.com/HangfireIO/Hangfire/issues/1296 2. So things are much more complicated than I thought initially. And even worse, there’s no documented wisdom on the topic of storage abstractions other than serializability, so this will require a lot of time unfortunately.
As far as I know, the worst moment in the Hangfire.Postgresql implementation is distributed locking. Unlike Hangfire.SqlServer that uses application locks (some kind of advisory locks in PostgreSQL), which are completely in-memory data structures and bound to connections, in the current PostgreSQL implementation locks are based on a dedicated table and fully materializable, which make them hard to be released on a process failure and prone to deadlocks.
Second mail
These years I’ve studied hard on this topic in order to get everything as much simple as possible. I know the answer now, but it will take a lot of time to implement everything without revolution, i.e. in a non-backward-compatible way. I’m even considering to contribute to Hangfire.PostgreSQL and Hangfire.MySQL, but currently don’t know when, so don’t want to give any promises.
I didn’t do this before, because there are so many nuances between different storages, between their transaction isolation semantics (despite they are all ACID-compliant), especially when applied to storage abstractions, that my mind was completely blown up.
Moving to locks like these: https://www.postgresql.org/docs/current/explicit-locking.html#ADVISORY-LOCKS wouldn't be that hard I guess. AFAIK that will require PostgreSQL 9.0 and upwards..
But I don't know if that's a good move.
I'm don't know either, i don't know which versions people are running this on.
But besides that there were some earlier comments why advisory locks shouldn't be used here https://github.com/frankhommers/Hangfire.PostgreSql/issues/82
This is from a few years back i can't really tell if this is still relevant.
Yeah. The current locks are cross server. Advisory locks are not. But it could be an option if you're running just one hangfire server.
Hi, any news on this?
Is this still the case with 1.9.3
Yes and no, I will explaim Yes: the exception is still in there No: it's work
But becuase the exception we cannot be sure that the code is working so bottom line I think its not working
The exception is:
(Hangfire.PostgreSql.PostgreSqlDistributedLock) hangfire:batch:086beed2-3a13-4eff-8693-fd117f43de34:lock: Failed to lock with transaction Npgsql.PostgresException 40001: could not serialize access due to concurrent update at Npgsql.Internal.NpgsqlConnector.<ReadMessage>g__ReadMessageLong|213_0(NpgsqlConnector connector, Boolean async, DataRowLoadingMode dataRowLoadingMode, Boolean readingNotifications, Boolean isReadingPrependedMessage) at Npgsql.NpgsqlDataReader.NextResult(Boolean async, Boolean isConsuming, CancellationToken cancellationToken) at Npgsql.NpgsqlDataReader.NextResult() at Npgsql.NpgsqlCommand.ExecuteReader(CommandBehavior behavior, Boolean async, CancellationToken cancellationToken) at Npgsql.NpgsqlCommand.ExecuteReader(CommandBehavior behavior, Boolean async, CancellationToken cancellationToken) at Npgsql.NpgsqlCommand.ExecuteNonQuery(Boolean async, CancellationToken cancellationToken) at Npgsql.NpgsqlCommand.ExecuteNonQuery() at Dapper.SqlMapper.ExecuteCommand(IDbConnection cnn, CommandDefinition& command, Action
2 paramReader) in /_/Dapper/SqlMapper.cs:line 2858 at Dapper.SqlMapper.ExecuteImpl(IDbConnection cnn, CommandDefinition& command) in /_/Dapper/SqlMapper.cs:line 581 at Hangfire.PostgreSql.PostgreSqlDistributedLock.TransactionLockHandler.Lock(String resource, TimeSpan timeout, IDbConnection connection, PostgreSqlStorageOptions options) 2022-01-26 10:14:09 [WARN] (Hangfire.PostgreSql.PostgreSqlDistributedLock) hangfire:batch:086beed2-3a13-4eff-8693-fd117f43de34:lock: Failed to lock with transaction Npgsql.PostgresException 40001: could not serialize access due to concurrent update at Npgsql.Internal.NpgsqlConnector.<ReadMessage>g__ReadMessageLong|213_0(NpgsqlConnector connector, Boolean async, DataRowLoadingMode dataRowLoadingMode, Boolean readingNotifications, Boolean isReadingPrependedMessage) at Npgsql.NpgsqlDataReader.NextResult(Boolean async, Boolean isConsuming, CancellationToken cancellationToken) at Npgsql.NpgsqlDataReader.NextResult() at Npgsql.NpgsqlCommand.ExecuteReader(CommandBehavior behavior, Boolean async, CancellationToken cancellationToken) at Npgsql.NpgsqlCommand.ExecuteReader(CommandBehavior behavior, Boolean async, CancellationToken cancellationToken) at Npgsql.NpgsqlCommand.ExecuteNonQuery(Boolean async, CancellationToken cancellationToken) at Npgsql.NpgsqlCommand.ExecuteNonQuery() at Dapper.SqlMapper.ExecuteCommand(IDbConnection cnn, CommandDefinition& command, Action2 paramReader) in //Dapper/SqlMapper.cs:line 2858 at Dapper.SqlMapper.ExecuteImpl(IDbConnection cnn, CommandDefinition& command) in //Dapper/SqlMapper.cs:line 581 at Hangfire.PostgreSql.PostgreSqlDistributedLock.TransactionLockHandler.Lock(String resource, TimeSpan timeout, IDbConnection connection, PostgreSqlStorageOptions options)