Azure cloud database connection resiliency

Open trancefreak77 opened this issue 11 months ago • 1 comments

We are using Hangfire in a Linux Docker container running in Azure, where we frequently run into database connection issues. Example exception:

A transport-level error has occurred when receiving results from the server. (provider: TCP Provider, error: 35 - An internal exception was caught)
at Microsoft.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)

We get these exceptions both when our own code accesses the database and when Hangfire tries to access it. After some research I came across the following GitHub issues:
https://github.com/dotnet/SqlClient/issues/2103#issuecomment-1764206103
https://github.com/dotnet/SqlClient/issues/1773

There seems to be a bug in the Linux version of Microsoft's SqlClient: it does not pass the correct error number when creating the SqlException, so the error code is always 0 in these resiliency cases. The Windows version of SqlClient does not have this bug, so EF Core can recognize these failures there, reconnect, and retry the database operation. On Linux it cannot, because the error number is missing.

We were able to implement a custom execution strategy to handle these resiliency problems as shown here: https://github.com/dotnet/SqlClient/issues/2103#issuecomment-1798070309
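
For readers who do not want to follow the link, here is a rough sketch of what such a strategy could look like. It is not the code from the linked comment: the class name and the `LooksLikeTransportError` helper are hypothetical, and the message check assumes the failure always surfaces with error number 0 and the "transport-level error" text quoted above.

```csharp
using System;
using Microsoft.Data.SqlClient;
using Microsoft.EntityFrameworkCore;
using Microsoft.EntityFrameworkCore.Storage;

// Hypothetical name; a sketch, not the strategy from the linked comment.
public class TransportErrorRetryingExecutionStrategy : SqlServerRetryingExecutionStrategy
{
    public TransportErrorRetryingExecutionStrategy(ExecutionStrategyDependencies dependencies)
        : base(dependencies)
    {
    }

    // Assumption: on Linux the transient failure arrives with error number 0,
    // so the only remaining signal is the message text.
    public static bool LooksLikeTransportError(int errorNumber, string message) =>
        errorNumber == 0
        && message.Contains("transport-level error", StringComparison.OrdinalIgnoreCase);

    protected override bool ShouldRetryOn(Exception exception)
    {
        // Keep the built-in detection of known transient error numbers.
        if (base.ShouldRetryOn(exception))
        {
            return true;
        }

        // Fall back to the message-based check for SqlExceptions without a number.
        return exception is SqlException sqlException
            && LooksLikeTransportError(sqlException.Number, sqlException.Message);
    }
}

// Registration sketch (connectionString is a placeholder):
// options.UseSqlServer(connectionString, sql =>
//     sql.ExecutionStrategy(deps => new TransportErrorRetryingExecutionStrategy(deps)));
```

Matching on message text is fragile (it may break with localized or reworded messages), so this is best treated as a workaround until the error number is reported correctly.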

Have you ever been made aware of these resiliency issues under Linux, or could you implement something similar in your code base? The issue we currently face is that our own code no longer throws these exceptions, but when Hangfire accesses its database it still fails with the error message mentioned above.

So my question is if you could implement such a custom resiliency strategy in your code?

Thanks, Christian

trancefreak77, Dec 06 '24 12:12

Hangfire uses retries by default for background processing; for client methods, however, retries aren't enabled by default. In a modern .NET application you can register the IBackgroundJobClient service with retries enabled in the following way:

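// Register a client that retries failed storage calls.
services.AddSingleton<IBackgroundJobClient>(
    provider => new BackgroundJobClient(provider.GetService<JobStorage>())
    {
        // Number of retry attempts made by client methods.
        RetryAttempts = 3
    });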

It will make retry attempts after any exception, and it will first check whether the particular job already exists (that's useful for timeout exceptions, where the job may have been created even though the call failed).
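
To make "retry attempts on any exception" concrete, here is a standalone illustration of that general pattern. It is not Hangfire's implementation: the `RetrySketch` class is hypothetical, it leaves out the "does the job already exist" check, and it assumes the count means additional tries after the first one.

```csharp
using System;

// Hypothetical helper that only illustrates the retry-on-any-exception idea.
public static class RetrySketch
{
    // Runs `action`; after any exception it tries again, for up to
    // `retryAttempts` additional tries. The last failure propagates.
    public static T Execute<T>(Func<T> action, int retryAttempts)
    {
        for (var attempt = 0; ; attempt++)
        {
            try
            {
                return action();
            }
            catch (Exception) when (attempt < retryAttempts)
            {
                // Swallow and loop; once the attempts are used up,
                // the filter is false and the exception reaches the caller.
            }
        }
    }
}
```

Because a blind retry can repeat an operation that actually succeeded before the connection dropped, a real client needs an idempotency check such as the job-existence check described above.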

odinserj, Dec 09 '24 09:12