Hangfire
Hangfire copied to clipboard
AsyncLocal<T> retains values on every job run
I implemented AsyncLocal<T> in a particular service method to avoid some repeated processing, however the same service method ends up being called by Hangfire and on every job run the data of AsyncLocal<T>.Value, which should be null on each asynchronous call retains the same value until the app is retarted. The code works perfectly when coming from a SignalR or WebAPI request, it's only Hangfire processes that retain the value.
Code for job setup:
RecurringJobManager manager = new RecurringJobManager();
if (jobEnable)
{
manager.AddOrUpdate("JobName", Job.FromExpression(() => TaskSchedulingController.DoJob()), "* * * * *", options);
}
else
{
manager.RemoveIfExists("JobName");
}
Code example for job, which will only ever run successfully once, then will error on every subsequent call:
private static AsyncLocal<string> _test = new AsyncLocal<string>();
public async Task DoJob()
{
if (_test.Value != null)
{
throw new Exception("AsyncLocal is retaining a value");
}
_test.Value = "ABC123";
}
While I agree that AsyncLocal instances shouldn't be preserved across different background job runs, and ExecutionContext should be somehow flushed, but this is potentially a breaking change for those who use this feature. So I'm postponing this to 2.0 version without any specific point in time.
Is it possible to clean these values manually, for example in a try/finally block?
Yeah, that's actually how I worked around the issue for the time being.
3rd parties use AsyncLocal as well. This is a pretty important feature. For example, majority of tracing libraries that aren't through OpenTelemetry depend on it. We use Datadog to trace hangfire jobs and Datadog tracks the active span using AsyncLocal. Removes possibility of clearing/maintaining it manually.
@odinserj Do we have an ETA on the 2.0 release? We have issues with libraries using AsyncLocal. Thanks!
Actually I think something like the following one can be implemented in the nearest patch release under some flag, and then the new flag can be switched by default in a non-patch release:
https://source.dot.net/#Microsoft.AspNetCore.Server.Kestrel.Core/Internal/Http/HttpProtocol.cs,635
Hi @DanPatten, working on this issue now. Can you give me some extended code example to conduct an end-to-end test and ensure that the resulting solution will fix the case?
UPD. Possibly with including Hangfire-related configuration logic and library calls involved.
@odinserj We figured out the cause, it appears jobs that enqueue other jobs then AsyncLocal is carried over. This happens for background or recurring jobs. After this happens all future calls to this job will have the old AsyncLocal context. Firing off another job should have it's own async local context.
public class TestJob
{
private static AsyncLocal<string> _test = new AsyncLocal<string>();
public async Task Execute()
{
const int jobCount = 1000;
for (int i = 0; i < jobCount; i++)
{
BackgroundJob.Enqueue(() => TestSecondJob());
}
}
public void TestSecondJob()
{
if (_test.Value != null)
{
Console.WriteLine("FAILED");
}
else
{
Console.WriteLine("WORKS");
}
_test.Value = "ABC123";
}
}
Please note that the BackgroundJob.Enqueue method doesn't capture AsyncLocal values, and they aren't designed to flow between different background jobs, such as TestJob.Execute and TestJob.TestSecondJob methods. What happens there is the difficulties in behavior of AsyncLocal in synchronous methods, e.g. the TestSecondJob one, while async ones work fine.
Asynchronous method automatically capture the current ExecutionContext instance (that holds values for AsyncLocal) and call ExecutionContext.Run method to restore AsyncLocal values in a new thread, and clean up any new values written during a background job method invocation.
For synchronous methods, nothing was captured before, because thread is the same, and no need to capture the current ExecutionContext. However, since ExecutionContext.Run method wasn't called, all the writes to AsyncValue.Value property were written to the current context, and persisted there until shutdown of a worker thread, leaking values and bringing differences between sync and async method executions.
So after the investigation, I'm treating the current behavior as bug, since it's inconsistent and error-prone. I have added different tests to ensure that after https://github.com/HangfireIO/Hangfire/commit/92c9de3a922182f4ae4a3b413e1f571ab7c703e0 behavior is consistent, and semantics for features like PerformContextAccessor remained the same, so don't add any feature flags here.