Hangfire icon indicating copy to clipboard operation
Hangfire copied to clipboard

[Hangfire.Core Bug] ContinueJobWith starts too early by default with Task

Open hyperion-cs opened this issue 5 years ago • 5 comments

@odinserj, Hello! Under certain standard conditions ContinueJobWith starts by default when the dependent job is still in PROCESSING state.

How to reproduce:

  1. When called ContinueJobWith with Action:
_bjc.ContinueJobWith(jobId, () => Console.WriteLine("Continuation!"))

The public static string ContinueJobWith([NotNull] this IBackgroundJobClient client, [NotNull] string parentId, [InstantHandle][NotNull] Expression<Action> methodCall); overload will be used, everything is fine. "Continue job" will only be executed when the dependent job is done (success/failed). 2 However, if you use a Func, such as this:

Expression<Func<T, Task>> func = x => new Task(() => Console.WriteLine("Continuation!"));
_bjc.ContinueJobWith(jobId, func);

An overload of public static string ContinueJobWith<T>([NotNull] this IBackgroundJobClient client, [NotNull] string parentId, [InstantHandle][NotNull] Expression<Func<T, Task>> methodCall, [CanBeNull] IState nextState = null, JobContinuationOptions options = JobContinuationOptions.OnlyOnSucceededState); will be used.

It has nextState as null by default, which means that "continue job" will be started immediately after Hangfire.States.EnqueuedState. Thus, the default behavior is such that the job does not expect the job to finish FULL when using Func. In other words, if we want to use the Task, it doesn't work the way it should.

hyperion-cs avatar Apr 04 '20 16:04 hyperion-cs

Please tell me what version of Hangfire.Core you have, what storage package (and its version) you are using? Also, please post here the concrete minimal method that doesn't work as expected.

odinserj avatar Apr 04 '20 17:04 odinserj

@odinserj, Hangfire.Core 1.7.10 Hangfire.PostgreSql 1.6.4.2

When I tried to collect the minimum method for you, I came across the fact that it was not so easy. In an isolated environment, everything works fine. Could it be that ContinueJobWith has already worked, but the storage provider has not yet managed to update the state + result? Or does ContinueJobWith work based on the data from the storage and take nothing from RAM/etc directly?

hyperion-cs avatar Apr 04 '20 20:04 hyperion-cs

@odinserj 4 years later, still a valid bug. Furthermore, still non-deterministic. Works ok on my local env, but on my prod environment child jobs starts before the parent finishes causing race condition and undesired results.

jguc avatar Oct 31 '24 14:10 jguc

So the original problem description tells us that the following continuation works fine:

_bjc.ContinueJobWith(jobId, () => Console.WriteLine("Continuation!"))

But the following one doesn't (but actually it gives the System.ArgumentException: Expression body should be of type MethodCallExpression exception):

Expression<Func<T, Task>> func = x => new Task(() => Console.WriteLine("Continuation!"));
_bjc.ContinueJobWith(jobId, func);

This is an expected behavior, as expressions in Hangfire are meant to point to some method and some arguments. Method metadata is passed to the job storage, while arguments are evaluated inline to some value, serialized and stored as well. I believe that the original problem appeared, because task-based invocation was passed as an argument, therefore was executed immediately.

Consider the following expected ways of creating background jobs for static- and instance-based methods:

BackgroundJob.Enqueue(() => SomeStaticMethod(GetArgumentValue())); // Static method execution
BackgroundJob.Enqueue<SomeClass>(x => x.InstanceMethod(GetArgumentValue())); // Instance method execution

In this case, SomeStaticMethod and InstanceMethod methods will not be executed immediately, a background job that points to this method will be created instead. However, GetArgumentValue method will be executed immediately to obtain the actual argument value to serialize it and store within the background background job metadata. If the GetArgumentValue method contains any Console.WriteLine calls, they will be shown immediately.

@jguc can you tell me how your background jobs and continuations created?

odinserj avatar Nov 01 '24 02:11 odinserj

@odinserj the OP example was what I assume an oversimplification of the problem which obviously will execute immediately, let me try not to oversimplify my example then.

The flow uses Hangfire + MediatR for executing commands outside the process.

Commit commands that use ContinueJobWith:

// Ignore the purpose of having this container
var jobId = Client.Create(QueueNames.MEDIUM, CreateContainer, new ScheduledState(DateTime.MaxValue));

// actual parent job
var parentJobId = Client.ContinueJobWith<MediatRHangfireBridge>(jobId, 
    bridge => bridge.Process(eventData, typeof(TEvent).GenericTypeArguments[0].Name, QueueNames.MEDIUM));

// child job
Client.ContinueJobWith<MediatRHangfireBridge>(parentJobId, 
    bridge => bridge.Process(continuationEventData, typeof(TContinuationEvent).GenericTypeArguments[0].Name, QueueNames.MEDIUM));

MediatRHangfireBridge

  [DisplayName("{1}")]
  [Queue("{2}")]
  public async Task Publish<T>(T notification, string displayName, string queue) where T : INotification
  {
      await _mediator.Publish(notification);
  }

So the argument is serialized and stored properly in the job metadata for execution.

Jobs are started in correct order, but strangely enough the last child job get starte just milliseconds before the partent job finishes.

jguc avatar Nov 01 '24 11:11 jguc