
Add hooks / callbacks to worker

Open medihack opened this issue 9 months ago • 5 comments

As discussed in the middleware feature request (#1292), I want to propose an alternative solution that uses hooks/callbacks at specific locations in the worker that a user can plug into. The implementation is easier, and the usage is, in my opinion, more flexible, as we can add additional hooks if needed.

I would start with the following ones (https://github.com/procrastinate-org/procrastinate/tree/worker-hooks):

  • before_fetching_job_hook (should work fine for rate limiting)
  • job_processing_started_hook (has access to the JobContext)
  • job_processing_ended_hook (has access to the JobContext and JobResult)

I would also make the worker call worker.stop() if any of those hooks (or the job itself) raises a StopWorker exception.
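To make the proposal concrete, here is a minimal sketch of how the three hooks and the StopWorker behaviour could fit into a worker loop. It is not the code on the worker-hooks branch: ToyWorker, the JobContext and JobResult stand-ins, the in-memory job list, and the hook signatures are all assumptions made for illustration.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Awaitable, Callable, List, Optional, Tuple


class StopWorker(Exception):
    """Raised by a hook or a job to ask the worker to stop."""


@dataclass
class JobContext:
    """Hypothetical stand-in, not procrastinate's real JobContext."""
    task_name: str


@dataclass
class JobResult:
    """Hypothetical stand-in for the outcome of one job."""
    value: Any = None
    error: Optional[BaseException] = None


Job = Tuple[str, Callable[[], Awaitable[Any]]]


@dataclass
class ToyWorker:
    """Assumed in-memory worker; a real worker would fetch jobs from the database."""
    jobs: List[Job]
    before_fetching_job_hook: Optional[Callable[[], Awaitable[None]]] = None
    job_processing_started_hook: Optional[Callable[[JobContext], Awaitable[None]]] = None
    job_processing_ended_hook: Optional[
        Callable[[JobContext, JobResult], Awaitable[None]]
    ] = None
    stopped: bool = False

    def stop(self) -> None:
        self.stopped = True

    async def run(self) -> None:
        try:
            while self.jobs and not self.stopped:
                if self.before_fetching_job_hook:
                    await self.before_fetching_job_hook()
                name, func = self.jobs.pop(0)  # "fetch" the next job
                context = JobContext(task_name=name)
                if self.job_processing_started_hook:
                    await self.job_processing_started_hook(context)
                result = JobResult()
                try:
                    result.value = await func()
                except StopWorker:
                    raise  # handled below; the ended hook is skipped in this sketch
                except Exception as exc:
                    result.error = exc  # ordinary failures are reported to the hook
                if self.job_processing_ended_hook:
                    await self.job_processing_ended_hook(context, result)
        except StopWorker:
            # Any hook or job raising StopWorker ends up here.
            self.stop()


async def demo() -> Tuple[ToyWorker, List[tuple]]:
    events: List[tuple] = []

    async def ok() -> int:
        return 42

    async def halt() -> None:
        raise StopWorker()

    async def started(context: JobContext) -> None:
        events.append(("started", context.task_name))

    async def ended(context: JobContext, result: JobResult) -> None:
        events.append(("ended", context.task_name, result.error is None))

    worker = ToyWorker(
        jobs=[("ok", ok), ("halt", halt), ("never_runs", ok)],
        job_processing_started_hook=started,
        job_processing_ended_hook=ended,
    )
    await worker.run()
    return worker, events
```

Running asyncio.run(demo()) in this sketch processes the first job, stops when the second raises StopWorker, and leaves the third job unfetched. Whether the ended hook should still fire for the job that raised StopWorker is an open design choice; the sketch skips it.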

Do you have any opinions about using hooks instead of middleware?

medihack, Jan 27 '25 20:01

Job started and job complete (whether successful or not) callbacks would be useful.

I am not sure about before fetch. If this is intended to be used as a way to determine if a job should be fetched, I don't know how that would quite work.

onlyann, Jan 27 '25 20:01

I think providing a wrapper function for tasks would be a way to provide a hook before and after, and if we propagate the exception, it can also discriminate between success and failure, all with a usual API. That said, if we provide individual hooks, they can be easily implemented as wrappers underneath (and probably vice versa)
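As an illustration of the "hooks implemented as wrappers underneath" idea, the sketch below builds a single wrapper out of three individual hooks. The names hooks_as_wrapper, on_start, on_success and on_failure are hypothetical and not part of procrastinate; the task is assumed to be a zero-argument coroutine function.

```python
from typing import Any, Awaitable, Callable

Task = Callable[[], Awaitable[Any]]
Wrapper = Callable[[Task], Awaitable[Any]]


def hooks_as_wrapper(
    on_start: Callable[[], None],
    on_success: Callable[[Any], None],
    on_failure: Callable[[BaseException], None],
) -> Wrapper:
    """Build one wrapper around a task from three individual hooks."""

    async def wrapper(task: Task) -> Any:
        on_start()  # "before" hook
        try:
            result = await task()
        except Exception as exc:
            on_failure(exc)  # "after" hook, failure branch
            raise  # propagate so the caller still sees the failure
        on_success(result)  # "after" hook, success branch
        return result

    return wrapper
```

Because the exception is re-raised, the wrapper distinguishes success from failure without hiding the error from whatever calls it.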

About the before-fetch: it could be useful, but this all depends on how it may affect the fetch query. If it doesn't affect the query, then I don't see what it may do except a very basic rate limit (that works only if the worker processes only the rate limited task).
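One possible reading of the "very basic rate limit" point, sketched under assumptions: a before-fetch hook runs before the worker knows which job it will get, so the only thing it can do is delay every fetch. MinIntervalLimiter and its before_fetch method are hypothetical names, and the clock and sleep parameters exist only to make the sketch testable.

```python
import time
from typing import Callable


class MinIntervalLimiter:
    """Hypothetical before-fetch hook: at most one fetch per `min_interval` seconds."""

    def __init__(
        self,
        min_interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = clock()

    def before_fetch(self) -> None:
        # The hook cannot see which task will be fetched next,
        # so the delay applies to every job the worker handles.
        wait = self._next_allowed - self._clock()
        if wait > 0:
            self._sleep(wait)
        self._next_allowed = self._clock() + self.min_interval
```

If the worker also handles tasks that should not be throttled, they are slowed down as well, which would explain why this only behaves like a per-task rate limit when the worker is dedicated to the rate-limited task.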

Do we have other ideas for use cases of pre-fetch (or any other hook: pre start, pre stop, abort, anything ...)? Should we allow task-related hooks to be bound to specific tasks, or should hooks always be worker-wide?

ewjoachim, Jan 27 '25 21:01

> I think providing a wrapper function for tasks would be a way to provide a hook before and after, and if we propagate the exception, it can also discriminate between success and failure, all with a usual API. That said, if we provide individual hooks, they can be easily implemented as wrappers underneath (and probably vice versa)

Yes, and hooks let the user plug into more of the worker's lifecycle than middleware does. Of course, we could have different kinds of middleware wrapping different stuff, but I find this cumbersome.

> About the before-fetch: it could be useful, but this all depends on how it may affect the fetch query. If it doesn't affect the query, then I don't see what it may do except a very basic rate limit (that works only if the worker processes only the rate limited task).

Is really only basic rate limiting possible with such a hook? With the help of advisory locks, I could even think of more complicated scenarios across multiple workers. But of course, it is not straightforward for the user, and maybe a built-in way would be preferable. What do you mean by "that works only if the worker processes only the rate limited task"?

> Do we have other ideas for use cases of pre-fetch (or any other hook: pre start, pre stop, abort, anything ...)? Should we allow task-related hooks to be bound to specific tasks, or should hooks always be worker-wide?

If we go the hooks route, then let's start simple. The user can still get the task name from the hook-provided context and act accordingly. So, I see them more as lifecycle hooks of the worker.

medihack, Jan 27 '25 22:01

Yes, I favor a hook at the worker level for when a job completes.

This is subtly different from a middleware because by the moment the middleware retrieves the result (or catches an error), the job has not yet completed and it could still fail. A hook on the other hand would be called after the job actually completes (whether it succeeded or failed).
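A rough sketch of that timing difference, under the assumption that the worker performs some finalisation step after the task function returns (for example, persisting the job status). run_job, finalize and on_job_ended are hypothetical names, not procrastinate APIs.

```python
from typing import Any, Awaitable, Callable, Optional

Task = Callable[[], Awaitable[Any]]


async def run_job(
    task: Task,
    middleware: Callable[[Task], Awaitable[Any]],
    finalize: Callable[[], Awaitable[None]],
    on_job_ended: Callable[[Optional[BaseException]], Awaitable[None]],
) -> None:
    error: Optional[BaseException] = None
    try:
        # The middleware gets the task's result (or exception) here,
        # before the job is actually complete.
        await middleware(task)
        # Assumed post-task step that can still fail.
        await finalize()
    except Exception as exc:
        error = exc
    # The hook runs last and receives the final outcome.
    await on_job_ended(error)
```

If finalize raises, the middleware has already observed a successful result, while the hook is told that the job failed.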

That said, a basic middleware feature could be offered first, and a hook added later if the community identifies the need.

onlyann, Jan 28 '25 02:01

With middleware, it should be easier to implement integrations such as OpenTelemetry (#1027) that require the entire lifecycle of a job. On the other hand, callback hooks are more suitable for smaller tasks to react to certain events.

Maybe the middleware approach together with some lightweight hooks is the way to go?

frisia-mtz, Feb 19 '25 10:02