
Custom timeout per task and retry doesn't seem possible

Open saro2-a opened this issue 10 months ago • 2 comments

I was trying to restart stalled jobs, with custom timeouts.

We have several jobs whose duration depends on the input: anywhere from 1 minute to 3 hours, uniformly distributed. At job submission time we know roughly how long a given job will take. However, when I call "get_stalled_jobs", the "started_at" value of the event does not seem to be retained when the job object is created:

It is fetched: SELECT job.id, status, task_name, priority, lock, queueing_lock, args, scheduled_at, queue_name, attempts, max(event.at) started_at

but not retained https://github.com/procrastinate-org/procrastinate/blob/main/procrastinate/manager.py#L175 https://github.com/procrastinate-org/procrastinate/blob/main/procrastinate/jobs.py#L77

hence seemingly making the task impossible?

        @self.app.periodic(cron="*/10 * * * *")
        @self.app.task(queueing_lock="retry_stalled_jobs", pass_context=True)
        async def retry_stalled_jobs(context, timestamp):
            stalled_jobs = await self.app.job_manager.get_stalled_jobs(
                nb_seconds=RUNNING_JOBS_MAX_TIME_SECONDS
            )
            # TODO it is currently not possible to have some jobs with custom duration.
            # it needs to be solved at lib level
            for job in stalled_jobs:
                proc_task_max_run_time = job.task_kwargs.get("proc_task_max_run_time")
                # Missing piece: the job object does not expose the start time of the run,
                # so `job.started_at` below stands for the value this issue is asking for.
                if not proc_task_max_run_time or proc_task_max_run_time < now() - job.started_at:
                    await self.app.job_manager.retry_job(job)

Could we either:

  • support proc_task_max_run_time as a first class parameter (probably preferred)
  • or pass the started_at?
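To make the request concrete, here is a minimal sketch of the check that either option would enable. It is plain Python and does not call procrastinate; `should_retry`, `DEFAULT_MAX_RUN_TIME_SECONDS`, and the assumption that the per-job limit is a number of seconds are all hypothetical and only illustrate the intended logic.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical global fallback, used when a job carries no custom limit.
DEFAULT_MAX_RUN_TIME_SECONDS = 3 * 60 * 60


def should_retry(
    started_at: datetime,
    max_run_time_seconds: Optional[float],
    now: datetime,
    default_seconds: float = DEFAULT_MAX_RUN_TIME_SECONDS,
) -> bool:
    """Return True when the job has been running longer than its own limit."""
    # Use the per-job limit when one was given, else the global fallback.
    limit = default_seconds if max_run_time_seconds is None else max_run_time_seconds
    elapsed = (now - started_at).total_seconds()
    return elapsed > limit


now = datetime(2025, 1, 10, 12, 0, tzinfo=timezone.utc)
short_job_start = now - timedelta(minutes=5)

# A job expected to take about 1 minute is stalled after 5 minutes...
print(should_retry(short_job_start, 60, now))
# ...while a job allowed to run for 3 hours is not.
print(should_retry(short_job_start, 3 * 60 * 60, now))
```

The only input this needs beyond the task arguments is the start time of the current run, which is exactly the value that is not available today.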

Thank you

saro2-a avatar Jan 10 '25 11:01 saro2-a

This looks similar to https://github.com/procrastinate-org/procrastinate/issues/702 which we wanted to tackle in https://github.com/procrastinate-org/procrastinate/issues/740 with heartbeats

EDIT: well, no, timeouts and retrying are different. It's close but not the same. I'll try to look into it in more detail.

ewjoachim avatar Jan 10 '25 15:01 ewjoachim

I think you're right in that the manager doesn't get access to the "Events" table. I think what would make the most sense is the ability to inspect the events of a job.
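As a rough illustration of what inspecting the events of a job could give the caller, the sketch below derives the start of the latest run from a list of event records. This is not an existing procrastinate API: `JobEvent`, `last_started_at`, and the assumption that each event has a `type` string (with `"started"` marking the beginning of a run) and an `at` timestamp are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, Optional


@dataclass(frozen=True)
class JobEvent:
    """Hypothetical in-memory view of one row of a job's event history."""

    type: str
    at: datetime


def last_started_at(events: Iterable[JobEvent]) -> Optional[datetime]:
    """Return the timestamp of the most recent "started" event, if any."""
    started = [event.at for event in events if event.type == "started"]
    # A job that was retried has several "started" events; keep the latest.
    return max(started, default=None)


events = [
    JobEvent("deferred", datetime(2025, 1, 10, 9, 0, tzinfo=timezone.utc)),
    JobEvent("started", datetime(2025, 1, 10, 9, 1, tzinfo=timezone.utc)),
    JobEvent("deferred_for_retry", datetime(2025, 1, 10, 9, 30, tzinfo=timezone.utc)),
    JobEvent("started", datetime(2025, 1, 10, 9, 31, tzinfo=timezone.utc)),
]
print(last_started_at(events))
```

With something along these lines, a periodic task could compare the latest start time against a per-job limit without the library needing a dedicated timeout parameter.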

ewjoachim avatar Jan 10 '25 15:01 ewjoachim