DIRAC

Gracefully stopping pilots

Open chrisburr opened this issue 6 months ago • 0 comments

Currently on the LHCb HLT farm the CE is configured to send SIGUSR1 to the Gaudi processes directly.

This has a couple of issues:

  1. If there is more than one job per pilot, the first job will exit gracefully, but the pilot isn't aware of the graceful shutdown and will start another job.
  2. Sometimes the job hasn't got far enough to produce any output, so it looks like a failure despite the situation being "okay".

I think the solution to both items is to have the CE communicate with DIRAC instead of Gaudi.

For item 2 it would be useful to make such cases easier to filter out, e.g. set the status to "Killed" and set a clearer application status.
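To make the idea concrete, here is a rough sketch of what the pilot side could look like if the CE signalled the pilot rather than Gaudi. None of this is existing DIRAC code: `GracefulStop`, `run_pilot`, `classify` and the application status strings are hypothetical names made up for illustration, and it assumes a Unix host where the CE sends something like SIGUSR1 to the pilot process.

```python
import signal


class GracefulStop:
    """Hypothetical helper: remembers that a stop signal was received."""

    def __init__(self):
        self.requested = False

    def handler(self, signum, frame):
        # Only set a flag; the pilot loop decides what to do with it.
        self.requested = True

    def install(self, signum):
        # e.g. stop.install(signal.SIGUSR1) on a Unix host (assumption).
        signal.signal(signum, self.handler)


def classify(stop_requested, produced_output):
    """Map the outcome of one job to (status, application status).

    The application status strings are illustrative, not DIRAC's.
    """
    if stop_requested and not produced_output:
        return ("Killed", "Stopped gracefully before producing output")
    if stop_requested:
        return ("Done", "Stopped gracefully with partial output")
    return ("Done", "Execution complete")


def run_pilot(jobs, stop):
    """Run jobs one after another, but never start one after a stop request.

    Each job is a callable taking the stop object and returning True if it
    produced output. A real pilot would also forward the stop request to
    the running payload; that part is omitted here.
    """
    reports = []
    for job in jobs:
        if stop.requested:
            break  # item 1: the pilot knows about the shutdown
        produced_output = job(stop)
        reports.append(classify(stop.requested, produced_output))  # item 2
    return reports
```

With this shape the pilot stops picking up work once the flag is set, and a job that was interrupted before writing anything gets a status that is easy to filter on.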

chrisburr, Oct 17 '25 08:10