
Reducing the memory consumption of the `PushJobAgent`

Open · aldbr opened this issue 1 year ago

The objective is to improve how DIRAC exploits HPC centres that have no external connectivity. The current workflow has two main limitations:

  • The PushJobAgent only works with the dirac-jobexec executable.
  • The PushJobAgent can only handle a small number of jobs in parallel: ~150 jobs consume ~50 GB of memory on the DIRAC server (roughly 330 MB per job).

I would like to greatly reduce these limitations through a series of PRs:

  • [x] Split the JobWrapper.execute() method into three sub-methods (preProcess, process, postProcess) to better isolate the operations that communicate with external (DIRAC) services from the payload itself. (https://github.com/DIRACGrid/DIRAC/pull/7460)
  • [x] Introduce the JobWrapperOfflineTemplate, which executes only the process method. (https://github.com/DIRACGrid/DIRAC/pull/7529)
  • [ ] Adapt the PushJobAgent to these changes so that it submits a JobWrapperOfflineTemplate directly to a remote CE. During a transition period, both the "traditional" and the new approaches will be supported. (https://github.com/DIRACGrid/DIRAC/pull/7587)
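To make the intended split concrete, here is a minimal Python sketch. Every name in it (`SketchJobWrapper`, `offline_process`, `push_job`, the mocked `service_calls` log) is hypothetical and does not reflect DIRAC's actual `JobWrapper` code. It assumes that preProcess and postProcess run on the DIRAC side while only process runs on the HPC, which is one reading of the plan above.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class SketchJobWrapper:
    """Hypothetical stand-in for a job wrapper split into three steps.

    Only preProcess() and postProcess() record (mocked) service calls;
    process() runs the payload and never touches the service log.
    """

    job_id: int
    payload: Callable[[Dict[str, Any]], int]
    service_calls: List[str] = field(default_factory=list)

    def preProcess(self) -> Dict[str, Any]:
        # Mocked service interaction, e.g. resolving inputs before the payload runs.
        self.service_calls.append(f"preProcess:{self.job_id}")
        return {"job_id": self.job_id, "inputs": ["input.dat"]}

    def process(self, context: Dict[str, Any]) -> int:
        # Runs the payload only: no service interaction, so it could run offline.
        return self.payload(context)

    def postProcess(self, exit_code: int) -> str:
        # Mocked service interaction, e.g. reporting the final job status.
        self.service_calls.append(f"postProcess:{self.job_id}:{exit_code}")
        return "Done" if exit_code == 0 else "Failed"

    def execute(self) -> str:
        # The former single-call behaviour, now composed of the three steps.
        return self.postProcess(self.process(self.preProcess()))


def offline_process(wrapper: SketchJobWrapper, context: Dict[str, Any]) -> int:
    """Hypothetical analogue of an offline template: runs only process()."""
    return wrapper.process(context)


def push_job(wrapper: SketchJobWrapper) -> str:
    """Hypothetical agent-side flow: the pre/post steps stay on the DIRAC side,
    while process() is handed to a (here simulated) remote CE."""
    context = wrapper.preProcess()
    exit_code = offline_process(wrapper, context)  # would run on the remote CE
    return wrapper.postProcess(exit_code)


if __name__ == "__main__":
    job = SketchJobWrapper(job_id=1, payload=lambda ctx: 0)
    print(push_job(job), job.service_calls)
```

The design choice this illustrates: the only step that needs no connectivity, `process()`, is the one shipped to the remote CE, so the payload side never has to reach DIRAC services. How the real PushJobAgent schedules the pre/post steps, and therefore how much server memory it saves, is not shown here.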

https://github.com/DIRACGrid/DIRAC/pull/7422 contains an overview of the full picture, if you are interested (it will not be merged).

aldbr, Feb 09 '24 09:02