Make the PushJobAgent compatible with more CEs (HTCondor? SSH?)

Open aldbr opened this issue 3 months ago • 7 comments

So that it can be reused for HPCs with no ARC endpoint.

aldbr avatar Sep 19 '25 09:09 aldbr

Who is requesting this?

fstagni avatar Oct 09 '25 08:10 fstagni

@hmiyake would potentially reuse the PushJobAgent using ssh if I remember correctly.

aldbr avatar Oct 13 '25 11:10 aldbr

Sorry for the late reply.

We have several use cases for pilot submission to non-grid CEs.

One is a local HTCondor site running no HTCondorCE, so we needed to deploy DIRAC to submit pilots. That site does not open ssh for massive job execution.

The other is sites that have a local batch system and open ssh. Currently we submit pilots to them via SSHComputingElement.

For both use cases, the sites have no special restrictions on outbound connectivity. I've read https://dirac.diracgrid.org/en/integration/AdministratorGuide/Resources/supercomputers.html#no-outbound-connectivity and wonder if we can use PushJobAgent in our use cases... Or does it have any scalability advantage compared with, say, SSHComputingElement?

As background: during DUW I asked about the HTCondor site, for the case where we migrate to DiracX and cannot deploy an external DiracX on the site.

hmiyake avatar Oct 22 '25 10:10 hmiyake

SSHComputingElement (and, in general, the usual "pull" pilot mode) is more scalable than PushJobAgent, which is a solution to be used mostly when the worker nodes are not open to the outside network.

fstagni avatar Oct 22 '25 10:10 fstagni

That site does not open ssh for massive job execution.

Out of curiosity, could you clarify this point please? Do you mean that the site is protected through a VPN and you cannot connect through SSH via DIRAC?

aldbr avatar Oct 22 '25 11:10 aldbr

Thank you for the clarification!

hmiyake avatar Oct 23 '25 07:10 hmiyake

Regarding the specific site: it opens the ssh port for maintenance, but at that time the site admin did not want the port to be heavily accessed.

So we used LocalComputingElement instead of SSHComputingElement.

hmiyake avatar Oct 23 '25 07:10 hmiyake