GangaLHCb: `lbexec` support in `GaudiExec` for Run 3 applications

Open ryuwd opened this issue 2 years ago • 4 comments

gaudirun.py support is removed from recent nightlies of DaVinci (as announced in the last LHCb week), meaning that submitting DV jobs with ganga will not work in the next Run3 release of DV.

It is possible to run lbexec jobs with a hack (here is an example), but it would be good eventually for ganga to support lbexec natively.

I have been thinking about how best to implement this, but got stuck on how to handle data.py.

cc @chrisburr

ryuwd avatar Jul 25 '22 12:07 ryuwd

Yes, we discussed this already with @chrisburr . Is there any documentation that you can point to now about how lbexec works?

egede avatar Jul 25 '22 23:07 egede

> Yes, we discussed this already with @chrisburr . Is there any documentation that you can point to now about how lbexec works?

I suppose

  • LHCb week talk https://indico.cern.ch/event/1160084/contributions/4891007/attachments/2462957/4223230/2022-06-15_lbexec-and-how-to-run-applications.pdf
  • Related merge requests https://gitlab.cern.ch/lhcb/LHCb/-/merge_requests?scope=all&state=merged&search=lbexec
  • (probably less helpful) Example Run3 analysis production configurations that use lbexec to run Moore and DaVinci https://gitlab.cern.ch/lhcb-datapkg/AnalysisProductions/-/merge_requests/257/diffs#4f3bae83e4f317e720f3e940f52e2ab0f4e0f434_0_80

~~Nothing in a neat sphinx site yet AFAIK~~ https://lhcb-davinci.docs.cern.ch/tutorials/running.html

To support lbexec, the GaudiExec application would probably need these parameters:

  • (maybe unnecessary) use_lbexec (bool): a flag telling GaudiExec not to use gaudirun.py
  • a new one: entrypoint (str): the 'function' to run e.g. MyOptionsFile:alg_config
  • options (list[str] or str) can be kept, but would accept YAML files rather than .py files, to be passed to lbexec. Alternatively, it could accept a dict and generate and upload the YAML file for the user. (This would be really useful for passing things like conddb and dddb tags without having to generate and write a file by hand.)
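As a rough sketch of the dict-to-YAML idea above (not how GaudiExec works today): the helper names `to_yaml` and `build_lbexec_command` and the file name `options.yaml` are made up for illustration, and the `lbexec <entrypoint> <options.yaml>` argument order is my assumption from the linked talk and tutorial. The serialiser only handles flat dicts of scalars and lists, and relies on JSON scalars being valid YAML.

```python
import json
from pathlib import Path


def to_yaml(options):
    """Serialise a flat dict of scalars and lists to YAML text.

    JSON scalars (quoted strings, numbers, true/false/null) are also
    valid YAML, so json.dumps is used to quote each value safely.
    """
    lines = []
    for key, value in options.items():
        if isinstance(value, (list, tuple)):
            if not value:
                lines.append(f"{key}: []")
                continue
            lines.append(f"{key}:")
            lines.extend(f"- {json.dumps(item)}" for item in value)
        else:
            lines.append(f"{key}: {json.dumps(value)}")
    return "\n".join(lines) + "\n"


def build_lbexec_command(entrypoint, options, workdir="."):
    """Return the argv for an lbexec call (hypothetical helper).

    `options` is either a dict, which is written to options.yaml in
    `workdir`, or the path of a YAML file the user already wrote.
    """
    if isinstance(options, dict):
        yaml_path = Path(workdir) / "options.yaml"
        yaml_path.write_text(to_yaml(options))
    else:
        yaml_path = Path(options)
    # Assumed invocation: lbexec <module:function> <options.yaml>
    return ["lbexec", entrypoint, str(yaml_path)]
```

With something like this, a user could pass `{"conddb_tag": ..., "dddb_tag": ...}` directly (key names assumed here, not checked against lbexec) and never touch a YAML file.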

Options files (e.g. MyOptionsFile.py) containing the function specified in the entrypoint to configure the job would need to be added to the input sandbox. Seems like an easy place for end-users to slip up.
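One way to catch that slip-up before submission might be a check like the one below. This is only a sketch: `check_entrypoint` is a hypothetical helper, not an existing ganga function, and it only handles a plain `Module:function` entrypoint whose module is a single .py file in the sandbox (no dotted package paths). It parses the file with `ast` rather than importing it, so no Gaudi environment is needed.

```python
import ast
from pathlib import Path


def check_entrypoint(entrypoint, sandbox_files):
    """Return the sandbox file defining the entrypoint, or raise ValueError."""
    module, sep, function = entrypoint.partition(":")
    if not sep or not module or not function:
        raise ValueError(f"expected 'module:function', got {entrypoint!r}")

    wanted = module + ".py"
    for path in map(Path, sandbox_files):
        if path.name != wanted:
            continue
        # Collect the names of top-level functions without executing the file
        tree = ast.parse(path.read_text())
        names = {n.name for n in tree.body if isinstance(n, ast.FunctionDef)}
        if function in names:
            return path
        raise ValueError(f"{path} does not define a function {function!r}")

    raise ValueError(f"{wanted} is not in the input sandbox")
```

For example, `check_entrypoint("MyOptionsFile:alg_config", job_input_files)` would fail early if MyOptionsFile.py was never added, where `job_input_files` stands for whatever list of local paths the job carries.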

Where I got a bit stuck is how ganga configures input files. At the moment, a file called data.py is generated from Gaudi options code templates in LHCbDataset, stored in job._splitterdata, and then added to the input sandbox. I'm not sure how best to adapt that to generate YAML that works with lbexec. I suppose PFNs could be determined in advance, but that would seem to restrict each configured subjob to a single site, and I don't know whether input_files in the lbexec YAML can deal with LFNs and XML catalogs either. I guess in this situation you and @chrisburr know best.
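To make the question concrete, here is roughly what a YAML equivalent of data.py could look like per subjob, if LFNs turn out to be usable. Everything lbexec-specific here is an unverified assumption: that `input_files` entries may carry an `LFN:` prefix, and that a catalog can be named through a key such as `xml_file_catalog`. The helper name `split_input_files` and the catalog file name are invented for the sketch.

```python
def split_input_files(lfns, files_per_job, base_options=None,
                      catalog="pool_xml_catalog.xml"):
    """Return one options dict per subjob (hypothetical helper).

    Each dict copies `base_options` and adds that subjob's slice of
    input files, so it could be serialised to its own YAML file.
    """
    if files_per_job < 1:
        raise ValueError("files_per_job must be at least 1")

    base = dict(base_options or {})
    subjobs = []
    for start in range(0, len(lfns), files_per_job):
        chunk = lfns[start:start + files_per_job]
        options = dict(base)
        # ASSUMPTION: lbexec resolves "LFN:" entries via an XML catalog
        options["input_files"] = [f"LFN:{lfn}" for lfn in chunk]
        # ASSUMPTION: key name for the catalog is a guess, not confirmed
        options["xml_file_catalog"] = catalog
        subjobs.append(options)
    return subjobs
```

If LFNs are not supported, the same structure would still work with PFNs filled in, at the cost of the site restriction mentioned above.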

(I take an interest in this issue because I have been doing some Run 3 studies with ganga recently, as well as for the reasons mentioned earlier.)

ryuwd avatar Jul 26 '22 09:07 ryuwd

I'm away this week but I'll respond soon with a suggestion of how Ganga can support this.

chrisburr avatar Jul 26 '22 12:07 chrisburr

@chrisburr Ping

egede avatar May 25 '23 22:05 egede