GangaLHCb: `lbexec` support in `GaudiExec` for Run 3 applications
`gaudirun.py` support is removed from recent nightlies of DaVinci (as announced in the last LHCb week), meaning that submitting DV jobs with ganga will not work in the next Run3 release of DV.
It is possible to run `lbexec` jobs with a hack (here is an example), but it would be good eventually for ganga to support `lbexec` natively.
I have been thinking about how best to implement this, but got stuck when thinking about `data.py`.
cc @chrisburr
Yes, we discussed this already with @chrisburr. Is there any documentation that you can point to now about how `lbexec` works?
I suppose
- LHCb week talk https://indico.cern.ch/event/1160084/contributions/4891007/attachments/2462957/4223230/2022-06-15_lbexec-and-how-to-run-applications.pdf
- Related merge requests https://gitlab.cern.ch/lhcb/LHCb/-/merge_requests?scope=all&state=merged&search=lbexec
- In particular this one
- (probably less helpful) Example Run3 analysis production configurations that use `lbexec` to run Moore and DaVinci https://gitlab.cern.ch/lhcb-datapkg/AnalysisProductions/-/merge_requests/257/diffs#4f3bae83e4f317e720f3e940f52e2ab0f4e0f434_0_80
~~Nothing in a neat sphinx site yet AFAIK~~ https://lhcb-davinci.docs.cern.ch/tutorials/running.html
To support `lbexec` the `GaudiExec` application would probably need these parameters:
- (maybe unnecessary) `use_lbexec` (`bool`): a flag telling `GaudiExec` not to use `gaudirun.py`
- a new one: `entrypoint` (`str`): the 'function' to run, e.g. `MyOptionsFile:alg_config`
- `options` (`list[str]` or `str`) can be kept, accepting instead `yaml`, not `.py` files, to be passed to `lbexec`. Alternatively it could even accept a `dict` and generate+upload the yaml file for the user. (this would be really useful for passing things like conddb and dddb tags without having to code generate and write a file)
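To make the proposal a bit more concrete, here is a rough sketch of how these three parameters might be turned into an `lbexec` command line. Everything here is an assumption for illustration: `LbexecSettings` and `build_command` are hypothetical names and not part of Ganga, the `lbexec <entrypoint> <options file>` form is my reading of the linked docs, and the `dict` is serialised with the standard-library `json` module only to avoid a dependency (JSON is valid YAML 1.2; a real implementation would more likely use a YAML library).

```python
import json
from dataclasses import dataclass
from pathlib import Path
from typing import Union


@dataclass
class LbexecSettings:
    """Hypothetical stand-in for the proposed GaudiExec attributes."""

    entrypoint: str            # e.g. "MyOptionsFile:alg_config"
    options: Union[str, dict]  # path to a YAML file, or a dict to serialise
    use_lbexec: bool = True    # the (maybe unnecessary) flag


def build_command(settings: LbexecSettings, workdir: str) -> list:
    """Return the argument list for an lbexec call (assumed CLI form)."""
    if not settings.use_lbexec:
        raise ValueError("this sketch only covers the lbexec code path")

    # The entrypoint must look like "module:function".
    module, sep, function = settings.entrypoint.partition(":")
    if not (module and sep and function):
        raise ValueError(f"bad entrypoint: {settings.entrypoint!r}")

    if isinstance(settings.options, dict):
        # Generate the options file for the user; JSON output is also YAML.
        options_path = Path(workdir) / "lbexec_options.yaml"
        options_path.write_text(json.dumps(settings.options, indent=2) + "\n")
        options_arg = str(options_path)
    else:
        options_arg = settings.options

    return ["lbexec", settings.entrypoint, options_arg]


if __name__ == "__main__":
    # Key names and values below are placeholders, not real tags.
    demo = LbexecSettings(
        entrypoint="MyOptionsFile:alg_config",
        options={"conddb_tag": "<conddb tag>", "dddb_tag": "<dddb tag>"},
    )
    print(build_command(demo, workdir="."))
```

The generated file would still have to be uploaded with the rest of the input sandbox, which this sketch does not attempt.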
Options files (e.g. `MyOptionsFile.py`) containing the function specified in the `entrypoint` to configure the job would need to be added to the input sandbox. Seems like an easy place for end-users to slip up.
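One way to catch that slip-up early would be a check at submission time that the module named in the `entrypoint` is actually among the sandbox files. A minimal sketch, assuming the entrypoint refers to a single top-level `.py` file as in the example above; `missing_entrypoint_module` is a hypothetical helper, not an existing Ganga function:

```python
from pathlib import PurePath
from typing import List, Optional


def missing_entrypoint_module(entrypoint: str, sandbox_files: List[str]) -> Optional[str]:
    """Return the expected file name if it is absent from the sandbox, else None.

    Assumes the entrypoint is "Module:function" and that Module lives in a
    single file called Module.py (packages are not handled in this sketch).
    """
    module, sep, function = entrypoint.partition(":")
    if not (module and sep and function):
        raise ValueError(f"bad entrypoint: {entrypoint!r}")

    expected = module + ".py"
    # Compare on base names only, since sandbox entries may be full paths.
    sandbox_names = {PurePath(f).name for f in sandbox_files}
    return None if expected in sandbox_names else expected


if __name__ == "__main__":
    # Hypothetical sandbox contents for illustration.
    print(missing_entrypoint_module("MyOptionsFile:alg_config", ["/home/user/other.py"]))
```

A non-`None` result could then be turned into a clear error message before the job ever reaches the grid.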
Where I got a bit stuck is how `ganga` configures input files. At the moment a file called `data.py` is generated from some gaudi options code templates in `LHCbDataset`, which is dumped into `job._splitterdata` and then into the input sandbox. But I'm not sure how that could best be adapted to generate YAML that works with `lbexec`. I suppose PFNs could be determined in advance, but that seems like it would constrain the configured subjob to running at a single site, and I don't know if `input_files` in `lbexec` YAML are able to deal with LFNs and XML catalogs either. I guess in this situation you and @chrisburr know best.
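For the sake of discussion, this is roughly what a YAML counterpart to `data.py` could look like if LFNs and a catalog turn out to be usable. It is very much a sketch under assumptions: whether `input_files` accepts `LFN:`-prefixed entries is exactly the open question above, `xml_file_catalog` is a key name I am assuming rather than one I have verified, `render_input_options` is a hypothetical helper, and the LFNs in the demo are made-up placeholders. As in the earlier sketch, `json` is used because JSON output is also valid YAML.

```python
import json
from typing import List, Optional


def render_input_options(lfns: List[str], catalog: Optional[str] = None) -> str:
    """Render a per-subjob options fragment listing its input files.

    ASSUMPTIONS: the "LFN:" prefix and the "xml_file_catalog" key are guesses
    about what lbexec would accept, not confirmed behaviour.
    """
    entries = [lfn if lfn.startswith("LFN:") else "LFN:" + lfn for lfn in lfns]
    options = {"input_files": entries}
    if catalog is not None:
        options["xml_file_catalog"] = catalog
    return json.dumps(options, indent=2) + "\n"


if __name__ == "__main__":
    # Placeholder LFNs, not real dataset paths.
    print(render_input_options(
        ["/lhcb/example/file_1.dst", "/lhcb/example/file_2.dst"],
        catalog="catalog.xml",
    ))
```

Keeping LFNs in the file and resolving them through a catalog at the site would avoid pinning a subjob to one site, which is the drawback of writing PFNs in advance.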
(I take an interest in this issue because I have been doing some run3 studies recently with `ganga` + for the earlier mentioned reasons)
I'm away this week but I'll respond soon with a suggestion of how Ganga can support this.
@chrisburr Ping