cmssw icon indicating copy to clipboard operation
cmssw copied to clipboard

Make the container-in-container feature optional

Open lviliani opened this issue 3 weeks ago • 11 comments

This PR aims at making optional the container-in-container feature previously included.

The logic behind this implementation is:

  • keep the run_generic_tarball_cvmfs.sh back-compatible and include only minimal changes, e.g. without adding new optional positional parameters and altering the order of the existing ones.
  • change only the run_generic_tarball_cvmfs.sh script without touching the ExternalLHEProducer.

The solution I came up with is adding the use_singularity keyword as an optional argument. If this keyword is found, it will switch the corresponding flag and the event generation will be run through singularity, but all the additional optional parameters will be kept as they were previously (using the set builtin).

There are for sure alternative solutions, so I would like to hear other experts opinion and I'm happy to adapt the script if needed.

Using this implementation, the singularity run will be triggered using the following ExternalLHEProducer snippet:

externalLHEProducer = cms.EDProducer("ExternalLHEProducer",
    args = cms.vstring('/cvmfs/cms.cern.ch/phys_generator/gridpacks/PATH_TO_GRIDPACK','use_singularity'),
    nEvents = cms.untracked.uint32(5000),
    numberOfParameters = cms.uint32(2),
    outputFile = cms.string('cmsgrid_final.lhe'),
    generateConcurrently = cms.untracked.bool(False),
    scriptName = cms.FileInPath('GeneratorInterface/LHEInterface/data/run_generic_tarball_cvmfs.sh')
)

Which contains the additional 'use_singularity' string in args and numberOfParameters = cms.uint32(2) instead of 1.

FYI: @DickyChant @vlimant @stlammel @srimanob @ckoraka

lviliani avatar Dec 01 '25 17:12 lviliani

cms-bot internal usage

cmsbuild avatar Dec 01 '25 17:12 cmsbuild

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-49527/47027

cmsbuild avatar Dec 01 '25 17:12 cmsbuild

A new Pull Request was created by @lviliani for master.

It involves the following packages:

  • GeneratorInterface/LHEInterface (generators)

@cmsbuild, @lviliani, @mkirsano, @sensrcn, @theofil can you please review it and eventually sign? Thanks. @alberto-sanchez, @mkirsano this is something you requested to watch as well. @ftenchini, @mandrenguyen, @sextonkennedy you are the release manager for this.

cms-bot commands are listed here

cmsbuild avatar Dec 01 '25 17:12 cmsbuild

Many Thanks Lorenzo! - Stephan

stlammel avatar Dec 01 '25 17:12 stlammel

sounds nice indeed. I wonder how we are going to pass the use_singularity and numberOfParameters+=1 to production job, if that's the intended purpose

vlimant avatar Dec 01 '25 18:12 vlimant

sounds nice indeed. I wonder how we are going to pass the use_singularity and numberOfParameters+=1 to production job, if that's the intended purpose

Maybe we could set singularity as the default and to disable it if we provide an extra parameter (e.g. skip singularity), assuming most jobs will run singularity now

stahlleiton avatar Dec 01 '25 18:12 stahlleiton

sounds nice indeed. I wonder how we are going to pass the use_singularity and numberOfParameters+=1 to production job, if that's the intended purpose

For future production jobs, this can be done at GEN fragment level for wmLHE requests/workflows.

Or if we are talking about current production jobs, I think the intention here is avoid reverting the PR made before but rather make it optional for 16_X onwards, which means we don't really have such production jobs except for 16_0_X relvals.

DickyChant avatar Dec 01 '25 18:12 DickyChant

sounds nice indeed. I wonder how we are going to pass the use_singularity and numberOfParameters+=1 to production job, if that's the intended purpose

Maybe we could set singularity as the default and to disable it if we provide an extra parameter (e.g. skip singularity), assuming most jobs will run singularity now

Regarding the default parameter treatment, my suggestion is to keep the default to false for now since we have not yet finalized the container-in-container feature, but we can easily change it to true in a patch release once we have the feature working.

DickyChant avatar Dec 01 '25 18:12 DickyChant

sounds nice indeed. I wonder how we are going to pass the use_singularity and numberOfParameters+=1 to production job, if that's the intended purpose

As Sitian said, we can use the following snippet in the fragments in central production if we want to activate the "container-in-container" feature for future requests:

externalLHEProducer = cms.EDProducer("ExternalLHEProducer",
    args = cms.vstring('/cvmfs/cms.cern.ch/phys_generator/gridpacks/PATH_TO_GRIDPACK','use_singularity'),
    nEvents = cms.untracked.uint32(5000),
    numberOfParameters = cms.uint32(2),
    outputFile = cms.string('cmsgrid_final.lhe'),
    generateConcurrently = cms.untracked.bool(False),
    scriptName = cms.FileInPath('GeneratorInterface/LHEInterface/data/run_generic_tarball_cvmfs.sh')
)

Otherwise, if we use standard fragments, the production will proceed without singularity.

If at some point we manage to have a solution for the container-in-container problem, we can switch to using singularity as the default option.

lviliani avatar Dec 02 '25 09:12 lviliani

please test

lviliani avatar Dec 02 '25 09:12 lviliani

+1

Size: This PR adds an extra 24KB to repository Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-ff874e/49749/summary.html COMMIT: 3509cacf00d495cd68b62a0e1be129af3bc06e1a CMSSW: CMSSW_16_0_X_2025-12-01-2300/el8_amd64_gcc13 User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/49527/49749/install.sh to create a dev area with all the needed externals and cmssw changes.

DAS Queries: The DAS query tests failed, see the summary page for details.

Comparison Summary

Summary:

  • You potentially removed 1 lines from the logs
  • ROOTFileChecks: Some differences in event products or their sizes found
  • Reco comparison results: 4 differences found in the comparisons
  • Reco comparison had 4 failed jobs
  • DQMHistoTests: Total files compared: 53
  • DQMHistoTests: Total histograms compared: 4269233
  • DQMHistoTests: Total failures: 3
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 4269210
  • DQMHistoTests: Total skipped: 20
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 52 files compared)
  • Checked 227 log files, 198 edm output root files, 53 DQM output files
  • TriggerResults: no differences found

cmsbuild avatar Dec 02 '25 18:12 cmsbuild

+1

theofil avatar Dec 16 '25 18:12 theofil

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @sextonkennedy, @mandrenguyen, @ftenchini (and backports should be raised in the release meeting by the corresponding L2)

cmsbuild avatar Dec 16 '25 18:12 cmsbuild

+1

mandrenguyen avatar Dec 17 '25 08:12 mandrenguyen