
CMS: ML release checklist

Open katilp opened this issue 4 years ago • 0 comments

Release guidelines; see https://twiki.cern.ch/twiki/bin/view/CMS/DPOAMLSampleReleaseGuidelines

Agreements

  • [ ] the ML group agrees that these samples are of interest for a public release
    • presented/discussed in (meeting/presentation link)
  • [ ] the relevant POGs/PAGs and physics coordination agree that these samples, their parent datasets and the workflows to produce them can be brought into the public domain
    • presented/discussed in (meeting/presentation link)
  • [ ] CB approval

Parent dataset (if not already public data)

  • [ ] identify the exact parent dataset (link in DAS)
  • [ ] provenance (link to McM)
  • [ ] create a CODP record (example http://opendata.cern.ch/record/12201)
    • the record description goes in a new file cms-simulated-datasets-Phase2-datascience.json in https://github.com/cernopendata/opendata.cern.ch/tree/master/cernopendata/modules/fixtures/data/records (or, for a Run2 dataset, is added to cms-simulated-datasets-Run2-datascience.json)
  • [ ] transfer to T3_CH_CERN_OpenData

ML sample production

  • [ ] which CMSSW version and OS are needed for ML sample production? Check if a corresponding docker container is available in https://cmssw-docker.web.cern.ch/available-images ; if not, request one
  • [ ] is a global tag (GT) needed to produce the ML sample? If yes, have it transferred as sqlite files in /cvmfs/cms-opendata-conddb.cern.ch/
  • [ ] make the ML sample production workflow available in a repository in https://github.com/cms-opendata-analyses, call it [DescriptiveName]ProducerTool (example https://github.com/cms-opendata-analyses/TrackerRecHitProducerTool)
  • [ ] make sure that the ML production workflow runs in the CMS Open data environment (docker container or the CMS Open Data VM)
  • [ ] add a test workflow as a GitHub Action
  • [ ] create the corresponding CODP record for the production workflow (example http://opendata.cern.ch/record/12210)
    • the record description goes in a new file cms-tools-Phase2-datascience.json in https://github.com/cernopendata/opendata.cern.ch/tree/master/cernopendata/modules/fixtures/data/records (or is added to cms-tools-Run2-datascience.json, if Run2 samples)
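
As an illustration of what the test workflow could run, here is a minimal sketch of a smoke test in Python. It assumes that the producer can be started with a single command and that it writes one output file; the function name `smoke_test` and its arguments are hypothetical and not part of any CMS tool.

```python
import subprocess
from pathlib import Path
from typing import Sequence


def smoke_test(command: Sequence[str], expected_output: Path, timeout: int = 600) -> bool:
    """Run the producer command and check that it leaves a usable output file.

    The check passes only if the command exits with status 0 and the
    expected output file exists and is not empty.
    """
    result = subprocess.run(list(command), timeout=timeout)
    if result.returncode != 0:
        return False
    return expected_output.is_file() and expected_output.stat().st_size > 0
```

A GitHub Action step could call this inside the container with the real producer command and a small number of input events, and fail the job when it returns `False`.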

ML sample file

  • [ ] produce the files
  • [ ] upload them to /eos/opendata/cms/upload
  • [ ] prepare the CODP record (example http://opendata.cern.ch/record/12220)
    • the record description goes in a new file cms-derived-Phase2-datascience.json in https://github.com/cernopendata/opendata.cern.ch/tree/master/cernopendata/modules/fixtures/data/records (or is added to cms-derived-Run2-datascience.json if Run2 data)
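
The three record types in this checklist (parent dataset, production tool, derived ML sample) use fixture files that follow one naming pattern. The sketch below spells that pattern out; the file names are the ones listed in this issue, while the helper `fixture_filename` and the short keys `parent`, `tool` and `derived` are hypothetical labels chosen for illustration.

```python
# File name prefixes as listed in this checklist.
# The dictionary keys are hypothetical labels, not CODP terminology.
RECORD_PREFIXES = {
    "parent": "cms-simulated-datasets",
    "tool": "cms-tools",
    "derived": "cms-derived",
}
CAMPAIGNS = ("Phase2", "Run2")


def fixture_filename(kind: str, campaign: str) -> str:
    """Return the JSON fixture file name for a record kind and campaign."""
    if kind not in RECORD_PREFIXES:
        raise ValueError(f"unknown record kind: {kind!r}")
    if campaign not in CAMPAIGNS:
        raise ValueError(f"unknown campaign: {campaign!r}")
    return f"{RECORD_PREFIXES[kind]}-{campaign}-datascience.json"
```

For example, `fixture_filename("derived", "Run2")` gives `cms-derived-Run2-datascience.json`, the file to extend for a Run2 ML sample.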

ML sample usage example

  • [ ] prepare a usage example that reads the ML samples from eospublic and is runnable in the CMS Open Data environment
  • [ ] make the usage example available in a repository in https://github.com/cernopendata-datascience
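
A usage example will typically need the XRootD address of each released file. The sketch below builds it from an EOS path, assuming the public endpoint is `root://eospublic.cern.ch` and that released files live under `/eos/opendata/cms/`; the helper name `eospublic_url` and the example path in the usage note are hypothetical.

```python
import posixpath

# Assumed public XRootD endpoint and EOS area for released CMS files.
EOSPUBLIC = "root://eospublic.cern.ch"
OPENDATA_PREFIX = "/eos/opendata/cms/"


def eospublic_url(eos_path: str) -> str:
    """Turn an absolute EOS path into an XRootD URL on eospublic."""
    # Normalise the path and force exactly one leading slash.
    path = "/" + posixpath.normpath(eos_path).lstrip("/")
    if not path.startswith(OPENDATA_PREFIX):
        raise ValueError(f"not under {OPENDATA_PREFIX}: {eos_path!r}")
    # The URL has a double slash between the host and the absolute path.
    return f"{EOSPUBLIC}/{path}"
```

Calling `eospublic_url("/eos/opendata/cms/datascience/example/file.root")` with this made-up path returns `root://eospublic.cern.ch//eos/opendata/cms/datascience/example/file.root`, which can then be passed to whichever reader the usage example relies on.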

katilp, Nov 05 '20 21:11