opendata.cern.ch
CMS: prepare data records for Run2 Hbb and QCD MC for ML studies
prepare data records for the Run2 samples used in ML file production
From #2448, signal MC:

- /BulkGravTohhTohbbhbb_narrow_M-600_13TeV-madgraph/RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6_ext1-v1/MINIAODSIM
- /BulkGravTohhTohbbhbb_narrow_M-1000_13TeV-madgraph/RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6_ext1-v1/MINIAODSIM
- /BulkGravTohhTohbbhbb_narrow_M-1200_13TeV-madgraph/RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/MINIAODSIM
- /BulkGravTohhTohbbhbb_narrow_M-1400_13TeV-madgraph/RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/MINIAODSIM
- /BulkGravTohhTohbbhbb_narrow_M-1600_13TeV-madgraph/RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/MINIAODSIM
- /BulkGravTohhTohbbhbb_narrow_M-1800_13TeV-madgraph/RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/MINIAODSIM
- /BulkGravTohhTohbbhbb_narrow_M-2000_13TeV-madgraph/RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/MINIAODSIM
- /BulkGravTohhTohbbhbb_narrow_M-2000_13TeV-madgraph/RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6_ext1-v1/MINIAODSIM
- /BulkGravTohhTohbbhbb_narrow_M-2500_13TeV-madgraph/RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/MINIAODSIM
- /BulkGravTohhTohbbhbb_narrow_M-3000_13TeV-madgraph/RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6_ext1-v1/MINIAODSIM
- /BulkGravTohhTohbbhbb_narrow_M-4000_13TeV-madgraph/RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/MINIAODSIM
- /BulkGravTohhTohbbhbb_narrow_M-4500_13TeV-madgraph/RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/MINIAODSIM

QCD background:

- /QCD_Pt_300to470_TuneCUETP8M1_13TeV_pythia8/RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/MINIAODSIM
- /QCD_Pt_470to600_TuneCUETP8M1_13TeV_pythia8/RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/MINIAODSIM
- /QCD_Pt_600to800_TuneCUETP8M1_13TeV_pythia8/RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/MINIAODSIM
- /QCD_Pt_800to1000_TuneCUETP8M1_13TeV_pythia8/RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/MINIAODSIM
- /QCD_Pt_1000to1400_TuneCUETP8M1_13TeV_pythia8/RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/MINIAODSIM
- /QCD_Pt_1400to1800_TuneCUETP8M1_13TeV_pythia8/RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/MINIAODSIM
- /QCD_Pt_1800to2400_TuneCUETP8M1_13TeV_pythia8/RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/MINIAODSIM
- /QCD_Pt_2400to3200_TuneCUETP8M1_13TeV_pythia8/RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/MINIAODSIM
- /QCD_Pt_3200toInf_TuneCUETP8M1_13TeV_pythia8/RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v3/MINIAODSIM

And from #2447:

- /QCD_Pt-15to7000_TuneCUETP8M1_Flat_13TeV_pythia8/RunIISummer16MiniAODv2-PUMoriond17_magnetOn_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/MINIAODSIM
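All dataset names above follow the usual DAS pattern /PRIMARY/PROCESSED/TIER. A minimal POSIX shell sketch for splitting them when filling record fields; the helper name dataset_field is hypothetical and not part of the existing extraction scripts:

```shell
#!/bin/sh
# Split a DAS dataset name of the form /PRIMARY/PROCESSED/TIER and print
# the requested part: primary, processed or tier (hypothetical helper).
dataset_field() {
    case $2 in
        primary)   printf '%s\n' "$1" | cut -d/ -f2 ;;
        processed) printf '%s\n' "$1" | cut -d/ -f3 ;;
        tier)      printf '%s\n' "$1" | cut -d/ -f4 ;;
        *)         echo "unknown field: $2" >&2; return 1 ;;
    esac
}

ds=/QCD_Pt_300to470_TuneCUETP8M1_13TeV_pythia8/RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/MINIAODSIM
dataset_field "$ds" primary   # QCD_Pt_300to470_TuneCUETP8M1_13TeV_pythia8
dataset_field "$ds" tier      # MINIAODSIM
```

The leading "/" produces an empty first field for cut, which is why the primary dataset is field 2.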
Release: CMSSW_8_0_21
Global Tag: 80X_mcRun2_asymptotic_2016_TrancheIV_v6
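The global tag also appears inside the processed-dataset part of every name listed above (including the magnetOn sample from #2447), so a quick consistency check is possible before building the records. A sketch, with has_global_tag as a hypothetical helper:

```shell
#!/bin/sh
# Global tag quoted in this issue for all samples.
GLOBAL_TAG=80X_mcRun2_asymptotic_2016_TrancheIV_v6

# Return 0 if the processed-dataset part of a DAS name contains the global
# tag; otherwise print a warning on stderr and return 1 (hypothetical helper).
has_global_tag() {
    processed=$(printf '%s\n' "$1" | cut -d/ -f3)
    case $processed in
        *"$GLOBAL_TAG"*) return 0 ;;
        *) echo "global tag not found in: $1" >&2; return 1 ;;
    esac
}

has_global_tag /QCD_Pt-15to7000_TuneCUETP8M1_Flat_13TeV_pythia8/RunIISummer16MiniAODv2-PUMoriond17_magnetOn_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/MINIAODSIM \
    && echo "global tag OK"
```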
The files are in https://eospublichttp.cern.ch/eos/opendata/cms/MonteCarlo2016
These are standard samples (i.e. they are part of a normal production with no modifications to the standard workflow).
These are the first MiniAODSIM-format samples on the portal. They are mentioned in the updated http://opendata-dev.web.cern.ch/docs/about-cms
We may want to link to the following page for further information on this format and its usage:
https://twiki.cern.ch/twiki/bin/view/CMSPublic/WorkBookMiniAOD2016
To do:
- [x] prepare records
- [x] extract metadata (CMSDAS); NB: there is one further production step -> MiniAODSIM
Create cms-simulated-datasets-Run2-datascience.json (similar to other cms-simulated-dataset...)
Add to the last paragraph of the abstract description:
The contents of MINIAODSIM in these datasets may differ from the final legacy format used in CMS Run2 simulated datasets.
To do:
- [ ] Check why the LHE step is missing for Hbb samples, e.g. http://opendata-dev.web.cern.ch/record/12007
  - update: the extraction script gets the steps correctly and reads them into the cache, but this has not yet been propagated to the record building
- [x] Change the order of steps in provenance
- [x] Monte Carlo Production Overview -> Monte Carlo production overview
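For the provenance reordering, a sketch that sorts step labels into production order. The labels and their order (LHE, SIM, HLT RECO, MINIAODSIM) are assumptions taken from the step names used in this issue, not the identifiers used in the record JSON; step_rank and order_steps are hypothetical helpers:

```shell
#!/bin/sh
# Assumed production order of the steps: LHE (merged with SIM in Case A),
# SIM, HLT RECO, MINIAODSIM. Unknown labels are sorted last.
step_rank() {
    case $1 in
        LHE)        echo 1 ;;
        SIM)        echo 2 ;;
        "HLT RECO") echo 3 ;;
        MINIAODSIM) echo 4 ;;
        *)          echo 99 ;;
    esac
}

# Read step labels (one per line) on stdin, print them in production order:
# prefix each with its rank and a tab, sort numerically, strip the rank.
order_steps() {
    while IFS= read -r step; do
        printf '%s\t%s\n' "$(step_rank "$step")" "$step"
    done | sort -n -k1,1 | cut -f2-
}

printf 'MINIAODSIM\nHLT RECO\nSIM\nLHE\n' | order_steps
```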
Nice test case:
- record ID 12009
- script finds Summer15 parent:
$ cat cache/run2-datascience/mcm-store/dict/@BulkGravTohhTohbbhbb_narrow_M-3000_13TeV-madgraph@RunIISummer16DR80Premix-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6_ext1-v1@AODSIM.json | jq '.results.input_dataset'
"/BulkGravTohhTohbbhbb_narrow_M-3000_13TeV-madgraph/RunIISummer15wmLHEGS-MCRUN2_71_V1_ext1-v1/GEN-SIM"
- but does not follow it:
$ ls -l cache/run2-datascience/mcm-store/dict/ | grep -c Summer15
0
To be checked while rerunning...
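Whether the parent was cached can also be checked directly. Judging from the file name above, the cache file for a dataset is its name with "/" replaced by "@" plus ".json"; the sketch below assumes exactly that mapping, and cache_file / is_cached are hypothetical helpers:

```shell
#!/bin/sh
# Map a dataset name to its McM cache file name, assuming the naming seen in
# cache/run2-datascience/mcm-store/dict/ ("/" -> "@", plus ".json").
cache_file() {
    printf '%s.json\n' "$(printf '%s' "$1" | tr '/' '@')"
}

# Report whether the McM record of a dataset is present in a cache directory;
# returns 1 if the file is missing.
is_cached() {
    if [ -f "$1/$(cache_file "$2")" ]; then
        echo "cached: $2"
    else
        echo "MISSING: $2"
        return 1
    fi
}
```

For record 12009, with CACHE=cache/run2-datascience/mcm-store/dict and DS set to the AODSIM dataset above, `is_cached "$CACHE" "$(jq -r '.results.input_dataset' "$CACHE/$(cache_file "$DS")")"` would be expected to report the Summer15 GEN-SIM parent as MISSING, consistent with the `grep -c` result.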
BulkGravTohhTohbbhbb* datasets:
- Case A: 12000, 12001, 12007, 12009: LHE step missing entirely; SIM config and production script missing from the SIM step
- Case B: 12002-12006, 12008, 12010, 12011: LHE step present (but without configs); production script missing from the SIM step
QCD* datasets:
- Case C: 12012-12019, 12021: production script missing from the SIM step
- Case D: 12020: production script missing from the SIM step; HLT RECO step has no name and no links
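When rerunning, the case assignments above could serve as a quick lookup for which records are expected to change. A sketch; the ID ranges are copied from the lists above and record_case is a hypothetical helper:

```shell
#!/bin/sh
# Map a record ID to the provenance case (A-D) listed in this issue.
record_case() {
    case $1 in
        12000|12001|12007|12009)     echo A ;;
        1200[2-6]|12008|12010|12011) echo B ;;
        1201[2-9]|12021)             echo C ;;
        12020)                       echo D ;;
        *) echo unknown; return 1 ;;
    esac
}

record_case 12009   # A
```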
General:
- generators appear in the HLT RECO step; they would make more sense in the LHE/SIM steps
- generator parameter fragments are not shown
Case A, e.g. 12000:
- SIM step (search for output) found in McM https://cms-pdmv.cern.ch/mcm/requests?produce=%2FBulkGravTohhTohbbhbb_narrow_M-600_13TeV-madgraph%2FRunIISummer15wmLHEGS-MCRUN2_71_V1_ext1-v1%2FGEN-SIM&page=0&shown=70368744177663
- config https://cmsweb.cern.ch/couchdb/reqmgr_config_cache/9d1ace9d1b37fb0202c93baacabce721/configFile
- production script https://cms-pdmv.cern.ch/mcm/public/restapi/requests/get_setup/B2G-RunIISummer15wmLHEGS-00784
- generator parameter fragment in https://cms-pdmv.cern.ch/mcm/public/restapi/requests/get_fragment/B2G-RunIISummer15wmLHEGS-00784/0
- NB: LHE and SIM steps are run together
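The McM links in these cases follow a fixed pattern keyed by the request prepid. A sketch for building them; the helper names are hypothetical, and the trailing /0 of the fragment URL is kept as in the links above without assuming what it means:

```shell
#!/bin/sh
# McM public REST endpoint prefix, as in the links above.
MCM=https://cms-pdmv.cern.ch/mcm/public/restapi/requests

# Production script (setup) URL for a McM request prepid.
mcm_setup_url() { printf '%s/get_setup/%s\n' "$MCM" "$1"; }

# Generator parameter fragment URL; second argument defaults to 0,
# matching the links in this issue.
mcm_fragment_url() { printf '%s/get_fragment/%s/%s\n' "$MCM" "$1" "${2:-0}"; }

mcm_setup_url B2G-RunIISummer15wmLHEGS-00784
mcm_fragment_url B2G-RunIISummer15wmLHEGS-00784
```

The resulting URLs can then be fetched with e.g. `curl -s "$(mcm_setup_url B2G-RunIISummer15wmLHEGS-00784)"`.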
Case B, e.g. 12002:
- SIM step (search for output) found in McM https://cms-pdmv.cern.ch/mcm/requests?produce=%2FBulkGravTohhTohbbhbb_narrow_M-1200_13TeV-madgraph%2FRunIISummer15GS-MCRUN2_71_V1-v1%2FGEN-SIM&page=0&shown=70368744177663
- config https://cmsweb.cern.ch/couchdb/reqmgr_config_cache/2557f311e3275b75b2e1ea1cb1b2e449/configFile
- production script https://cms-pdmv.cern.ch/mcm/public/restapi/requests/get_setup/EXO-RunIISummer15GS-01225
- no direct link to the generator parameter fragment, but a tag is given (ad6d8bb0dd5fa4907e5f2cb6e1a70d2bf487c312); it refers to https://raw.githubusercontent.com/cms-sw/genproductions/ad6d8bb0dd5fa4907e5f2cb6e1a70d2bf487c312/python/ThirteenTeV/Hadronizer_TuneCUETP8M1_13TeV_generic_LHE_pythia8_cff.py which is in the production script and linked from "Name of the fragment"
- NB: the LHE step was run separately: https://cms-pdmv.cern.ch/mcm/requests?produce=%2FBulkGravTohhTohbbhbb_narrow_M-1200_13TeV-madgraph%2FRunIIWinter15wmLHE-MCRUN2_71_V1-v2%2FLHE&page=0&shown=70368744177663
  - production script https://cms-pdmv.cern.ch/mcm/public/restapi/requests/get_setup/EXO-RunIIWinter15wmLHE-00267
  - config https://cmsweb.cern.ch/couchdb/reqmgr_config_cache/df70dc41b476c73227f0e873ba1a2b86/configFile
  - generator parameter fragment https://cms-pdmv.cern.ch/mcm/public/restapi/requests/get_fragment/EXO-RunIIWinter15wmLHE-00267/0
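In Case B (and likewise in Cases C and D), the fragment URL can be rebuilt from the genproductions tag and the fragment path quoted in the production script. A sketch, with genproductions_url as a hypothetical helper:

```shell
#!/bin/sh
# Build the raw GitHub URL of a genproductions fragment from the tag
# (commit hash) and the fragment path, following the URLs in this issue.
genproductions_url() {
    printf 'https://raw.githubusercontent.com/cms-sw/genproductions/%s/%s\n' "$1" "$2"
}

genproductions_url ad6d8bb0dd5fa4907e5f2cb6e1a70d2bf487c312 \
    python/ThirteenTeV/Hadronizer_TuneCUETP8M1_13TeV_generic_LHE_pythia8_cff.py
```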
Case C, e.g. 12012:
- SIM step (search for output) found in https://cms-pdmv.cern.ch/mcm/requests?produce=%2FQCD_Pt_300to470_TuneCUETP8M1_13TeV_pythia8%2FRunIISummer15GS-MCRUN2_71_V1-v1%2FGEN-SIM&page=0&shown=70368744177663
- production script https://cms-pdmv.cern.ch/mcm/public/restapi/requests/get_setup/BTV-RunIISummer15GS-00028
- config https://cmsweb.cern.ch/couchdb/reqmgr_config_cache/b51f8e75069872bae6c85134469190eb/configFile
- no direct link to the generator parameter fragment, but a tag is given (85ccec18763776e0613830cde524bc7a4c77bc49); it refers to https://raw.githubusercontent.com/cms-sw/genproductions/85ccec18763776e0613830cde524bc7a4c77bc49/python/ThirteenTeV/QCD_Pt_300to470_TuneCUETP8M1_13TeV_pythia8_cff.py which is in the production script and linked from "Name of the fragment"
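The "search for output" links encode the dataset name with "/" as %2F. A sketch for building them; it only encodes "/", which assumes dataset names contain no other reserved characters, and it copies the page and shown parameters verbatim from the links above (mcm_search_url is hypothetical):

```shell
#!/bin/sh
# Build the McM request search URL for the request producing a dataset.
# Only "/" is percent-encoded (as %2F); page/shown copied from this issue.
mcm_search_url() {
    printf 'https://cms-pdmv.cern.ch/mcm/requests?produce=%s&page=0&shown=70368744177663\n' \
        "$(printf '%s' "$1" | sed 's|/|%2F|g')"
}

mcm_search_url /QCD_Pt_300to470_TuneCUETP8M1_13TeV_pythia8/RunIISummer15GS-MCRUN2_71_V1-v1/GEN-SIM
```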
Case D, 12020:
- SIM step (search for output) found in https://cms-pdmv.cern.ch/mcm/requests?produce=%2FQCD_Pt_3200toInf_TuneCUETP8M1_13TeV_pythia8%2FRunIISummer15GS-MCRUN2_71_V1-v1%2FGEN-SIM&page=0&shown=70368744177663
- production script https://cms-pdmv.cern.ch/mcm/public/restapi/requests/get_setup/BTV-RunIISummer15GS-00036
- config https://cmsweb.cern.ch/couchdb/reqmgr_config_cache/b51f8e75069872bae6c8513446aac464/configFile
- no direct link to the generator parameter fragment, but a tag is given (d074b0464ffff3d4ee30b34a1336f2618930571c); it refers to https://raw.githubusercontent.com/cms-sw/genproductions/d074b0464ffff3d4ee30b34a1336f2618930571c/python/ThirteenTeV/QCD_Pt_3200toInf_TuneCUETP8M1_13TeV_pythia8_cff.py which is in the production script and linked from "Name of the fragment"
- HLT RECO step (search for output) found in McM https://cms-pdmv.cern.ch/mcm/requests?produce=%2FQCD_Pt_3200toInf_TuneCUETP8M1_13TeV_pythia8%2FRunIISummer16DR80Premix-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v3%2FAODSIM&page=0&shown=70368744177663
  - production script https://cms-pdmv.cern.ch/mcm/public/restapi/requests/get_setup/BTV-RunIISummer16DR80Premix-00044
  - HLT config https://cmsweb.cern.ch/couchdb/reqmgr_config_cache/2841cad1ab1ea0d683f04ea7b7e0443b/configFile
  - RECO config https://cmsweb.cern.ch/couchdb/reqmgr_config_cache/2841cad1ab1ea0d683f04ea7b7e11046/configFile
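The config links all point to the ReqMgr config cache, keyed by a document ID. A sketch for turning lists of IDs into links for the records; the helper names are hypothetical, and access requirements for cmsweb are not covered here:

```shell
#!/bin/sh
# Build the ReqMgr config-cache URL from a config document ID, as in the links above.
reqmgr_config_url() {
    printf 'https://cmsweb.cern.ch/couchdb/reqmgr_config_cache/%s/configFile\n' "$1"
}

# Read "label id" pairs on stdin and print one "label: URL" line per pair.
config_links() {
    while read -r label id; do
        printf '%s: %s\n' "$label" "$(reqmgr_config_url "$id")"
    done
}

printf 'HLT 2841cad1ab1ea0d683f04ea7b7e0443b\nRECO 2841cad1ab1ea0d683f04ea7b7e11046\n' | config_links
```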