CMS: prepare data records for Run2 Hbb and QCD MC for ML studies

Open katilp opened this issue 6 years ago • 5 comments

prepare data records for the Run2 samples used in ML file production

from #2448 signal MC:

  • /BulkGravTohhTohbbhbb_narrow_M-600_13TeV-madgraph/RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6_ext1-v1/MINIAODSIM
  • /BulkGravTohhTohbbhbb_narrow_M-1000_13TeV-madgraph/RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6_ext1-v1/MINIAODSIM
  • /BulkGravTohhTohbbhbb_narrow_M-1200_13TeV-madgraph/RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/MINIAODSIM
  • /BulkGravTohhTohbbhbb_narrow_M-1400_13TeV-madgraph/RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/MINIAODSIM
  • /BulkGravTohhTohbbhbb_narrow_M-1600_13TeV-madgraph/RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/MINIAODSIM
  • /BulkGravTohhTohbbhbb_narrow_M-1800_13TeV-madgraph/RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/MINIAODSIM
  • /BulkGravTohhTohbbhbb_narrow_M-2000_13TeV-madgraph/RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/MINIAODSIM
  • /BulkGravTohhTohbbhbb_narrow_M-2000_13TeV-madgraph/RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6_ext1-v1/MINIAODSIM
  • /BulkGravTohhTohbbhbb_narrow_M-2500_13TeV-madgraph/RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/MINIAODSIM
  • /BulkGravTohhTohbbhbb_narrow_M-3000_13TeV-madgraph/RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6_ext1-v1/MINIAODSIM
  • /BulkGravTohhTohbbhbb_narrow_M-4000_13TeV-madgraph/RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/MINIAODSIM
  • /BulkGravTohhTohbbhbb_narrow_M-4500_13TeV-madgraph/RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/MINIAODSIM

QCD background:

  • /QCD_Pt_300to470_TuneCUETP8M1_13TeV_pythia8/RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/MINIAODSIM
  • /QCD_Pt_470to600_TuneCUETP8M1_13TeV_pythia8/RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/MINIAODSIM
  • /QCD_Pt_600to800_TuneCUETP8M1_13TeV_pythia8/RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/MINIAODSIM
  • /QCD_Pt_800to1000_TuneCUETP8M1_13TeV_pythia8/RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/MINIAODSIM
  • /QCD_Pt_1000to1400_TuneCUETP8M1_13TeV_pythia8/RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/MINIAODSIM
  • /QCD_Pt_1400to1800_TuneCUETP8M1_13TeV_pythia8/RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/MINIAODSIM
  • /QCD_Pt_1800to2400_TuneCUETP8M1_13TeV_pythia8/RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/MINIAODSIM
  • /QCD_Pt_2400to3200_TuneCUETP8M1_13TeV_pythia8/RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/MINIAODSIM
  • /QCD_Pt_3200toInf_TuneCUETP8M1_13TeV_pythia8/RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v3/MINIAODSIM

and from #2447 /QCD_Pt-15to7000_TuneCUETP8M1_Flat_13TeV_pythia8/RunIISummer16MiniAODv2-PUMoriond17_magnetOn_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/MINIAODSIM
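When preparing the records it can help to split each dataset name into its three slash-separated parts and to pull out the mass point of the signal samples. The following is a minimal sketch, not the script used for this issue; the function names `split_dataset` and `mass_point` are hypothetical, and the `_M-<number>_` pattern is simply read off the names listed above.

```python
import re

def split_dataset(name):
    """Split '/primary/processed/tier' into its three components."""
    parts = name.strip().split("/")
    if len(parts) != 4 or parts[0] != "":
        raise ValueError(f"not a three-part dataset name: {name!r}")
    return {"primary": parts[1], "processed": parts[2], "tier": parts[3]}

def mass_point(primary):
    """Return the number in an '_M-<number>_' token, or None if absent."""
    match = re.search(r"_M-(\d+)_", primary)
    return int(match.group(1)) if match else None

example = (
    "/BulkGravTohhTohbbhbb_narrow_M-600_13TeV-madgraph/"
    "RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6_ext1-v1/"
    "MINIAODSIM"
)
info = split_dataset(example)
print(info["tier"], mass_point(info["primary"]))
```

QCD names carry no `_M-` token, so `mass_point` returns `None` for them.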

Release: CMSSW_8_0_21
Global Tag: 80X_mcRun2_asymptotic_2016_TrancheIV_v6

The files are in https://eospublichttp.cern.ch/eos/opendata/cms/MonteCarlo2016

These are standard samples (i.e. they are part of a normal production with no modifications in the standard workflow)

These are the first MiniAODSIM format samples on the portal. They are mentioned in the updated http://opendata-dev.web.cern.ch/docs/about-cms aboutminiaod

We may want to link to https://twiki.cern.ch/twiki/bin/view/CMSPublic/WorkBookMiniAOD2016 for further information on this format and its usage.

To do:

  • [x] prepare records
  • [x] extract metadata (CMSDAS), NB one further production step -> MiniAODSIM

katilp, Mar 21 '19 14:03

Create cms-simulated-datasets-Run2-datascience.json (similar to other cms-simulated-dataset...)

katilp, Mar 26 '19 10:03

Add to the abstract-description in the last paragraph:

The contents of MINIAODSIM in these datasets may differ from the final legacy format used in CMS Run2 simulated datasets.

katilp, Apr 03 '19 13:04

To do:

  • [ ] Check why LHE step is missing for Hbb samples e.g. http://opendata-dev.web.cern.ch/record/12007
    • update: the extraction script gets the steps correctly and reads them into the cache, but they have not yet been propagated to record building
  • [x] Change the order of steps in provenance
  • [x] Monte Carlo Production Overview -> Monte Carlo production overview

katilp, Apr 13 '19 11:04

Nice test case:

  • record ID 12009
  • script finds Summer15 parent:
$ cat cache/run2-datascience/mcm-store/dict/@BulkGravTohhTohbbhbb_narrow_M-3000_13TeV-madgraph@RunIISummer16DR80Premix-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6_ext1-v1@AODSIM.json | jq '.results.input_dataset'
"/BulkGravTohhTohbbhbb_narrow_M-3000_13TeV-madgraph/RunIISummer15wmLHEGS-MCRUN2_71_V1_ext1-v1/GEN-SIM"
  • but does not go there:
$ ls -l cache/run2-datascience/mcm-store/dict/ | grep -c Summer15
0

To be checked while rerunning...
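The check above could be automated along these lines. This is only a sketch under assumptions: the helper names `cache_filename` and `missing_parent` are hypothetical, the `@`-for-`/` file naming is inferred from the listing above, and the `results.input_dataset` key is the one queried with `jq`.

```python
import json
import os

def cache_filename(dataset):
    """Map '/a/b/c' to '@a@b@c.json', the naming seen in the cache listing."""
    return dataset.replace("/", "@") + ".json"

def missing_parent(cache_dir, dataset):
    """Return the parent dataset name if it is recorded but not cached, else None."""
    path = os.path.join(cache_dir, cache_filename(dataset))
    with open(path) as handle:
        parent = json.load(handle)["results"]["input_dataset"]
    # A parent is "missing" when the JSON names one but no file exists for it.
    if parent and not os.path.exists(os.path.join(cache_dir, cache_filename(parent))):
        return parent
    return None
```

Running `missing_parent("cache/run2-datascience/mcm-store/dict", ...)` over every AODSIM entry would list the Summer15 parents that the script found but did not follow.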

tiborsimko, Jun 06 '19 15:06

BulkGravTohhTohbbhbb* datasets:

  • Case A: 12000, 12001, 12007, 12009: LHE step missing altogether; SIM config and production script missing in SIM step
  • Case B: 12002-12006, 12008, 12010, 12011: LHE step present (but no configs); production script missing in SIM step

QCD* datasets:

  • Case C: 12012-12019, 12021: production script missing in SIM step
  • Case D: 12020: production script missing in SIM step; HLT RECO step has no name and no links
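As a quick sanity check, the four cases can be written down as a lookup table and tested for coverage. This is an illustrative sketch only; the names `CASES` and `case_of` are hypothetical, and the assumption that the records of interest are exactly 12000 to 12021 is inferred from the ID ranges quoted above.

```python
# Record IDs per case, transcribed from the lists above.
CASES = {
    "A": [12000, 12001, 12007, 12009],
    "B": [*range(12002, 12007), 12008, 12010, 12011],
    "C": [*range(12012, 12020), 12021],
    "D": [12020],
}

def case_of(record_id):
    """Return the case label for a record ID, or None if it is not listed."""
    for case, ids in CASES.items():
        if record_id in ids:
            return case
    return None

all_ids = sorted(i for ids in CASES.values() for i in ids)
# Assumed range of records for this issue: 12000..12021 inclusive.
uncovered = [i for i in range(12000, 12022) if case_of(i) is None]
print(len(all_ids), uncovered)
```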

General:

  • generators appear in HLT RECO step, would make more sense in LHE SIM steps
  • generator parameter fragments not shown

Case A, e.g. 12000:

  • SIM step (search for output) found in McM https://cms-pdmv.cern.ch/mcm/requests?produce=%2FBulkGravTohhTohbbhbb_narrow_M-600_13TeV-madgraph%2FRunIISummer15wmLHEGS-MCRUN2_71_V1_ext1-v1%2FGEN-SIM&page=0&shown=70368744177663
    • config https://cmsweb.cern.ch/couchdb/reqmgr_config_cache/9d1ace9d1b37fb0202c93baacabce721/configFile
    • production script https://cms-pdmv.cern.ch/mcm/public/restapi/requests/get_setup/B2G-RunIISummer15wmLHEGS-00784
  • generator parameter fragment in https://cms-pdmv.cern.ch/mcm/public/restapi/requests/get_fragment/B2G-RunIISummer15wmLHEGS-00784/0
  • NB: LHE and SIM steps are run together

Case B, e.g. 12002

  • SIM step (search for output) found in McM https://cms-pdmv.cern.ch/mcm/requests?produce=%2FBulkGravTohhTohbbhbb_narrow_M-1200_13TeV-madgraph%2FRunIISummer15GS-MCRUN2_71_V1-v1%2FGEN-SIM&page=0&shown=70368744177663
    • config https://cmsweb.cern.ch/couchdb/reqmgr_config_cache/2557f311e3275b75b2e1ea1cb1b2e449/configFile
    • production script https://cms-pdmv.cern.ch/mcm/public/restapi/requests/get_setup/EXO-RunIISummer15GS-01225
    • no direct link to generator parameter fragment but a tag given (ad6d8bb0dd5fa4907e5f2cb6e1a70d2bf487c312), refers to https://raw.githubusercontent.com/cms-sw/genproductions/ad6d8bb0dd5fa4907e5f2cb6e1a70d2bf487c312/python/ThirteenTeV/Hadronizer_TuneCUETP8M1_13TeV_generic_LHE_pythia8_cff.py which is in the production script and linked from "Name of the fragment"
    • NB: LHE step run separately https://cms-pdmv.cern.ch/mcm/requests?produce=%2FBulkGravTohhTohbbhbb_narrow_M-1200_13TeV-madgraph%2FRunIIWinter15wmLHE-MCRUN2_71_V1-v2%2FLHE&page=0&shown=70368744177663 with
      • production script https://cms-pdmv.cern.ch/mcm/public/restapi/requests/get_setup/EXO-RunIIWinter15wmLHE-00267
      • config https://cmsweb.cern.ch/couchdb/reqmgr_config_cache/df70dc41b476c73227f0e873ba1a2b86/configFile
      • generator parameter fragment https://cms-pdmv.cern.ch/mcm/public/restapi/requests/get_fragment/EXO-RunIIWinter15wmLHE-00267/0

Case C, e.g. 12012

  • SIM step (search for output) found in https://cms-pdmv.cern.ch/mcm/requests?produce=%2FQCD_Pt_300to470_TuneCUETP8M1_13TeV_pythia8%2FRunIISummer15GS-MCRUN2_71_V1-v1%2FGEN-SIM&page=0&shown=70368744177663
    • production script https://cms-pdmv.cern.ch/mcm/public/restapi/requests/get_setup/BTV-RunIISummer15GS-00028
    • config https://cmsweb.cern.ch/couchdb/reqmgr_config_cache/b51f8e75069872bae6c85134469190eb/configFile
    • no direct link to generator parameter fragment but a tag given (85ccec18763776e0613830cde524bc7a4c77bc49), refers to https://raw.githubusercontent.com/cms-sw/genproductions/85ccec18763776e0613830cde524bc7a4c77bc49/python/ThirteenTeV/QCD_Pt_300to470_TuneCUETP8M1_13TeV_pythia8_cff.py which is in the production script and linked from "Name of the fragment"
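Where only a tag is given, the fragment link can be assembled from the tag and the fragment path. A minimal sketch, assuming the URL layout seen in the link above; the function name `genproductions_fragment_url` is hypothetical.

```python
def genproductions_fragment_url(tag, fragment_path):
    """Build the raw GitHub URL of a generator fragment at a given commit tag."""
    base = "https://raw.githubusercontent.com/cms-sw/genproductions"
    return f"{base}/{tag}/{fragment_path.lstrip('/')}"

url = genproductions_fragment_url(
    "85ccec18763776e0613830cde524bc7a4c77bc49",
    "python/ThirteenTeV/QCD_Pt_300to470_TuneCUETP8M1_13TeV_pythia8_cff.py",
)
print(url)
```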

Case D, 12020

  • SIM step (search for output) found in https://cms-pdmv.cern.ch/mcm/requests?produce=%2FQCD_Pt_3200toInf_TuneCUETP8M1_13TeV_pythia8%2FRunIISummer15GS-MCRUN2_71_V1-v1%2FGEN-SIM&page=0&shown=70368744177663
    • production script https://cms-pdmv.cern.ch/mcm/public/restapi/requests/get_setup/BTV-RunIISummer15GS-00036
    • config https://cmsweb.cern.ch/couchdb/reqmgr_config_cache/b51f8e75069872bae6c8513446aac464/configFile
    • no direct link to generator parameter fragment but a tag given (d074b0464ffff3d4ee30b34a1336f2618930571c), refers to https://raw.githubusercontent.com/cms-sw/genproductions/d074b0464ffff3d4ee30b34a1336f2618930571c/python/ThirteenTeV/QCD_Pt_3200toInf_TuneCUETP8M1_13TeV_pythia8_cff.py which is in the production script and linked from "Name of the fragment"
  • HLT RECO step (search for output) found in McM https://cms-pdmv.cern.ch/mcm/requests?produce=%2FQCD_Pt_3200toInf_TuneCUETP8M1_13TeV_pythia8%2FRunIISummer16DR80Premix-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v3%2FAODSIM&page=0&shown=70368744177663
    • production script https://cms-pdmv.cern.ch/mcm/public/restapi/requests/get_setup/BTV-RunIISummer16DR80Premix-00044
    • HLT config https://cmsweb.cern.ch/couchdb/reqmgr_config_cache/2841cad1ab1ea0d683f04ea7b7e0443b/configFile
    • RECO config https://cmsweb.cern.ch/couchdb/reqmgr_config_cache/2841cad1ab1ea0d683f04ea7b7e11046/configFile
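The McM links used in the cases above all follow the same few patterns, so they could be generated rather than typed. The sketch below only builds URL strings and makes no network calls; the function names are hypothetical, and the query parameters `page=0&shown=70368744177663` are copied verbatim from the links above.

```python
from urllib.parse import quote

MCM = "https://cms-pdmv.cern.ch/mcm"

def search_by_output(dataset):
    """URL of the McM request search for a given output dataset name."""
    encoded = quote(dataset, safe="")  # '/' becomes '%2F'
    return f"{MCM}/requests?produce={encoded}&page=0&shown=70368744177663"

def setup_url(prepid):
    """URL of the production script for a request ID."""
    return f"{MCM}/public/restapi/requests/get_setup/{prepid}"

def mcm_fragment_url(prepid, index=0):
    """URL of the generator parameter fragment for a request ID."""
    return f"{MCM}/public/restapi/requests/get_fragment/{prepid}/{index}"

print(setup_url("BTV-RunIISummer15GS-00036"))
```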

katilp, Jul 15 '19 13:07