Data ingestion and curation tools

Results 54 data-curation issues

Create a new directory `cms-2016-simulated-datasets` with

### inputs

- [x] get the input files, Mini and Nano separately with
  `dasgoclient -query="dataset=/*/RunIISummer20UL17*MiniAOD*v2-106X*/MINIAODSIM" > inputs/CMS-2017-mc-mini-datasets.txt`
  `dasgoclient -query="dataset=/*/RunIISummer20UL17*NanoAOD*v9-106X*/NANOAODSIM" > inputs/CMS-2017-mc-nano-datasets.txt`
- [x] an...
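The two queries above can also be driven from Python. The following is a minimal sketch, not the repository's own script: the function names `build_command`, `fetch_datasets` and `write_inputs` are hypothetical, and the `runner` parameter is an assumption added so the logic can be exercised without a working `dasgoclient` installation. The query strings and output paths are copied from the issue text.

```python
import subprocess
from pathlib import Path

# Query strings and output paths as given in the issue text.
QUERIES = {
    "inputs/CMS-2017-mc-mini-datasets.txt":
        "dataset=/*/RunIISummer20UL17*MiniAOD*v2-106X*/MINIAODSIM",
    "inputs/CMS-2017-mc-nano-datasets.txt":
        "dataset=/*/RunIISummer20UL17*NanoAOD*v9-106X*/NANOAODSIM",
}


def build_command(query):
    """Return the argument list for one dasgoclient call."""
    return ["dasgoclient", f"-query={query}"]


def fetch_datasets(query, runner=subprocess.run):
    """Run one query and return the non-empty output lines.

    `runner` is a hypothetical injection point (defaults to subprocess.run).
    """
    result = runner(build_command(query), capture_output=True, text=True, check=True)
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


def write_inputs(queries=QUERIES, runner=subprocess.run, base_dir="."):
    """Write one dataset name per line to each output file."""
    for relative_path, query in queries.items():
        target = Path(base_dir) / relative_path
        target.parent.mkdir(parents=True, exist_ok=True)
        names = fetch_datasets(query, runner)
        target.write_text("\n".join(names) + "\n")
```

Calling `write_inputs()` with the defaults needs `dasgoclient` on the `PATH` and a valid grid proxy. Passing the arguments as a list avoids shell quoting issues with the `*` wildcards.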

Created `create_parent_dicts.py` to initialize NANO-MINI map. Integrated `parent_dicts.py` in `dataset_records.py` and `mcm_store.py`. Modified scripts to allow user to choose number of threads through CLI option `--threads`. Updated `interface.py` to integrate...
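A `--threads` option of this kind is commonly wired up with `argparse` and a thread pool. The sketch below illustrates the pattern only; it is not the code from the pull request. The default of one thread, the validation step, and the names `parse_args` and `process_all` are assumptions.

```python
import argparse
from concurrent.futures import ThreadPoolExecutor


def parse_args(argv=None):
    """Parse the command line; only --threads is taken from the PR text."""
    parser = argparse.ArgumentParser(description="Process datasets in parallel.")
    parser.add_argument(
        "--threads",
        type=int,
        default=1,  # assumed default, not stated in the PR
        help="number of worker threads to use",
    )
    args = parser.parse_args(argv)
    if args.threads < 1:
        parser.error("--threads must be at least 1")
    return args


def process_all(items, worker, threads):
    """Apply `worker` to every item using `threads` threads, keeping input order."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(worker, items))
```

Threads suit this workload when the per-dataset work is dominated by waiting on remote services, which is an assumption about the scripts rather than something the snippet states.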

To get the information necessary for the reprocessing of the deleted HI collision data:

Query for `parent` of:

```
/PPPhoton/Run2013A-PromptReco-v1/RECO
/PPMuon/Run2013A-PromptReco-v1/RECO
/PPMinBias/Run2013A-PromptReco-v1/RECO
/PPJet/Run2013A-PromptReco-v1/RECO
/PPFSQ/Run2013A-PromptReco-v1/RECO
/PAMuon/HIRun2013-PromptReco-v1/RECO
/PAMinBiasUPC/HIRun2013-PromptReco-v1/RECO
/PAMinBias2/HIRun2013-PromptReco-v1/RECO
/PAMinBias1/HIRun2013-PromptReco-v1/RECO
/PAHighPt/HIRun2013-PromptReco-v1/RECO
```

...
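The parent lookups for these ten datasets can be generated rather than typed by hand. This is a small sketch that only builds the query strings; it assumes the DAS `parent dataset=<name>` query form, and the helper names `parent_query` and `split_dataset` are hypothetical.

```python
# Dataset names copied from the list above.
DATASETS = [
    "/PPPhoton/Run2013A-PromptReco-v1/RECO",
    "/PPMuon/Run2013A-PromptReco-v1/RECO",
    "/PPMinBias/Run2013A-PromptReco-v1/RECO",
    "/PPJet/Run2013A-PromptReco-v1/RECO",
    "/PPFSQ/Run2013A-PromptReco-v1/RECO",
    "/PAMuon/HIRun2013-PromptReco-v1/RECO",
    "/PAMinBiasUPC/HIRun2013-PromptReco-v1/RECO",
    "/PAMinBias2/HIRun2013-PromptReco-v1/RECO",
    "/PAMinBias1/HIRun2013-PromptReco-v1/RECO",
    "/PAHighPt/HIRun2013-PromptReco-v1/RECO",
]


def split_dataset(dataset):
    """Split '/primary/processed/tier' into its three components."""
    _, primary, processed, tier = dataset.split("/")
    return primary, processed, tier


def parent_query(dataset):
    """Build the DAS query string asking for the parent of one dataset."""
    return f"parent dataset={dataset}"


if __name__ == "__main__":
    for name in DATASETS:
        # Each line can be passed to dasgoclient as -query="<line>".
        print(parent_query(name))
```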

To get the information necessary for the regeneration and reprocessing of the deleted HI MC:

Queried prepids from DAS:

- McM returns valid: [valid_prepids.txt](https://github.com/user-attachments/files/16248614/valid_prepids.txt)
- McM returns empty: [lost_prepids.txt](https://github.com/user-attachments/files/16248618/lost_prepids.txt)

For lost prepids,...

Adapt `utils/update_fixtures_cross_sections.py` to be configurable and test it for 2016 datasets. The 2016 cross-section values are available at https://cernbox.cern.ch/files/link/public/EHpyrdJet939vGy and the records are temporarily under `/eos/opendata/cms/upload/kati` (list them with `eos ls /eos/opendata/cms/upload/kati`). There will...

There are directories like: /eos/opendata/cms/dataset-semantics/NanoAODSIM/ with an entry per run. Currently, there are around 21k entries. It would be nice to have these directories spread a bit more, instead of...
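One way to spread such a directory is to group entries into fixed-size numeric buckets. The sketch below shows that idea only. The snippet is truncated before it names a target layout, so the bucket size of 1000, the `start-end` directory naming, and the function `sharded_path` are all assumptions.

```python
from pathlib import PurePosixPath

# Base directory as quoted in the issue text.
BASE = PurePosixPath("/eos/opendata/cms/dataset-semantics/NanoAODSIM")


def sharded_path(entry_id, bucket_size=1000, base=BASE):
    """Return a path that places `entry_id` in a numeric bucket directory.

    With bucket_size=1000, entry 21034 goes under '21000-21999'.
    """
    start = (entry_id // bucket_size) * bucket_size
    bucket = f"{start}-{start + bucket_size - 1}"
    return base / bucket / str(entry_id)
```

With roughly 21k entries, a bucket size of 1000 gives about 22 subdirectories of at most 1000 entries each.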

Note that this merges into a development branch!

Observed when running `code/lhe_generators.py` with the updates

### xz: File too large

It produces some LOG.txt files with a dataset name path (instead of the usual `recid`) to the...

To get the generator parameters for all cases in `/cms-2016-simulated-datasets/code/lhe_generators.py`

The datasets for which the parameter files were not found have only `LOG.txt runcmsgrid.sh` in `lhe_generators/2016-sim/gridpacks/`

- `gridpack_case == "jhugen":`...

This can be useful for checking the mcdb cases in https://github.com/cernopendata/data-curation/pull/207#issuecomment-2122458004:

- mcdb_ids taken from mcm_store
- recids from recid_info.py

[ds-recid-mcdb.txt](https://github.com/user-attachments/files/15525423/ds-recid-mcdb.txt)