Data ingestion and curation tools

===============
 data-curation
===============

.. image:: https://github.com/cernopendata/data-curation/workflows/CI/badge.svg
   :target: https://github.com/cernopendata/data-curation/actions

.. image:: https://badges.gitter.im/Join%20Chat.svg
   :target: https://gitter.im/cernopendata/opendata.cern.ch?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge

.. image:: https://img.shields.io/badge/licence-GPL_2-green.svg?style=flat
   :target: https://raw.githubusercontent.com/cernopendata/data-curation/master/LICENSE

About

This repository contains a collection of data ingestion and curation tools used to prepare the datasets, software and any accompanying material for public open data releases on the CERN Open Data <http://opendata.cern.ch/>_ portal.

Generic utilities

  • utils <utils>_ -- various helper scripts

Specific utilities

Specific data ingestion and curation campaigns:

  • atlas-2016-masterclasses <atlas-2016-masterclasses>_ -- helper scripts for the ATLAS 2016 masterclasses release
  • atlas-2016-outreach <atlas-2016-outreach>_ -- helper scripts for the ATLAS 2016 outreach release
  • cms-2010-collision-datasets <cms-2010-collision-datasets>_ -- helper scripts for the CMS 2010 open data release (collision datasets)
  • cms-2010-simulated-datasets <cms-2010-simulated-datasets>_ -- helper scripts for the CMS 2010 open data release (simulated datasets)
  • cms-2011-collision-datasets <cms-2011-collision-datasets>_ -- helper scripts for the CMS 2011 open data release (collision datasets)
  • cms-2011-collision-datasets-runb-update <cms-2011-collision-datasets-runb-update>_ -- helper scripts for the CMS 2011 RunB open data release (collision datasets)
  • cms-2011-hlt-triggers <cms-2011-hlt-triggers>_ -- helper scripts for the CMS 2011 open data release (HLT triggers)
  • cms-2011-l1-triggers <cms-2011-l1-triggers>_ -- helper scripts for the CMS 2011 open data release (L1 triggers)
  • cms-2011-simulated-datasets <cms-2011-simulated-datasets>_ -- helper scripts for the CMS 2011 open data release (simulated datasets)
  • cms-2012-collision-datasets <cms-2012-collision-datasets>_ -- helper scripts for the CMS 2012 RunB and RunC open data release (collision datasets)
  • cms-2012-collision-datasets-update <cms-2012-collision-datasets-update>_ -- helper scripts for the CMS 2012 RunA and RunD open data release (collision datasets)
  • cms-2012-event-display-files <cms-2012-event-display-files>_ -- helper scripts for the CMS 2012 open data release (event display files)
  • cms-2012-simulated-datasets <cms-2012-simulated-datasets>_ -- helper scripts for the CMS 2012 open data release (simulated datasets)
  • cms-2013-collision-datasets-hi <cms-2013-collision-datasets-hi>_ -- helper scripts for the CMS 2013 heavy ion release (lead collision datasets)
  • cms-2013-collision-datasets-hi-ppref <cms-2013-collision-datasets-hi-ppref>_ -- helper scripts for the CMS 2013 heavy ion release (proton-proton reference collision datasets)
  • cms-2013-hlt-triggers <cms-2013-hlt-triggers>_ -- helper scripts for the CMS 2013 trigger information
  • cms-2013-simulated-datasets-hi <cms-2013-simulated-datasets-hi>_ -- helper scripts for the CMS 2013 HI open data release (simulated datasets)
  • cms-2015-collision-datasets <cms-2015-collision-datasets>_ -- helper scripts for the CMS 2015 open data release (collision datasets)
  • cms-2015-collision-datasets-hi-ppref <cms-2015-collision-datasets-hi-ppref>_ -- helper scripts for the CMS 2015 heavy ion release (proton-proton reference collision datasets)
  • cms-2015-simulated-datasets <cms-2015-simulated-datasets>_ -- helper scripts for the CMS 2015 open data release (simulated datasets)
  • cms-2016-collision-datasets <cms-2016-collision-datasets>_ -- helper scripts for the CMS 2016 open data release (collision datasets)
  • cms-2016-pileup-dataset <cms-2016-pileup-dataset>_ -- helper scripts for the CMS 2016 open data release (pileup dataset)
  • cms-2016-simulated-datasets <cms-2016-simulated-datasets>_ -- helper scripts for the CMS 2016 open data release (simulated datasets)
  • cms-YYYY-luminosity <cms-YYYY-luminosity>_ -- helper scripts for the CMS luminosity information records (any year)
  • cms-YYYY-run-numbers <cms-YYYY-run-numbers>_ -- helper scripts for enriching CMS dataset run numbers (any year)
  • cms-YYYY-simulated-datasets <cms-YYYY-simulated-datasets>_ -- helper scripts for CMS simulated dataset records (any year)
  • cms-YYYY-validated-runs <cms-YYYY-validated-runs>_ -- helper scripts for the CMS validated runs records (any year)
  • cms-derived-data <cms-derived-data>_ -- helper scripts for the CMS derived datasets (NanoAODRun1, PFNano, POET)
  • cms-release-info <cms-release-info>_ -- CMS year-specific and run-era-specific information
  • cms-run2-hlt-triggers <cms-run2-hlt-triggers>_ -- helper scripts for the CMS Run2 data release (HLT triggers)
  • cms-run2-ultra-legacy-production <cms-run2-ultra-legacy-production>_ -- helper scripts for the CMS Run2 ultra-legacy production
  • cod2-to-cod3 <cod2-to-cod3>_ -- record migration from version 2 to version 3
  • jade-2023-raw-datasets <jade-2023-raw-datasets>_ -- helper scripts for the initial release of JADE data
  • opera-2017-multiplicity-studies <opera-2017-multiplicity-studies>_ -- helper scripts for the release of OPERA multiplicity studies
  • opera-2019-electron-neutrinos <opera-2019-electron-neutrinos>_ -- helper scripts for the release of OPERA electron neutrino events
  • opera-2019-neutrino-induced-charm <opera-2019-neutrino-induced-charm>_ -- helper scripts for the release of OPERA charm events

Related links

See also:

  • CERN Open Data <http://opendata.cern.ch>_ portal
  • its source code <https://github.com/cernopendata/opendata.cern.ch>_
  • its record fixtures <https://github.com/cernopendata/opendata.cern.ch/tree/master/cernopendata/modules/fixtures/data/records>_

Contributors

The list of contributors in alphabetical order:

  • Anna Trzcinska <https://github.com/annatrz>_
  • Artemis Lavasa <https://orcid.org/0000-0001-5633-2459>_
  • Heitor de Bittencourt <https://linkedin.com/in/heitorpb>_
  • Julie Hogan <https://orcid.org/0000-0002-8604-3452>_
  • Kati Lassila-Perini <https://orcid.org/0000-0002-5502-1795>_
  • Mantas Savaniakas <https://github.com/mantasavas>_
  • Miko Piitsalo <https://github.com/mokotus>_
  • Osama Sh. Almomani <https://github.com/OsamaMomani>_
  • Tibor Šimko <https://orcid.org/0000-0001-7202-5803>_