opendata.cern.ch

2012 datasets - initial considerations

Open katilp opened this issue 9 years ago • 9 comments

Notes for CMS 2012 data:

Collision data: dataset=/*/*2012*22Jan2013*/AOD (total of 1.1 PB, of which 510 TB Parked; see https://cds.cern.ch/record/1480607/files/DP2012_022.pdf and also https://profmattstrassler.com/articles-and-posts/lhcposts/triggering-advances-in-2012/data-parking-at-cms/ 😃)

  • A: from the start to May 6: 45 TB (including 2 TB Parked)
  • B: from May 12 to June 18: 230 TB (95 TB Parked)
  • C: from July 1 to September 27: 354 TB (155 TB Parked)
  • D: from September 28 to December 5: 477 TB (260 TB Parked)

NB: dates are approximate, and the data volumes include some double-counting (42.9 TB) from the special HZZ, TOPElePlusJets and TOPMuPlusJets processings.
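As a quick cross-check, the per-period numbers above can be summed in a few lines of Python. This is a sketch: the volumes are copied from the list above, 1 PB is taken as 1000 TB, and the double-counting mentioned in the NB is not subtracted.

```python
# Approximate 2012 collision-data volumes (TB) per run period, copied from the list above:
# (total AOD volume, Parked part). Double-counting is NOT corrected here.
RUN_VOLUMES_TB = {
    "A": (45, 2),
    "B": (230, 95),
    "C": (354, 155),
    "D": (477, 260),
}


def totals(volumes):
    """Return (total TB, Parked TB) summed over all run periods."""
    total = sum(t for t, _ in volumes.values())
    parked = sum(p for _, p in volumes.values())
    return total, parked


total_tb, parked_tb = totals(RUN_VOLUMES_TB)
print(f"total: {total_tb} TB (~{total_tb / 1000:.1f} PB), Parked: {parked_tb} TB")
```

The sums (1106 TB in total, 512 TB Parked) agree with the quoted 1.1 PB and 510 TB.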

[Figure: int_lumi_per_day_cumulative_pp_2012, cumulative integrated luminosity per day, pp 2012]

TBD:

  • [x] Check if it is justified not to include Parked datasets
  • [x] Decide whether to release A+B+C (approx. 14 fb-1), B+C (approx. 13 fb-1) or C+D (approx. 16 fb-1): all these combinations are below 0.5 PB (without Parked)
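The 0.5 PB bound can be checked with the same per-period numbers by subtracting the Parked part of each period. A minimal sketch, assuming 1 PB = 1000 TB; `NON_PARKED_TB` and `non_parked_size` are illustrative names, not part of any CMS tool:

```python
# Non-Parked volume (TB) per run period: total minus Parked, from the numbers above.
NON_PARKED_TB = {
    "A": 45 - 2,
    "B": 230 - 95,
    "C": 354 - 155,
    "D": 477 - 260,
}

CANDIDATES = ["ABC", "BC", "CD"]  # release options listed in the TBD item
LIMIT_TB = 500  # 0.5 PB, assuming 1 PB = 1000 TB


def non_parked_size(combo):
    """Sum the non-Parked volume (TB) of the run periods named in combo, e.g. "BC"."""
    return sum(NON_PARKED_TB[run] for run in combo)


for combo in CANDIDATES:
    size = non_parked_size(combo)
    print(f"{'+'.join(combo)}: {size} TB, below 0.5 PB: {size < LIMIT_TB}")
```

This gives 377 TB for A+B+C, 334 TB for B+C and 416 TB for C+D.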

MC: dataset=/*/*Summer12_DR53X*V7*/AODSIM (total of 2 PB) or dataset=/*/*Summer12_DR53X*V19*/AODSIM (total of 1 PB)

TBD:

  • [x] check with the experts whether these overlap:

No: the legacy campaign for the 2012 MC (called Summer12DR53X) is the one using the GT (global tag) START53_V19 (757.6 TB)

That campaign also included some additional special productions (mostly run-dependent) using other GTs:

  • START53_V19F: BPH run-dependent DIGI-RECO in 5.3 (162.8 TB)
  • START53_V19E: DIGI-RECO in 5.3 with a specific muon alignment for EXO (1.3 TB)
  • START53_V7N: run-dependent production for h2gg (254.6 TB)

Open question for V19D (38 TB)

(From Gianluca Germinara and PPD)

  • [x] check if there is any obvious rule to decide what to leave out
  • [x] check with the 2011 "algorithm" how to divide these into categories
  • [x] check with @tiborsimko that all 2011 procedures scale with the increased number of datasets

katilp, Sep 05 '16 15:09

Include Parked data in the release

katilp, Sep 07 '16 08:09

@katilp: I'm probably missing something, but I thought I should confirm anyway that you mean 0.5 TB for the second TBD and not 0.5 PB.

RaoOfPhysics, Sep 07 '16 14:09

@RaoOfPhysics: good point, thanks; I mean 0.5 PB

katilp, Sep 07 '16 14:09

Recapitulating numbers for the resource request:

Data: 2012 data taking was divided into four run periods (RunA, RunB, RunC and RunD) totalling 1.1 PB, part of which will be released in 2017. The released data will amount to at most 831 TB (if RunC + RunD are released) and at least 477 TB (if only RunD is released). Other combinations are possible and are to be defined considering the best possible software compatibility with the MC samples. In the long term, CMS may consider releasing the full data if so decided by the collaboration board, but this will not happen in 2017.

MC: the legacy campaign for the 2012 MC (called Summer12DR53X) is the one using the GT START53_V19 and amounts to 757.6 TB. In addition, some special productions should be kept (run-dependent production for B physics, 162.8 TB; special muon alignment, 1.3 TB; Higgs to gg, 254.6 TB), bringing the total to 1.2 PB.

For the long-term preservation and open access of these samples, CMS therefore needs 2 PB of disk space in eospublic at CERN, to be served through xrootd (or direct download) from opendata.cern.ch.
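The 2 PB figure follows from the numbers above. A sketch, assuming the largest data option (RunC + RunD, Parked included) and leaving out the still-open V19D sample (38 TB); 1 PB is taken as 1000 TB:

```python
# Data: the largest release option considered (RunC + RunD, Parked included), in TB.
DATA_TB = {"C": 354, "D": 477}

# MC: legacy Summer12DR53X campaign (START53_V19) plus the special productions to keep, in TB.
# The open V19D question (38 TB) is deliberately not included.
MC_TB = {
    "START53_V19": 757.6,   # legacy campaign
    "START53_V19F": 162.8,  # run-dependent B physics
    "START53_V19E": 1.3,    # special muon alignment
    "START53_V7N": 254.6,   # run-dependent h2gg
}

data_max = sum(DATA_TB.values())
mc_total = sum(MC_TB.values())
request = data_max + mc_total
print(f"data (max): {data_max} TB, MC: {mc_total:.1f} TB, total: ~{request / 1000:.1f} PB")
```

The sum is 831 TB of data plus 1176.3 TB of MC, i.e. about 2.0 PB.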

katilp, Sep 12 '16 13:09

Storage space of 2 PB is OK for 2017, so we can proceed 😃

katilp, Sep 23 '16 13:09

Listing of https://cmsweb.cern.ch/das/request?view=plain&limit=3000&instance=prod%2Fglobal&input=dataset%3D%2F*%2FSummer12_DR53XV19*%2FAODSIM

Summer12_DR53X-PU_S10_START52_V19_listing.pdf
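The DAS links in this thread are plain web queries with URL-encoded parameters. A small helper like the one below can rebuild such a link from a DAS query string. It is a sketch: `das_url` is a hypothetical name, and the parameter names `view`, `limit`, `instance` and `input` are simply read off the links above. It only constructs the URL and does not contact DAS.

```python
from urllib.parse import urlencode

DAS_BASE = "https://cmsweb.cern.ch/das/request"


def das_url(query, view="list", limit=None, instance="prod/global"):
    """Build a DAS web-query URL in the same form as the links in this thread (hypothetical helper)."""
    params = {"view": view}
    if limit is not None:
        params["limit"] = limit
    params["instance"] = instance
    params["input"] = query
    # Keep '*' unescaped so wildcards look the same as in the links above;
    # '/' and '=' are still percent-encoded, and spaces become '+'.
    return f"{DAS_BASE}?{urlencode(params, safe='*')}"


print(das_url("dataset=/*/Summer12_DR53XV19*/AODSIM", view="plain", limit=3000))
```

With the query from the listing link above, this reproduces that link exactly.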

katilp, Sep 28 '16 11:09

For the run periods to be released, a good choice could be RunB + RunC. This would amount to 584 TB and include most of the data used in the Higgs discovery analysis. To be discussed with the CMS physics coordination.
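Since Parked data are included in the release, the size of each candidate combination is just the sum of the full per-period volumes quoted at the top of the issue. A sketch (the helper name `release_size` is illustrative):

```python
# Per-run-period volumes in TB, Parked included, from the list at the top of the issue.
RUN_TB = {"A": 45, "B": 230, "C": 354, "D": 477}


def release_size(runs):
    """Total volume (TB) of a release made of the given run periods, e.g. "BC"."""
    return sum(RUN_TB[r] for r in runs)


for option in ["D", "BC", "ABC", "CD"]:
    print(f"Run{' + Run'.join(option)}: {release_size(option)} TB")
```

This gives 584 TB for RunB + RunC, between the 477 TB (RunD only) and 831 TB (RunC + RunD) bounds given in the resource request.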

Update: from the Higgs -> 4l point of view, nothing speaks against RunB + RunC. RunD has somewhat more pile-up but would also be fine (from Andre Mendes).

https://cmsweb.cern.ch/das/request?view=list&instance=prod%2Fglobal&input=dataset%3D%2F*%2F2012B22Jan2013*%2FAOD+ (31 datasets; listing: Run2012B-22Jan2013_listing.pdf)

https://cmsweb.cern.ch/das/request?view=list&instance=prod%2Fglobal&input=dataset%3D%2F*%2F2012C22Jan2013*%2FAOD+ (39 datasets; listing: Run2012C-22Jan2013_listing.pdf)

NB: the list of triggers in each dataset is available at https://fwyzard.web.cern.ch/fwyzard/hlt/2012/dataset

Confirmed that they are all proper datasets for physics; see also http://inspirehep.net/record/1467921/files/10.1016_j.nuclphysbps.2015.09.144.pdf (Dataset definition for CMS operations and physics analyses).

In particular from RunB and RunC:

  • HTMHTParked (Parked dataset motivated by Susy hadronic searches)
  • HcalNZS (technical trigger with HLT_HcalNZS, HLT_HcalPhiSym, HLT_HcalUTCA)
  • NoBPTX (technical trigger)
  • VBF1Parked (motivated by Vector Boson Fusion)

In addition from RunC:

  • LP_ZeroBias
  • LP_ExclEGMU
  • LP_Jets1
  • LP_Jets2
  • LP_MinBias1
  • LP_MinBias2
  • LP_MinBias3
  • LP_RomanPots

-> all LP_ datasets contain only 5 runs (198899-198903); https://twiki.cern.ch/twiki/bin/view/CMS/CertificationCollisions12 indicates that they are TOTEM runs: "LHC fill: 2836 Runs: 198899 198900 198901 198902 198903 Comment: Totem run - Special trigger menu - Run 198898 is flagged as cosmic but it is a collision run"

Update Nov 4 2016: the LP_ datasets are the CMS part of the common CMS-TOTEM runs. The data can only be analysed in combination with the separate TOTEM data. According to the FSQ PAG conveners, we can leave them out of the release.
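The resulting selection rule for the flagged RunB/RunC datasets is simple: leave out the LP_ (CMS-TOTEM) datasets and keep the rest. A sketch over the primary-dataset names listed above; the function name is hypothetical, and in practice the names would come from the DAS listings:

```python
# Primary datasets flagged above in RunB/RunC (names as listed in this thread).
FLAGGED = [
    "HTMHTParked", "HcalNZS", "NoBPTX", "VBF1Parked",
    "LP_ZeroBias", "LP_ExclEGMU", "LP_Jets1", "LP_Jets2",
    "LP_MinBias1", "LP_MinBias2", "LP_MinBias3", "LP_RomanPots",
]


def drop_totem_datasets(names):
    """Leave out the LP_ datasets (CMS part of the common CMS-TOTEM runs), keep everything else."""
    return [n for n in names if not n.startswith("LP_")]


print(drop_totem_datasets(FLAGGED))
```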

katilp, Oct 12 '16 10:10

The release date will be decided together with the CMS physics coordination, making sure that all key CMS 8 TeV analyses have been published beforehand.

katilp, Oct 12 '16 10:10

NB: there is no DoubleMu dataset in the 22Jan2013 reprocessing; checking this with the Higgs PAG, Muon POG and PPD. Earlier reprocessings are available:

  • RunB: https://cmsweb.cern.ch/das/request?view=list&limit=50&instance=prod%2Fglobal&input=dataset%3D%2FDoubleMu%2F2012B%2FAOD
  • RunC: https://cmsweb.cern.ch/das/request?view=list&limit=50&instance=prod%2Fglobal&input=dataset%3D%2FDoubleMu%2F2012C%2FAOD

Explanation: there is no need for DoubleMu, as DoubleMuParked contains DoubleMu.

katilp, Jun 15 '17 06:06