CMS - debug cross-section utility script for case HiggsPhysics/StandardModelPhysics

Open katilp opened this issue 10 months ago • 4 comments

The cross-section utility script does not seem to work for the HiggsPhysics case, i.e. for the files under MC2015/HiggsPhysics in https://cernbox.cern.ch/files/link/public/EHpyrdJet939vGy

It worked without problems for the StandardModelPhysics cases, but for

    "categories": {
      "primary": "Higgs Physics",
      "secondary": [
        "Standard Model"
      ],

running the script in a local data-curation area with cross sections downloaded to a local MC2015

$ ./utils/update_fixtures_cross_sections.py    -c ./MC2015/HiggsPhysics -i ../opendata.cern.ch/cernopendata/modules/fixtures/data/records  -o ../opendata.cern.ch/cernopendata/modules/fixtures/data/records
Processing cms-simulated-datasets-2015-part_01...
Processing cms-simulated-datasets-2015-part_02...
Processing cms-simulated-datasets-2015-part_03...
Processing cms-simulated-datasets-2015-part_04...
Processing cms-simulated-datasets-2015-part_05...
Processing cms-simulated-datasets-2015-part_06...
Processing cms-simulated-datasets-2015-part_07...
Processing cms-simulated-datasets-2015-part_08...
Processing cms-simulated-datasets-2015-part_09...
Processing cms-simulated-datasets-2015-part_10...
Processing cms-simulated-datasets-2015-part_11...
Processing cms-simulated-datasets-2015-part_12...
Processing cms-simulated-datasets-2015-part_13...
Processing cms-simulated-datasets-2015-part_14...
Processing cms-simulated-datasets-2015-part_15...
Processing cms-simulated-datasets-2015-pileup...
Total number of cross-section values json files: 789, Total number of amended datasets: 0
Total number of datasets amended using Format 1: 0
Total number of datasets amended using Format 2: 0
Total number of datasets amended using Format 3: 0
Total number of datasets amended using Format 4: 0
Total number of datasets amended using Format 5: 0
Total number of datasets amended using Format 6: 0

results in overwriting the MC listing with

$ ls MC2015/HiggsPhysics/StandardModel/
'$Name (and location) of the dataset.json'

and no changes are made to the actual records.

The StandardModelPhysics processes were already added before the release, see https://github.com/cernopendata/opendata.cern.ch/issues/3454#issuecomment-2017414073

katilp avatar Apr 08 '24 09:04 katilp

There is an additional file in the list:

$ ls MC2015/HiggsPhysics/StandardModel/ | head -5
$Name (and location) of the dataset.json
GluGluHToBB_M120_13TeV_powheg_pythia8_16793.json
GluGluHToBB_M125_13TeV_amcatnloFXFX_pythia8_16794.json
GluGluHToBB_M125_13TeV_amcatnloFXFX_pythia8_16795.json
GluGluHToBB_M125_13TeV_powheg_herwigpp_16796.json

but removing it does not solve the problem

katilp avatar Apr 08 '24 11:04 katilp

Hi Kati, the json files for 2015 Higgs, 2016 SM and 2016 Higgs follow a neater format that differs from the format of 2015 SM.

The new format has the following fields:

    data = {
        "Dataset": dataset,
        "xsec_before_matching": "-9",
        "xsec_before_matching_uncertainty": "-9",
        "xsec_after_matching": "-9",
        "xsec_after_matching_uncertainty": "-9",
        "xsec_before_filter": "-9",
        "xsec_before_filter_uncertainty": "-9",
        "total_value": "-9",
        "total_value_uncertainty": "-9",
        "matching_efficiency": "-9",
        "matching_efficiency_uncertainty": "-9",
        "HepMC_filter_efficiency": "-9",
        "HepMC_filter_efficiency_uncertainty": "-9",
        "HepMC_filter_efficiency_evt": "-9",
        "HepMC_filter_efficiency_evt_uncertainty": "-9",
        "filter_efficiency": "-9",
        "filter_efficiency_uncertainty": "-9",
        "filter_efficiency_evt": "-9",
        "filter_efficiency_evt_uncertainty": "-9",
        "neg_weight_fraction": "-9",
        "neg_weight_fraction_uncertainty": "-9",
        "equivalent_lumi": "-9",
        "equivalent_lumi_uncertainty": "-9",
    }

If a value does not exist for the sample, then it is filled with "-9". Since all the json files now have the same format, we would no longer need the "if...else"s.

The name of the dataset is now filled in the "Dataset" field of the json file, so this line should be changed to dataset = json_record[1]["Dataset"].
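
As a rough illustration of that lookup, here is a minimal sketch. It assumes the parsed file is a list whose second element is the dictionary of values, which is only what `json_record[1]["Dataset"]` implies. The helper name `read_dataset_name` and the example content are hypothetical and not taken from the script or from a real file.

```python
import json


def read_dataset_name(json_text):
    """Return the dataset name stored in the "Dataset" field.

    Assumption: the parsed content is a list whose second element
    holds the dictionary of values (as json_record[1] suggests).
    """
    json_record = json.loads(json_text)
    return json_record[1]["Dataset"]


# Made-up example content, only for illustration.
example = json.dumps(
    [
        {"note": "hypothetical first element"},
        {"Dataset": "/Example/Hypothetical-v1/MINIAODSIM", "total_value": "-9"},
    ]
)
print(read_dataset_name(example))
```

If the real files are laid out differently, only the indexing inside the helper would need to change.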

Ari-mu-l avatar Apr 12 '24 21:04 Ari-mu-l

Thanks @Ari-mu-l ! With that change, I get:

  File "./utils/update_fixtures_cross_sections.py", line 196, in main
    record["cross_section"]["total_value"] = cross_sections_json_data[
KeyError: 'totX_final'

so this (and others if needed) should be changed to reflect the new naming.

Would you be able to update them?

katilp avatar Apr 13 '24 08:04 katilp

The change to -9 for non-existing fields would otherwise require changes in the record display template, so it also needs to be taken into account in the script. The script should not add any -9 fields to the records.

The display template in https://github.com/cernopendata/opendata.cern.ch/blob/master/cernopendata/templates/cernopendata_records_ui/records/record_detail.html#L106 relies on non-existence of certain fields - not on them having value -9.
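
A minimal sketch of how the script could skip the placeholders, assuming the values arrive as a flat dictionary of strings in the new format. The helper name `drop_placeholders` and the example values are hypothetical and do not come from the actual script.

```python
PLACEHOLDER = "-9"


def drop_placeholders(values, placeholder=PLACEHOLDER):
    """Return a copy of values without the dataset name and without
    any entry whose value equals the placeholder string."""
    return {
        key: value
        for key, value in values.items()
        if key != "Dataset" and str(value).strip() != placeholder
    }


# Made-up values, only for illustration.
example_values = {
    "Dataset": "/Example/Hypothetical-v1/MINIAODSIM",
    "total_value": "1.234e+01",
    "total_value_uncertainty": "-9",
    "filter_efficiency": "-9",
}
cross_section = drop_placeholders(example_values)
print(cross_section)
```

With this kind of filtering, a field that has no value for a sample is simply absent from the record, which is what the template expects. Note that the comparison is on the exact string, so a differently written placeholder such as "-9.0" would not be caught.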

katilp avatar Apr 13 '24 08:04 katilp