madanalysis5 icon indicating copy to clipboard operation
madanalysis5 copied to clipboard

Jet Substructure

Open jackaraz opened this issue 2 years ago • 13 comments

Context

This branch is dedicated to implementing substructure tools to the MadAnalysis framework.

Description of the change and benefits

Interface updates

  • MultiJet clustering : madanalysis/jet_clustering/.

jet_algorithm: This module is separate from the original jet clustering interface within Ma5. This module can be activated using the following command

ma5> define jet_algorithm my_jet antikt radius=0.5

where my_jet is a user-defined jet identifier, antikt is the algorithm to be used which can be chosen from antikt, cambridge, genkt, kt, gridjet, cdfjetclu, cdfmidpoint, and siscone. The rest of the arguments are optional if the user won't define the radius, ptmin etc. default parameters will be chosen. Each algorithm has its own unique set of parameters i.e.

Algorithm Parameters & Default values
antikt, cambridge radius=0.4, ptmin=5.
genkt radius=0.4, ptmin=5., exclusive=False, p=-1
kt radius=0.4, ptmin=5., exclusive=False
gridjet ymax=3., ptmin=5.
cdfjetclu radius=0.4, ptmin=5., overlap=0.5, seed=1., iratch=0.
cdfmidpoint radius=0.4, ptmin=5., overlap=0.5, seed=1., iratch=0., areafraction=1.
siscone radius=0.4, ptmin=5., overlap=0.5, input_ptmin=5., npassmax=1.
VariableR rho=2000., minR=0., maxR=2., ptmin=20. exclusive=False clustertype=CALIKE strategy=Best

It is also possible to modify the entry after defining it

ma5> define jet_algorithm my_jet cambridge
ma5> set my_jet.ptmin = 200.
ma5> set my_jet.radius = 0.8

Note that when a jet_algorithm is defined MadAnalysis interface will automatically switch to constituent smearing mode. set my_jet.+<tab> will show the dedicated options available for that particular algorithm.

It is possible to display all the jets available in the current session by using display jet_algorithm command:

$ ./bin/ma5 -R
ma5>set main.fastsim.package = fastjet
ma5>define jet_algorithm my_jet cdfmidpoint
ma5>display jet_algorithm
MA5: * Primary Jet Definition :
MA5:  fast-simulation package : fastjet
MA5:  clustering algorithm : antikt
MA5:   + Jet ID : Ma5Jet
MA5:   + cone radius = 0.4
MA5:   + PT min (GeV) for produced jets = 5.0
MA5:   + exclusive identification = true
MA5:   + b-jet identification:
MA5:     + DeltaR matching = 0.5
MA5:     + exclusive algo = true
MA5:     + id efficiency = 1.0
MA5:     + mis-id efficiency (c-quark)      = 0.0
MA5:     + mis-id efficiency (light quarks) = 0.0
MA5:   + hadronic-tau identification:
MA5:     + id efficiency = 1.0
MA5:     + mis-id efficiency (light quarks) = 0.0
MA5:    --------------------
MA5: * Other Jet Definitions:
MA5:    1. Jet ID = my_jet
MA5:       - algorithm       : cdfmidpoint
MA5:       - radius          : 0.4
MA5:       - ptmin           : 5.0
MA5:       - overlap         : 0.5
MA5:       - seed            : 1.0
MA5:       - iratch          : 0.0
MA5:       - areafraction    : 1.0

Here primary jet is defined with the original jet definition syntax of MadAnalysis 5 where since we did not specify anything, it uses default antikt configuration. For more info on how to define primary jet see arXiv:2006.09387. Other jet definitions show all the jets which are defined via jet_algorithm keyword.

To remove a jet_algorithm definition one can use remove my_jet command. Note that one can also change the name of the primary jet which is Ma5Jet by default.

ma5>set main.fastsim.JetID = my_primary_jet

Link to SFS: There can only be one jet smearing/tagging definition where in case of existing multi-jet definitions smearing will be applied in constituent level which is used by all jets defined in the framework. Jet tagging is only available for the primary jet.

Updates in expert mode structure

Expert mode is designed to automatically realize and construct the MadAnalysis framework according to the given .ma5 command script. This command script can include all the SFS construction mentioned in arXiv:2006.09387 and it can also include multijet definitions

sfs_card_cms_exo_xx_yy.ma5:

set main.fastsim.package = fastjet

# Define multijet
define jet_algorithm AK08 antikt
set AK08.radius = 0.8
set AK08.ptmin = 200
define jet_algorithm CA15 cambridge radius=1.5 ptmin=200.0

# Define Jet reconstruction efficiencies
define reco_efficiency j 0.925 [abseta <= 1.5]
define reco_efficiency j 0.875 [abseta > 1.5 and abseta <= 2.5]
define reco_efficiency j 0.80  [abseta > 2.5]

# Define Jet smearer
define smearer j with PT sqrt(0.06^2 + pt^2*1.3e-3^2) [abseta <= 0.5 and pt > 0.1]
define smearer j with PT sqrt(0.10^2 + pt^2*1.7e-3^2) [abseta > 0.5 and abseta <= 1.5 and pt > 0.1]
define smearer j with PT sqrt(0.25^2 + pt^2*3.1e-3^2) [abseta > 1.5 and abseta <= 2.5 and pt > 0.1]

# Define B-tagging
define tagger j as b 0.01+0.000038*pt
define tagger c as b 0.25*tanh(0.018*pt)/(1.0+ 0.0013*pt)     [abseta < 2.5]
define tagger c as b 0.0                                      [abseta >=2.5]
define tagger b as b 0.85*tanh(0.0025*pt)*(25.0/(1+0.063*pt)) [abseta < 2.5]
define tagger b as b 0.0                                      [abseta >= 2.5]

sfs_card_cms_exo_xx_yy.ma5 shows a simple example of multi-jet clustering and detector simulation implementation. First, it chooses the FastJet package as fastsim interpreter, then defines multijet and in the following defines a simple detector simulation. This file can be executed as follows;

$ ./bin/ma5 -Re cms_exo_xx_yy cms_exo_xx_yy sfs_card_cms_exo_xx_yy.ma5

here cms_exo_xx_yy is a given analysis and sfs_card_cms_exo_xx_yy.ma5 holds the information for the detector simulation (PAD requires sfs_card_cms_exo_xx_yy.ma5 to setup detector simulation for the analysis code, it automatically writes the detector simulation according to the given sfs_card file. Note that if there is a card with the same name for multiple analysis files, those analyses can be executed at the same time, hence allowing more efficient execution.). This command will write a folder called cms_exo_xx_yy including all MadAnalysis 5 framework within.

  • Multijet definitions: Multijets are automatically defined in cms_exo_xx_yy/Build/Main/main.cpp (do not change) as follows;
  //Adding new jet with ID AK08
  std::map<std::string, std::string> JetConfiguration1;
  JetConfiguration1["JetID"            ] = "AK08";
  JetConfiguration1["algorithm"        ] = "antikt";
  JetConfiguration1["cluster.R"        ] = "0.8";
  JetConfiguration1["cluster.PTmin"    ] = "200.0";
  cluster1->LoadJetConfiguration(JetConfiguration1);

  //Adding new jet with ID CA15
  std::map<std::string, std::string> JetConfiguration2;
  JetConfiguration2["JetID"            ] = "CA15";
  JetConfiguration2["algorithm"        ] = "cambridge";
  JetConfiguration2["cluster.R"        ] = "1.5";
  JetConfiguration2["cluster.PTmin"    ] = "200.0";
  cluster1->LoadJetConfiguration(JetConfiguration2); 

all these inputs are interpreted by JetClusterer machinery within MadAnalysis 5. Additionally Makefile has been modified via CXXFLAGS += -DMA5_FASTJET_MODE flag to enable FastJet use within MadAnalysis data structure, without this flag fastjet dependent accessors will not work.

  • Analysis folder: cms_exo_xx_yy/Build/SampleAnalyzer/User/Analyzer This folder includes all the definitions written for the detector simulation;
$ ls
analysisList.h        cms_exo_xx_yy.h       new_smearer_reco.cpp  new_tagger.cpp        reco.h
cms_exo_xx_yy.cpp     efficiencies.h        new_smearer_reco.h    new_tagger.h          sigmas.h

cms_exo_xx_yy.* are the analysis files and the rest are detector simulation modules (Please do not change those files when the analysis submitted in PAD only cms_exo_xx_yy.*, cms_exo_xx_yy.info and sfs_card_cms_exo_xx_yy.ma5 files are allowed. If you need to modify detector simulation, please modify sfs_card_cms_exo_xx_yy.ma5 and re-execute the workspace or include your personal modifications in cms_exo_xx_yy.cpp).

  • Multijet accessor: Each multijet is accessible within c++ interface through their unique identifiers. Primary jet is a special case where one can use event.rec()->jet() accessor to see all primary jets. Rest of the jets can be found using event.rec()->jet("AK08") or event.rec()->jet("CA15") respectively.

Update on expert mode compilation

In order to separate the execution method for Delphes and SFS-FastJet based analyses, a command line option has been added to the setup.sh. In order to enable SFS-FastJet mode use following commands;

source setup.sh --with-fastjet
make clean all

or simply type source setup.sh -h for details.

New shortcuts

RecJet  := MA5::RecJetFormat *;
RecJets := std::vector<const Jet>;
RecTau := MA5::RecTauFormat *;
RecTaus := std::vector<const Tau>;
RecLepton  := MA5::RecLeptonFormat *;
RecLeptons := std::vector<const Lepton>;
RecPhoton  := MA5::RecPhotonFormat *;
RecPhotons := std::vector<const Photon>;

FastJet Contrib toolset

SoftDrop

MAfloat32 z_cut = 0.10;
MAfloat32 beta  = 2.0;
Substructure::SoftDrop softDrop(beta, z_cut);

RecJets filtered_jets = filter(event.rec()->jets(), 20., 2.5);

// Only leading jet:
if (filtered_jets.size() > 0)
    RecJet softdrop_jet = softDrop.Execute(Jets[0]);

// All jets:
RecJets softdrop_jets = softDrop.Execute(Jets);

Cluster

Execute with reconstructed event sample

Substructure::Cluster cluster(Substructure::antikt, 0.4, 20);
cluster.Execute(event, "AK4");

if (event.rec()->jets("AK4").size() > 0)
    INFO << "AK4 Jet PT = " <<  event.rec()->jets("AK4")[0].pt() << endmsg;

Other execution modes:

MAfloat32 R = 0.5;
MAfloat32 pitmin = 20.; // optional
Substructure::Cluster cluster(Substructure::cambridge, R, ptmin);

// algorithm options antikt, cambridge, kt

RecJets Jets = filter(event.rec()->jets(), 20., 2.5);

// Only leading jet:
if (Jets.size() > 0)
    RecJets jet = cluster.Execute(Jets[0]);

// All jets:
RecJets jets = cluster.Execute(Jets);

Filter clustered object

Substructure::Cluster cluster(Substructure::cambridge, 0.2);
RecJets Jets = filter(event.rec()->jets("AK8"), 20., 2.5);

if (Jets.size() > 0) {
    RecJets FilteredSubjets = cluster.Execute(
        Jets[0], [](const RecJet jet, const RecJet subjet) 
                       { return subject->pt() > jet->pt() * 0.05 ; }
    );
    for (auto &jet: FilteredSubjets) INFO << Jets[0]->pt()*0.05 << " < " << jet->pt() << endmsg;
}

Recluster

RecJets Jets = filter(event.rec()->jets(), 20., 2.5);

Substructure::Recluster recluster(Substructure::cambridge, 0.5);

// Only leading jet:
if (Jets.size() > 0)
    RecJet jet = recluster.Execute(Jets[0]); // only gives the leading jet

// All jets:
RecJets jets = recluster.Execute(Jets); // only gives the leading jet for each reclustered jet

Nsubjettiness

RecJets Jets = filter(event.rec()->jets(), 20., 2.5);
Substructure::Nsubjettiness nsubjettiness(
    1,  // order
    Substructure::Nsubjettiness::KT_Axes, // axes definition
    Substructure::Nsubjettiness::UnnormalizedMeasure,  // measure definition
    1., // beta
    -1., // R0 
     std::numeric_limits<double>::max() // Rcutoff
);

if (Jets.size() > 0)
        MAdouble64 tau1 = nsubjettiness.Execute(Jets[0]);

Available axes definitions:

KT_Axes,
CA_Axes,
AntiKT_Axes,     
WTA_KT_Axes,
WTA_CA_Axes,
Manual_Axes,
OnePass_KT_Axes,
OnePass_CA_Axes,
OnePass_AntiKT_Axes,   
OnePass_WTA_KT_Axes,
OnePass_WTA_CA_Axes,

Available measure definitions

NormalizedMeasure,            // (beta,R0)
UnnormalizedMeasure,          // (beta)
NormalizedCutoffMeasure,      // (beta,R0,Rcutoff)
UnnormalizedCutoffMeasure,    // (beta,Rcutoff)

VariableR Plugin

Initialize through normal mode define jet_algorithm my_varR VariableR rho=2000 minR=0 maxR=2 ptmin=15 exclusive=False clustertype=CALIKE strategy=Best

Initialisation through analysis:

  MAfloat32 rho = 2000.;
  MAfloat32 minR = 0.;
  MAfloat32 maxR = 2.;
  Substructure::ClusterType clusterType = Substructure::VariableR::AKTLIKE; // CALIKE, KTLIKE, AKTLIKE
  Substructure::Strategy strategy = Substructure::VariableR::Best; // Best, N2Tiled, N2Plain, NNH, Native
  MAfloat32 ptmin = 0.;
  MAbool isExclusive = false;
  Substructure::VariableR variableR(
          rho, minR, maxR, clusterType, strategy, ptmin, isExclusive
  );

OR

Substructure::VariableR variableR;
variableR.Initialize(rho, minR, maxR, clusterType, strategy, ptmin, isExclusive);

Execution for the reco event:

std::string JetID = "VarR"
variableR.Execute(event, JetID);

if (event.rec()->jets("VarR").size() > 0)
    INFO << "VarR Jet PT = " <<  event.rec()->jets("VarR")[0].pt() << endmsg;

Execution with reconstructed jets:

RecJets Jets = filter(event.rec()->jets(), 20.);
RecJets VarRJets = variableR.Execute(Jets);
if (VarRJets.size() > 0)
    INFO << "VarR Jet PT = " <<  VarRJets[0]->pt() << endmsg;

Execution with a single jet:

RecJets Jets = filter(event.rec()->jets(), 20.);
RecJets VarRJets = variableR.Execute(Jets[0]);
if (VarRJets.size() > 0)
    INFO << "VarR Jet PT = " <<  VarRJets[0]->pt() << endmsg;

Filter clustered object

RecJets Jets = filter(event.rec()->jets("AK8"), 20., 2.5);

if (Jets.size() > 0) {
    RecJets FilteredSubjets = variableR.Execute(
        Jets[0], [](const RecJet jet, const RecJet subjet){ return subject->pt() > jet->pt() * 0.05 ; }
    );
    for (auto &jet: FilteredSubjets) INFO << Jets[0]->pt()*0.05 << " < " << jet->pt() << endmsg;
}

Pruner

execute with a single jet

Substructure::Pruner pruner(Substructure::antikt, 0.1, 1.);
vector<const RecJetFormat*> Jets = filter(event.rec()->jets(), 20.);
const RecJetFormat* PrunedJet = pruner.Execute(Jets[0]);
INFO << "prunned Jet PT = " <<  PrunedJet->pt() <<  endmsg;

vector execution

    Substructure::Pruner pruner(Substructure::antikt, 0.1, 1.);
    vector<const RecJetFormat*> Jets = filter(event.rec()->jets(), 20.);
    vector<const RecJetFormat*> PrunedJets = pruner.Execute(Jets);
    for (auto &jet: PrunedJets)
        INFO << "prunned Jet PT = " <<  jet->pt() <<  endmsg;

Jet Filtering

std::vector<const RecJetFormat*> Jets = filter(event.rec()->jets(), 20., 2.5);

Substructure::Selector selector = Substructure::SelectorPtFractionMin(0.03);
MAfloat32 Rfilt = 0.2;
Substructure::Filter jetFilter(Rfilt, selector);
// Execute for a single jet
if (Jets.size() > 0) {
        const RecJetFormat *filtjet = jetFilter.Execute(Jets[0]); // only gives the leading jet
        INFO << filtjet->pt() << " " << Jets[0]->pt() << endmsg;
    }

// execute as a vector
std::vector<const RecJetFormat *> filtjets = jetFilter.Execute(Jets);

Energy Correlator

    std::vector<const RecJetFormat*> Jets = filter(event.rec()->jets(), 20., 2.5);

MAuint32 N = 1;
MAfloat32 beta = 0.1;
EnergyCorrelator::Measure measure = EnergyCorrelator::Measure::pt_R;
EnergyCorrelator::Strategy strategy = EnergyCorrelator::Strategy::storage_array;
Substructure::EnergyCorrelator EC(N,beta,measure,strategy);
INFO << EC.Execute(Jets[0]) << endmsg;

Tests were done for backwards compatibility

Not tested.

TODO

  • [ ] Remove accesibility of FastJet through analysis.
  • [x] Add fj-contrib wrappers.

Drawbacks: Drawbacks have been elevated by splitting delphes and sfs based execution modes. This hasn't been extensively tested yet but should solve the problem.

jackaraz avatar Jan 19 '22 09:01 jackaraz