[RDF] Add `Cov` method to compute covariance matrices

Open Zehvogel opened this issue 4 months ago • 4 comments

This Pull request:

Changes or fixes:

Adds a Cov method to compute covariance matrices, as the title says :)

Checklist:

  • [x] tested changes locally
  • [ ] updated the docs (if necessary)

More context:

:rotating_light: WARNING :rotating_light:: this was completely coded by copilot, no thinking was involved!

I intended this initially as a joke, but the code actually compiles, works*, looks mostly reasonable and I am missing this feature. So I would be willing to give it some finishing touches if the contribution is wanted. :)

Suspected shortcomings:

  • It is not yet a mergeable value (does it need to be?)
  • It looks like it does not support variations? (see the commented-out part in CovHelper::MakeNew; the same is true for e.g. StdDev)
  • The number of tests is a bit overkill but they do look correct
  • Probably needs some more editing of the documentation (does it show up in the RDF Cheatsheet automatically?)
  • The commit history is very ugly :D
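On the first point: a result is mergeable once partial results from different threads or jobs can be combined without reprocessing the events. One standard way to do that for covariances is the pairwise update of means and co-moments (Chan et al.). The following is a minimal plain-Python sketch of that technique, not the code in this PR; the class `CovAccumulator` and its method names are hypothetical.

```python
class CovAccumulator:
    """Running mean and co-moment matrix for `dim` variables (hypothetical helper)."""

    def __init__(self, dim):
        self.n = 0
        self.mean = [0.0] * dim
        # comoment[i][j] = sum over events of (x_i - mean_i) * (x_j - mean_j)
        self.comoment = [[0.0] * dim for _ in range(dim)]

    def fill(self, values):
        """Add one event using a Welford-style update."""
        self.n += 1
        delta = [v - m for v, m in zip(values, self.mean)]
        self.mean = [m + d / self.n for m, d in zip(self.mean, delta)]
        scale = (self.n - 1) / self.n
        for i, di in enumerate(delta):
            for j, dj in enumerate(delta):
                self.comoment[i][j] += di * dj * scale

    def merge(self, other):
        """Fold another partial result into this one."""
        if other.n == 0:
            return
        n = self.n + other.n
        delta = [mo - ms for mo, ms in zip(other.mean, self.mean)]
        weight = self.n * other.n / n
        for i, di in enumerate(delta):
            for j, dj in enumerate(delta):
                self.comoment[i][j] += other.comoment[i][j] + di * dj * weight
        self.mean = [ms + d * other.n / n for ms, d in zip(self.mean, delta)]
        self.n = n

    def covariance(self):
        """Unbiased sample covariance (divides by n - 1)."""
        return [[c / (self.n - 1) for c in row] for row in self.comoment]


# Split five events over two accumulators, as two threads might, then merge.
events = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0), (5.0, 10.0)]
first, second = CovAccumulator(2), CovAccumulator(2)
for event in events[:2]:
    first.fill(event)
for event in events[2:]:
    second.fill(event)
first.merge(second)
merged = first.covariance()
print(merged)
```

Up to floating-point rounding, the merged matrix matches what a single pass over all five events gives. Whether the PR's result type should expose such a merge is left open here.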

More details and all the AI-generated description can be found in https://github.com/Zehvogel/root/pull/1

(*) I only tested this with the following script (I don't know how to run the full test suite):

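import ROOT
import numpy as np

# Run the event loop on 2 threads to exercise the multi-threaded code path
ROOT.EnableImplicitMT(2)

x = np.array([1, 2, 3, 4, 5], dtype=np.float64)
y = np.array([2, 4, 6, 8, 10], dtype=np.float64)

df = ROOT.RDF.FromNumpy({'x': x, 'y': y})

# Covariance matrix computed by the new RDF action
df.Cov(["x", "y"]).GetValue().Print()

# Reference result from NumPy (rows are variables, columns are observations)
a = np.asarray([x, y])
print(np.cov(a))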
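
For reference, the numbers this check should reproduce can be worked out by hand. The pure-Python sketch below computes the unbiased sample covariance (normalised by n - 1, which is the default of `np.cov`) for the same two columns. The function name `sample_cov` is made up for illustration and is not part of the PR.

```python
def sample_cov(columns):
    """Unbiased sample covariance matrix of equally long columns (divides by n - 1)."""
    n = len(columns[0])
    means = [sum(col) / n for col in columns]
    return [
        [
            sum((a - ma) * (b - mb) for a, b in zip(col_a, col_b)) / (n - 1)
            for col_b, mb in zip(columns, means)
        ]
        for col_a, ma in zip(columns, means)
    ]


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0]
cov = sample_cov([x, y])
print(cov)  # [[2.5, 5.0], [5.0, 10.0]]
```

Since y = 2x, the off-diagonal entry is twice the variance of x and the variance of y is four times it.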

Zehvogel avatar Aug 21 '25 08:08 Zehvogel

Hi @Zehvogel ,

Thanks for creating this PR, it's definitely an interesting exercise!

Since the feature request has never surfaced before, we would need a clear explanation of the use cases this would be useful for.

In general, I believe it would be easier for now to provide this functionality as a plug-in to RDF, e.g. a tutorial showing the implementation of a custom action with Book that fills a TMatrixDSym data member and then shows the covariance matrix, maybe even with a plot. I believe we could discuss merging that tutorial into the repository even without it being requested by a larger group of users.
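
To illustrate the shape such a custom action could take, here is a plain-Python mock of the slot-based pattern: one accumulator per processing slot, filled once per event, and combined once after the event loop. It does not use ROOT and cannot be passed to Book; the class `CovHelperSketch` and its methods `exec_slot` and `finalize` are hypothetical names that only loosely mirror the per-slot execution and finalisation steps of an RDF action helper. In a real C++ helper the result would be a TMatrixDSym rather than nested lists.

```python
class CovHelperSketch:
    """Plain-Python mock of a slot-based action; hypothetical, not a ROOT class."""

    def __init__(self, n_slots, dim):
        self.dim = dim
        # One independent set of sums per slot, so threads never share state.
        self.count = [0] * n_slots
        self.sum_x = [[0.0] * dim for _ in range(n_slots)]
        self.sum_xx = [[[0.0] * dim for _ in range(dim)] for _ in range(n_slots)]
        self.result = None

    def exec_slot(self, slot, values):
        """Called once per event by whichever slot processes that event."""
        self.count[slot] += 1
        for i, vi in enumerate(values):
            self.sum_x[slot][i] += vi
            for j, vj in enumerate(values):
                self.sum_xx[slot][i][j] += vi * vj

    def finalize(self):
        """Called once after the event loop: combine the slots, build the matrix."""
        n = sum(self.count)
        sx = [sum(s[i] for s in self.sum_x) for i in range(self.dim)]
        self.result = [
            [
                (sum(s[i][j] for s in self.sum_xx) - sx[i] * sx[j] / n) / (n - 1)
                for j in range(self.dim)
            ]
            for i in range(self.dim)
        ]
        return self.result


helper = CovHelperSketch(n_slots=2, dim=2)
events = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0), (5.0, 10.0)]
for index, event in enumerate(events):
    helper.exec_slot(index % 2, event)  # pretend two threads alternate
cov = helper.finalize()
print(cov)
```

Raw sums keep the sketch short, but they can lose precision when the means are large compared to the spread; a production helper would probably prefer a numerically more careful update.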

vepadulano avatar Oct 20 '25 20:10 vepadulano

This could actually be useful for use cases like:

  • https://github.com/root-project/root/issues/20557

where the covariance matrix is an important summary statistic to augment the histograms in the statistical analysis.

@Zehvogel, if I remember correctly, this is also your use case? Right?

guitargeek avatar Dec 04 '25 09:12 guitargeek

Thanks for reminding me of this PR...

> @Zehvogel, if I remember correctly, this is also your use case? Right?

Yes, doing something like that was my intention initially. My actual analysis at the moment is slightly different, but it needs mostly the same bits and pieces.

I perform an analysis using so-called optimal observables. It is a matrix-element-based approach in which one calculates one value per EFT parameter and event. One then needs the means and the covariance matrix C of these observables over all events (and the means for re-weighted events as well, to build templates). By the multidimensional central limit theorem, the vector of means approximately follows a multivariate normal distribution with covariance matrix C/n, where n is the number of events, so one can use RooMultiVarGaussian to perform a template fit of the EFT parameters. The method is not particularly new and was already used at LEP (see e.g. section 6.2.1 in https://inspirehep.net/literature/555574, with the notational difference of minimising a chi2 instead), but it is back in fashion for future collider studies since https://inspirehep.net/literature/1742980.
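
Written out, the model described above looks roughly as follows. The notation is introduced here only for illustration: O_i is the vector of optimal observables for event i, θ the EFT parameters, and μ(θ) the template prediction for the means; C is assumed not to depend on θ.

```latex
\bar{O} = \frac{1}{n} \sum_{i=1}^{n} O_i ,
\qquad
C = \frac{1}{n-1} \sum_{i=1}^{n} (O_i - \bar{O})(O_i - \bar{O})^{\mathsf{T}} ,
\qquad
\bar{O} \sim \mathcal{N}\!\left( \mu(\theta),\ \tfrac{1}{n} C \right)

\chi^2(\theta) = n \, \bigl( \bar{O} - \mu(\theta) \bigr)^{\mathsf{T}} C^{-1} \bigl( \bar{O} - \mu(\theta) \bigr)
```

Under that assumption, minus twice the logarithm of the Gaussian likelihood equals this chi2 up to a θ-independent constant, so fitting the multivariate Gaussian and minimising the chi2 lead to the same estimate.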

Zehvogel avatar Dec 04 '25 14:12 Zehvogel

@guitargeek the corresponding RooFit code is here if you want to take a look: https://github.com/Zehvogel/TGC2/blob/main/standalone-fit.ipynb

Zehvogel avatar Dec 04 '25 14:12 Zehvogel