[RDF] Add `Cov` method to compute covariance matrices
This Pull request:
Changes or fixes:
Adds a Cov method to compute covariance matrices, as the title says :)
Checklist:
- [x] tested changes locally
- [ ] updated the docs (if necessary)
More context:
:rotating_light: WARNING :rotating_light:: this was completely coded by copilot, no thinking was involved!
I intended this initially as a joke, but the code actually compiles, works*, looks mostly reasonable and I am missing this feature. So I would be willing to give it some finishing touches if the contribution is wanted. :)
Suspected shortcomings:
- It is not yet a mergeable value (does it need to be?)
- It looks like it does not support variations? (that commented-out part in `CovHelper::MakeNew`, but it is the same for e.g. `StdDev`)
- The number of tests is a bit overkill but they do look correct
- Probably needs some more editing of the documentation (does it show up in the RDF Cheatsheet automatically?)
- The commit history is very ugly :D
More details and all the AI-generated description can be found in https://github.com/Zehvogel/root/pull/1
(*) I only tested this with (I don't know how to run all the tests):
import ROOT
import numpy as np
ROOT.EnableImplicitMT(2)
x = np.array([1, 2, 3, 4, 5], dtype=np.float64)
y = np.array([2, 4, 6, 8, 10], dtype=np.float64)
df = ROOT.RDF.FromNumpy({'x': x, 'y': y})
df.Cov(["x", "y"]).GetValue().Print()
a = np.asarray([x, y])
print(np.cov(a))
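
For reference, here is a minimal sketch of the value this input should give, in plain Python without NumPy or ROOT. It assumes the unbiased (n - 1) normalisation that `np.cov` uses by default; whether `Cov` is meant to use the same convention is an assumption here. The function name `sample_cov` is hypothetical.

```python
def sample_cov(columns):
    """Unbiased sample covariance matrix of equally long columns (illustrative helper)."""
    n = len(columns[0])
    means = [sum(c) / n for c in columns]
    # Entry (i, j) is the sum of products of deviations, divided by n - 1.
    return [
        [
            sum((a - ma) * (b - mb) for a, b in zip(ca, cb)) / (n - 1)
            for cb, mb in zip(columns, means)
        ]
        for ca, ma in zip(columns, means)
    ]


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0]
expected = sample_cov([x, y])
print(expected)  # [[2.5, 5.0], [5.0, 10.0]]
```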
Hi @Zehvogel ,
Thanks for creating this PR, it's definitely an interesting exercise!
Since the feature request has never surfaced before, we would need a clear explanation of the use cases this would be useful for.
In general, I believe it would be easier for now to provide this functionality as a plug-in to RDF, e.g. a tutorial showing the implementation of a custom action with `Book` that fills a `TMatrixDSym` data member and then shows the covariance matrix, maybe even with a plot. I believe we could discuss merging that tutorial in the repository even without it being requested by a larger group of users.
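
As a rough sketch of the bookkeeping such a custom action would need: each worker slot accumulates its own sums, and the partial results are merged at the end. This is plain Python, not the actual `Book` helper interface; the class `CovAccumulator` and its methods are hypothetical names, and the naive sum-of-products formula is used for brevity (a production version would likely want a numerically more stable update).

```python
class CovAccumulator:
    """Running sums for one worker slot (hypothetical, not an RDF class)."""

    def __init__(self, ndim):
        self.n = 0
        self.sums = [0.0] * ndim
        self.prods = [[0.0] * ndim for _ in range(ndim)]

    def fill(self, values):
        # Called once per event with one value per column.
        self.n += 1
        for i, vi in enumerate(values):
            self.sums[i] += vi
            for j, vj in enumerate(values):
                self.prods[i][j] += vi * vj

    def merge(self, other):
        # Partial results from another slot simply add up.
        self.n += other.n
        for i in range(len(self.sums)):
            self.sums[i] += other.sums[i]
            for j in range(len(self.sums)):
                self.prods[i][j] += other.prods[i][j]

    def covariance(self):
        # Unbiased estimate: (sum(x_i * x_j) - sum(x_i) * sum(x_j) / n) / (n - 1)
        n, d = self.n, len(self.sums)
        return [
            [(self.prods[i][j] - self.sums[i] * self.sums[j] / n) / (n - 1)
             for j in range(d)]
            for i in range(d)
        ]


# Two slots, each seeing part of the events, merged at the end.
rows = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0), (5.0, 10.0)]
slot0, slot1 = CovAccumulator(2), CovAccumulator(2)
for row in rows[:2]:
    slot0.fill(row)
for row in rows[2:]:
    slot1.fill(row)
slot0.merge(slot1)
merged_cov = slot0.covariance()
print(merged_cov)
```

Because the state is just a count and two sets of sums, merging per-slot results is an element-wise addition, which is what would make the result usable with implicit multi-threading.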
This could actually be useful for use cases like:
- https://github.com/root-project/root/issues/20557
where the covariance matrix is an important summary statistic to augment the histograms in the statistical analysis.
@Zehvogel, if I remember correctly, this is also your use case, right?
Thanks for reminding me of this PR...
> @Zehvogel, if I remember correctly, this is also your use case, right?
Yes, doing something like that was my intention initially. My actual analysis at the moment is slightly different, but it needs mostly the same bits and pieces.
I perform an analysis using so-called optimal observables. It is a matrix-element-based approach where one calculates one value per EFT parameter and event. One then needs the means and the covariance matrix C of these observables over all events (and the means also for re-weighted events, to build templates). The vector of means then follows a multivariate normal distribution with covariance matrix C/n (multidimensional CLT), and one can use `RooMultiVarGaussian` to perform a template fit of the EFT parameters. The method is not particularly new and was already used at LEP (see e.g. 6.2.1 in https://inspirehep.net/literature/555574, with the notational difference of minimising a chi2 instead), but it is back in fashion for future collider studies since https://inspirehep.net/literature/1742980.
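
Written out, under the assumption that C is treated as fixed and with notation chosen here for illustration ($\bar{O}$ for the vector of observed means over $n$ events, $\mu(\theta)$ for the template prediction at EFT parameters $\theta$), the statement above and the equivalent chi2 read roughly:

$$
\bar{O} \sim \mathcal{N}\!\left(\mu(\theta),\, \frac{C}{n}\right),
\qquad
\chi^2(\theta) = n\,\bigl(\bar{O} - \mu(\theta)\bigr)^{\mathsf{T}} C^{-1} \bigl(\bar{O} - \mu(\theta)\bigr)
$$

Up to a $\theta$-independent constant, minimising this chi2 is the same as maximising the multivariate Gaussian likelihood.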
@guitargeek the corresponding RooFit code is here if you want to take a look: https://github.com/Zehvogel/TGC2/blob/main/standalone-fit.ipynb