root
root copied to clipboard
[RF] Batch mode with RooSimultaneous introduces spurious parameters
- [x] Checked for duplicates
Describe the bug
When fitting a RooSimultaneous pdf using BatchMode(true) (which should be cpu) the fit contains additional parameters, one for each observable, called _<first category label>_<obs name>
, where the category is the one used in the RooSimultaneous.
The fit converges, but fails at the HESSE step, leading to an "approximation only" covariance matrix.
The label used is, from what I understood, based on label ordering and not the indices.
The mapping from category label to index however influences the fit result.
Expected behavior
No extra parameters, successful HESSE
To Reproduce
void test_batchModeCategory() {
RooRealVar x("x", "", 0, 1);
RooRealVar rnd("rnd", "", 0, 1);
// change this mapping from labels to indices to change the fit result (slightly)
RooThresholdCategory catThr("cat", "", rnd, "a", 2);
catThr.addThreshold(1./3, "b", 0);
catThr.addThreshold(2./3, "c", 1);
RooRealVar m("m", "", 0.5, 0, 1);
RooGaussian g("g", "", x, m, RooFit::RooConst(0.1));
RooUniform rndPdf("rndPdf", "", rnd);
RooProdPdf pdf("pdf", "", RooArgSet(g, rndPdf));
auto ds = pdf.generate(RooArgSet(x, rnd), RooFit::Name("ds"), RooFit::NumEvents(100));
auto cat = dynamic_cast<RooCategory*>(ds->addColumn(catThr));
RooSimultaneous sim("sim", "", *cat);
sim.addPdf(g, "a");
sim.addPdf(g, "b");
sim.addPdf(g, "c");
sim.fitTo(*ds, RooFit::Save(true)
, RooFit::BatchMode(true) // commenting this solves the issue
)->Print("V");
// _a_x parameter in results, cov matrix is "Approximation only"
// works
//g.fitTo(*ds, RooFit::Save(true), RooFit::BatchMode(true))->Print("V");
}
Setup
Centos7 ROOT 6.26.04 from LCG dev4
Additional context
Hi, thanks a lot for opening this issue! I was aware of this, and in the next patch release these spurious free parameters will be gone (6.26.06 is expected in about 2 weeks).
However, it is still good that you opened this issue because now that you used Print("V")
, I see that the spurious parameters are still there in the fit result with ROOT master
, they are just constant!
RooFitResult: minimized FCN value: 46.0037, estimated distance to minimum: 1.51615e-08
covariance matrix quality: Full, accurate covariance matrix
Status : MINIMIZE=0 HESSE=0
Constant Parameter Value
-------------------- ------------
_a_x 5.0000e-01
Floating Parameter InitialValue FinalValue +/- Error GblCorr.
-------------------- ------------ -------------------------- --------
m 5.0000e-01 4.8765e-01 +/- 1.02e-02 <none>
So to close this issue for good, we also have to get rid of them in the "Constant Parameter" list. These spurious parameters should not be seen anywhere, as they are pure implementation details for the BatchMode.
As it is important that the BatchMode has no possible disadvantage compared to the old RooFit, I put this issue into the list of issues that need to be resolved before the next patch release: https://github.com/root-project/root/issues/10758
Hi Jonas thank you for your answer!
I switched to LCG dev3 to test on master and noticed that the fitted parameter value is equal to what it was in 6.26.04+batchmode, so still different from no batchmode or batchmode in 6.24.06 It is also still dependent on label mapping (the likelihood too). Is this expected?
No, this is not expected at all. I will investigate this!
Ah I see now what is going on. In the BatchMode implementation, each RooAbsArg in your model must be unique (this was always preferred in RooFit, but the BatchMode is more strict with that). You are using the same pdf for each component of the RooSimultaneous, and it can't deal with that. This code would work:
void test_batchModeCategory() {
RooRealVar x("x", "", 0, 1);
RooRealVar rnd("rnd", "", 0, 1);
// change this mapping from labels to indices to change the fit result (slightly)
RooThresholdCategory catThr("cat", "", rnd, "a", 2);
catThr.addThreshold(1./3, "b", 0);
catThr.addThreshold(2./3, "c", 1);
RooRealVar m("m", "", 0.5, 0, 1);
RooGaussian g1("g1", "", x, m, RooFit::RooConst(0.1));
RooGaussian g2("g2", "", x, m, RooFit::RooConst(0.1));
RooGaussian g3("g3", "", x, m, RooFit::RooConst(0.1));
RooUniform rndPdf("rndPdf", "", rnd);
RooProdPdf pdf("pdf", "", RooArgSet(g1, rndPdf));
auto ds = pdf.generate(RooArgSet(x, rnd), RooFit::Name("ds"), RooFit::NumEvents(100));
auto cat = dynamic_cast<RooCategory*>(ds->addColumn(catThr));
RooSimultaneous sim("sim", "", *cat);
sim.addPdf(g1, "a");
sim.addPdf(g2, "b");
sim.addPdf(g3, "c");
for(bool batchMode : {false, true}) {
sim.fitTo(*ds
, RooFit::Save(true)
, RooFit::PrintLevel(-1)
, RooFit::BatchMode(batchMode) // commenting this solves the issue
)->Print("V");
m.setVal(0.5);
m.setError(0.0);
}
}
But your example shows that repeated pdfs in different RooSimultaneous components should better be supported to not cause confusion like this. And in any other case, there should be a strong warning or thrown exception if that happens.
Could you expand a bit on what exactly the limitation is? I'd like to understand whether my actual model is affected or not.
The limitation right now is: RooAbsArgs that are in different channels of the RooSimultaneous and depend on data are not allowed to have the same name (by the RooAbsArgs I don't only mean the top-level channel pdf but also anything that is serving that pdf).
In your initial example, there was a Gaussian "g" in each channel. But I hope to eliminate this limitation by the next patch release.
Oh, ok, it's linked to the RooSimultaneous. That makes sense, thank you. If I understood correctly my model is affected. By the next patch release you mean 6.26.06 (meaning in the next 2 weeks)?
Yes exactly!
Hi! This problem is now fixed in master
. There should be no spurious parameters anymore, and naming collisions are avoided by renaming all RooAbsArgs for each component with a prefix.
The fix will make it to the next patch release, as tracked in this issue: https://github.com/root-project/root/issues/11534