root [RF] RooDataHist from a variable binning TH1D: different shape, different bin content

Check duplicate issues.

[ ] Checked for duplicates

Description

This problem has been already mentioned here [0] but no solution has been proposed. When you compare a RooDataHist obtained from a TH1 with fixed size binning everything looks ok but when you do the same from a variable size binning the comparison fails. See code below to reproduce the comparison.

[0] https://root-forum.cern.ch/t/roodatahist-and-th1d-why-different-shape-bin-content/56971

Reproducer

just run root mytest.cpp

using namespace RooFit;

void mytest() {

  const int nBin=100;
  double xbins[nBin+1];
  const float left_Val = 150;
  const float rightVal = 1150;  
  const float logxmin = TMath::Log10(left_Val);
  const float logxmax = TMath::Log10(rightVal);
  const double dxLog = (logxmax-logxmin)/nBin;
  for (int i=0;i<=nBin;i++) xbins[i] = TMath::Power(10., logxmin + i * dxLog);

  TF1 *f1 = new TF1("f1","expo(0)",left_Val,rightVal);
  f1->SetParameters(1e2,-1e-2);

  TH1D *h1 = new TH1D("h1","",nBin,xbins);
  h1->FillRandom("f1",10000000);
  RooRealVar mInv("mInv","m [GeV]",150,1150);
  RooDataHist hist_test1("hist_test1","hist_test1",mInv,Import(*h1,false));
  TCanvas *cExpo1 = new TCanvas();
  gPad->SetLogx();
  gPad->SetLogy();
  auto plot_test1 = mInv.frame();
  hist_test1.plotOn(plot_test1,DataError(RooAbsData::SumW2));
  plot_test1->Draw();
  h1->Draw("same");

  TH1D *h2 = new TH1D("h2","",nBin,left_Val,rightVal);
  h2->FillRandom("f1",10000000);
  RooDataHist hist_test2("hist_test2","hist_test2",mInv,Import(*h2,false));
  TCanvas *cExpo2 = new TCanvas();
  gPad->SetLogx();
  gPad->SetLogy();
  auto plot_test2 = mInv.frame();
  hist_test2.plotOn(plot_test2,DataError(RooAbsData::SumW2));
  plot_test2->Draw();
  h2->Draw("same");

}

ROOT version

ROOT 6.30/06

Installation method

homebrew

Operating system

OSX Sonoma 14.5

Additional context

No response

Jul 13 '24 10:07 santocch

Hi few comments after digging a bit in the code...

it looks the problem is not in the RooDataHist itself but it's in the plotOn function. If you check the RooDataHist content, bin content is equal to the original histogram. The problem looks coming from this line [0]: if you comment it then the plot looks good.
the plotting problem occurs also when you get a PDF from the RooDataHist using RooHistPdf. If you compare the PDF with the original histogram (scaled to have integral equal to one), the fixed size bin histogram looks good while the variable size bin histogram doesn't. In this case I didn't find how to fix the problem (the RooHistPdf code is a lot more complicated than the RooDataHist routine) but checking the behaviour of the PDF (i.e fitting the PDF to the RooDataHist), everything looks ok both for the fixed and variable size histogram.

See the attached modified version of the previous code to see the PDF begaviour mentioned above.

Best

Attilio

[0] https://github.com/root-project/root/blob/b8b0a8150325be271b45038dd81b751cb8a7a41d/roofit/roofitcore/src/RooHist.cxx#L434


int nEntries = 1000000;

void myTestRooDataHist() {

  const int nBin=100;
  double xbins[nBin+1];
  const float left_Val = 150;
  const float rightVal = 1150;  
  const float logxmin = TMath::Log10(left_Val);
  const float logxmax = TMath::Log10(rightVal);
  const double dxLog = (logxmax-logxmin)/nBin;
  for (int i=0;i<=nBin;i++) xbins[i] = TMath::Power(10., logxmin + i * dxLog);

  TF1 *f1 = new TF1("f1","expo(0)",left_Val,rightVal);
  f1->SetParameters(1e2,-1e-2);

  RooRealVar mInv("mInv","m [GeV]",150,1150);
  RooRealVar expo("expo", "expo", -1.0, -0.00001);
  RooExponential fitExpoFun("background", "background", mInv, expo);

  TH1D *h1 = new TH1D("h1","",nBin,left_Val,rightVal);
  h1->FillRandom("f1",nEntries);
  RooDataHist hist_test1("hist_test1","hist_test1",mInv,h1,1);
  TCanvas *cExpo1 = new TCanvas("cExpo1","",600,600);
  gPad->SetLogx();
  gPad->SetLogy();
  auto plot_test1 = mInv.frame();
  hist_test1.plotOn(plot_test1);
  plot_test1->Draw();
  h1->SetLineColor(2);
  h1->Draw("same");

  TH1D *h2 = new TH1D("h2","",nBin,xbins);
  h2->FillRandom("f1",nEntries);
  RooDataHist hist_test2("hist_test2","hist_test2",mInv,h2,1);
  TCanvas *cExpo2 = new TCanvas("cExpo2","",650,0,600,600);
  gPad->SetLogx();
  gPad->SetLogy();
  auto plot_test2 = mInv.frame();
  hist_test2.plotOn(plot_test2);
  plot_test2->Draw();
  h2->SetLineColor(2);
  h2->Draw("same");

  TCanvas *cExpo4 = new TCanvas("cExpo4","",1300,0,600,600);
  gPad->SetLogx();
  gPad->SetLogy();  
  RooHistPdf h1_PDF("h1_PDF","h1_PDF",mInv,hist_test1,0);
  auto plot_test1_pdf = mInv.frame();
  h1_PDF.plotOn(plot_test1_pdf);
  plot_test1_pdf->Draw();
  TH1D *h1clone = (TH1D*)h1->Clone("h1clone");
  h1clone->Scale(1./h1clone->Integral());
  h1clone->SetMarkerSize(0.5);
  h1clone->SetMarkerStyle(20);
  h1clone->SetMarkerColor(2);
  h1clone->SetLineColor(2);
  h1clone->Draw("same EP");
  auto result1 = fitExpoFun.fitTo(hist_test1,PrintLevel(0),RooFit::Save(true));
  result1->Print();

  TCanvas *cExpo5 = new TCanvas("cExpo5","",0,650,600,600);
  gPad->SetLogx();
  gPad->SetLogy();  
  RooHistPdf h2_PDF("h2_PDF","h2_PDF",mInv,hist_test2,0);
  auto plot_test2_pdf = mInv.frame();
  h2_PDF.plotOn(plot_test2_pdf);
  plot_test2_pdf->Draw();
  TH1D *h2clone = (TH1D*)h2->Clone("h2clone");
  h2clone->Scale(1./h2clone->Integral());
  h2clone->SetMarkerSize(0.5);
  h2clone->SetMarkerStyle(20);
  h2clone->SetMarkerColor(2);
  h2clone->SetLineColor(2);
  h2clone->Draw("same EP");
  auto result2 = fitExpoFun.fitTo(hist_test2,PrintLevel(0),RooFit::Save(true));
  result2->Print();

}

Aug 09 '24 09:08 santocch

Hi, thank you very much for the report. I'll see how to improve the situation, because it's asked about quite a lot.

There is also this JIRA issue, for example: https://its.cern.ch/jira/browse/ROOT-10498

Feb 10 '25 15:02 guitargeek

Hi @santocch! I understand your confusion, but I think RooFit arguably does the right thing here.

The challenge when plotting the RooHist is that one needs to visually compare it to a pdf, which is a probability density. Scaling the pdf with the bin width to give predicted counts for easy comparison is not desirable, because it's not meaningful for analytical shapes (think about having the fitExpoFun from the reproducer in the same plot).

So what RooFit does instead is to scale the histogram counts by dividing with the bin width to get an empirical probability density. And then it multiplies the densities with the average bin width $\bar{w}$, so that you actually have counts again in the uniform binning case, which is easy to interpret. So you have $\bar{w} p(x)$ when plotting pdfs, and $\bar{w} \frac{c_i} {w_i}$ for plotting data, where $c_i$ and $w_i$ are counts and bin widths respectively. That's what the line does that you are commenting out.

In the non-uniform binning case, $\bar{w} \frac{c_i} {w_i}$ doesn't simplify back to the counts $c_i$ anymore, but it's rather some arbitrarily scaled probability density. Maybe a different factor can be found to make it less confusing. Anyway, you can't compare this to the TH1s, which simply visualize the counts $c_i$ when you plot them.

Does it make more sense to you now? You have maybe some suggestions to improve RooFit and its documentation to make it less confusing? I'll remove the bug label though, because as I explained the behavior is intended.

Feb 12 '25 16:02 guitargeek

Closing this bug report, since it is intended that the RooDataHist behaves different from the TH1D when plotting. Feel free to re-open or create a new issue if you have improvement requests related to this, @santocch!

Feb 15 '25 10:02 guitargeek