
External package integration

(Open) LovelyBuggies opened this issue 4 years ago • 15 comments

@henryiii We are going to add some shortcuts for analysis. Could you please specify which kinds of analysis are needed? And which tools or packages do you think are appropriate?

  • GooFit provides maximum-likelihood fits for arbitrary functions. It looks good, but it is GPU-based and might not be of common use.
  • iminuit is the most commonly used package for likelihood fits of models to data. But it is a Python interface to the C++ MINUIT2 library; we might want a more Pythonic package.
  • probfit helps us construct complex fits. However, it is iminuit-based.
  • zfit is a TensorFlow-based fitting library.

The fitting packages above share some problems: they are GPU-oriented, C++-based, or rely on external dependencies. We expect a less dependent, more pythonic solution for common use. I recommend SciPy. SciPy's optimization module (scipy.optimize) gives us the flexibility to solve fitting and other data-analysis problems, though it may not perform as well as more specialized solutions such as dedicated maximum-likelihood fitters.
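As a rough illustration of this suggestion, here is a minimal unbinned maximum-likelihood fit using only NumPy and scipy.optimize.minimize. The Gaussian model, the synthetic data, and the helper name `nll` are assumptions made for the sketch; they are not part of any proposed hist API.

```python
import numpy as np
from scipy.optimize import minimize

# Synthetic data for the sketch: 1000 draws from a Gaussian (mu=1.0, sigma=0.5).
rng = np.random.default_rng(seed=42)
data = rng.normal(loc=1.0, scale=0.5, size=1000)


def nll(params, x):
    """Negative log-likelihood of a Gaussian, with constant terms dropped."""
    mu, log_sigma = params
    sigma = np.exp(log_sigma)  # fit log(sigma) so that sigma stays positive
    return np.sum(0.5 * ((x - mu) / sigma) ** 2 + np.log(sigma))


# No bounds or constraints, so scipy picks BFGS by default.
result = minimize(nll, x0=[0.0, 0.0], args=(data,))
mu_hat, sigma_hat = result.x[0], np.exp(result.x[1])
print(f"mu = {mu_hat:.3f}, sigma = {sigma_hat:.3f}")
```

For a Gaussian the maximum-likelihood estimates are the sample mean and the (ddof=0) sample standard deviation, which is a handy sanity check on the optimizer. Note that this yields point estimates only.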

Besides this, it is not clear whether our shortcuts should include classification, regression, clustering, etc. (I did not find any related questions on the channel.) If so, scikit-learn could be a wonderful solution.

LovelyBuggies, Mar 15 '20 13:03

SciPy has few dependencies and provides analysis methods beyond fitting, such as integration (though I am not sure whether those are useful for HEP). The drawbacks are: 1) it may not be as specialized as GooFit; 2) using a Scikit-HEP package might fit better into the HEP ecosystem.

LovelyBuggies, Mar 15 '20 13:03

Hi @LovelyBuggies, if you would like a histogram-based statistical model, https://github.com/scikit-hep/pyhf might be interesting; it depends only on scipy + numpy.

lukasheinrich, Mar 15 '20 14:03

@lukasheinrich Thanks for the suggestion; I will dive into pyhf and see whether it suits the functionality in hist.

LovelyBuggies, Mar 15 '20 14:03

These are two separate issues: shortcuts for easy interaction, and adaptors/integration into other packages (which could also be called shortcuts). In general, we should be able to implement many of them without adding a dependency on the package, though we will have to be careful when we do.

henryiii, Mar 17 '20 17:03

I think we should focus on how to "feed" our histograms to these other packages. Maybe come up with a standard histogram API? Then boost-histogram (and maybe others, like Physt) could also support it.
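One way to read the "standard histogram API" idea is as a structural interface (a typing.Protocol) that producers implement and consumers depend on, so neither side imports the other. The names HistogramLike, values() and edges() below are hypothetical, chosen only for this sketch; they are not an existing boost-histogram or hist API.

```python
from typing import Protocol, Sequence, runtime_checkable

import numpy as np


@runtime_checkable
class HistogramLike(Protocol):
    """Hypothetical minimal interface a consumer package could rely on."""

    def values(self) -> np.ndarray:
        """Bin contents, one array dimension per axis."""
        ...

    def edges(self) -> Sequence[np.ndarray]:
        """Bin edges for each axis (number of bins + 1 entries per axis)."""
        ...


class SimpleHist:
    """Toy 1D histogram that satisfies HistogramLike via np.histogram."""

    def __init__(self, data, bins=10, value_range=None):
        self._values, self._edges = np.histogram(data, bins=bins, range=value_range)

    def values(self) -> np.ndarray:
        return self._values

    def edges(self) -> Sequence[np.ndarray]:
        return [self._edges]


def total_counts(h: HistogramLike) -> float:
    """A consumer that depends only on the protocol, not on any library."""
    return float(np.sum(h.values()))


h = SimpleHist([0.1, 0.2, 0.7, 0.9], bins=2, value_range=(0.0, 1.0))
print(isinstance(h, HistogramLike), total_counts(h))  # True 4.0
```

The runtime check only verifies that the methods exist; the type checker does the rest, which is what would let other libraries support such an interface without a shared base class.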

henryiii, Mar 17 '20 18:03

One thing that might be important for all but the simplest clients is feeding a structured set of histograms. I started some work along those lines with @jpivarski in histbook, and the idea of a "book", a nestable structure of histograms, would be useful. cc @matthewfeickert @kratsg

lukasheinrich, Mar 17 '20 18:03

@lukasheinrich An initiative concerning 'nest' was put forward here.

LovelyBuggies, Mar 18 '20 04:03

What exactly is the problem with iminuit's interface? What is not Pythonic enough about it? iminuit has little in common with the interface of C++ MINUIT; it is pretty Pythonic already.

HDembinski, Mar 18 '20 17:03

Besides, if you like scipy.optimize.minimize, you may also like https://iminuit.readthedocs.io/en/latest/reference.html#iminuit.minimize
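A minimal sketch of that point: iminuit.minimize accepts the same (fun, x0) arguments as scipy.optimize.minimize and returns a scipy OptimizeResult, so switching between them is a one-line change. The try/except fallback and the toy cost function are assumptions added so the example runs with either package installed.

```python
import numpy as np

try:
    # iminuit.minimize follows the scipy.optimize.minimize calling convention.
    from iminuit import minimize
except ImportError:  # fall back so the sketch still runs without iminuit
    from scipy.optimize import minimize


def cost(p):
    """Simple convex test function with its minimum at (1, -2)."""
    return (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2


result = minimize(cost, x0=np.array([0.0, 0.0]))
print(result.x)  # close to [1, -2]
```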

HDembinski, Mar 18 '20 17:03

@lukasheinrich boost-histogram supports integer and category axes, which can be used to bundle histograms together. I use these axes to have a common histogram with signal, background, different data subsets, etc. What can histbook do that boost-histogram with these axes cannot do?
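The bundling described here can be pictured with plain NumPy: one two-dimensional array whose first axis enumerates samples and whose second axis holds the bins of the observable. This only emulates the idea and is not boost-histogram's API; the sample names, binning and random data are assumptions for the sketch.

```python
import numpy as np

# Hypothetical sample labels; in boost-histogram this role is played by a
# category axis, emulated here with an integer index into SAMPLES.
SAMPLES = ["signal", "background", "data"]
EDGES = np.linspace(0.0, 1.0, 11)  # 10 regular bins on [0, 1]

rng = np.random.default_rng(0)
labels = rng.choice(SAMPLES, size=500)
x = rng.uniform(0.0, 1.0, size=500)

# Map each label to its category index, then fill one 2D histogram:
# axis 0 = sample category, axis 1 = the observable x.
cat_index = np.array([SAMPLES.index(s) for s in labels])
counts, _, _ = np.histogram2d(
    cat_index, x, bins=[np.arange(len(SAMPLES) + 1) - 0.5, EDGES]
)

# Selecting one sample is just indexing along the category axis.
signal = counts[SAMPLES.index("signal")]
print(counts.shape, signal.sum() == np.sum(labels == "signal"))  # (3, 10) True
```

This layout works because every sample shares the same binning.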

HDembinski, Mar 19 '20 07:03

@LovelyBuggies I disagree with your initial list of "shortcomings". GPU support is not a problem; it is a feature. Any package that supports GPUs should, of course, also fall back to CPU computing when no GPU is available, as numba and jax do.

I hope you got from my previous comment that we cannot replace iminuit with scipy.optimize.

"We expect a less dependent, more pythonic solution for common use." Having well-justified dependencies is OK if they can be loaded from PyPI and installed automatically. jax and jupyter are high-quality software, and they depend on a gazillion other packages.

HDembinski, Mar 19 '20 07:03

@HDembinski Thanks for the correction! Looks like I misunderstood: integrating iminuit into Hist is feasible and reasonable.

LovelyBuggies, Mar 19 '20 08:03

@HDembinski yes, some of these axis types are perfectly suitable. Would 'jagged' data work as well? Consider this case: two phase-space regions, where one has [data, bkg] histograms with 10 bins and the other has [data, signal, bkg] histograms with 5 bins:

2 event categories
  • category 1: 2 samples, 10 bins each
  • category 2: 3 samples, 5 bins each
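A minimal way to hold this jagged layout is a nested mapping, close in spirit to the histbook "book" idea mentioned earlier in the thread. The outer level is a dict keyed by category; within each category the samples share one binning, so they fit in a single rectangular (samples x bins) array, as with a category axis. Everything here, including the key names and the random fill data, is a hypothetical NumPy sketch.

```python
import numpy as np

# Layout described above (names are hypothetical): two event categories
# with different sample lists and different binning.
LAYOUT = {
    "category_1": {"samples": ["data", "bkg"], "edges": np.linspace(0.0, 1.0, 11)},
    "category_2": {"samples": ["data", "signal", "bkg"], "edges": np.linspace(0.0, 1.0, 6)},
}

rng = np.random.default_rng(1)

# The outer level is jagged, so it stays a dict. Inside one category all
# samples share a binning, so they stack into one (samples x bins) array.
book = {}
for name, spec in LAYOUT.items():
    rows = [np.histogram(rng.uniform(size=100), bins=spec["edges"])[0]
            for _ in spec["samples"]]
    book[name] = {"samples": spec["samples"], "edges": spec["edges"],
                  "counts": np.vstack(rows)}

for name, entry in book.items():
    print(name, entry["counts"].shape)
# category_1 (2, 10)
# category_2 (3, 5)
```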

lukasheinrich, Mar 19 '20 09:03

@henryiii I made some attempts and a new demo on this topic HERE :)

LovelyBuggies, Mar 24 '20 16:03

We could encapsulate this work in methods in the style of h.to_numpy(), such as h.to_aghast(), h.to_mplhep(), h.to_root(), etc.

LovelyBuggies, Mar 28 '20 09:03