cabinetry Visualization API changes

This collects information regarding changes in the cabinetry visualization API, and is a follow-up to #251.

#250 made matplotlib a core dependency and refactored the plotting code.
#264 made functions in the visualize module return figures (or a list of dictionaries with figures)
#267 changed visualize.data_mc to take a model prediction object instead of a model, and added a new channels keyword argument
#271 fixed duplicate display of figures from functions returning figures (instead of lists of dicts) in notebooks (see also https://github.com/alexander-held/cabinetry/issues/265#issuecomment-907621136 below)
#399 added support for custom histogram colors for data/MC plots

Outstanding items and open questions (including pieces from #381):

[ ] #142 (aiming at v0.4)
- The natural target for this seems to be visualize.plot_model and visualize.plot_result, and those functions should then likely return artists. Calling these functions directly comes with a loss in convenience, e.g. the correlation matrix pruning threshold. Could consider factoring out the convenience functions? Handing axes to the visualize-level functions is more challenging, since several of these can return multiple figures (and the exact number is not easily known for visualize.templates).
[ ] Consider supporting callbacks as suggested in https://github.com/alexander-held/cabinetry/issues/113#issuecomment-697846016.
[ ] Consider making the return of figures optional to avoid keeping too many figures in memory for visualize.templates (figures are still kept around even with close_figure=True as long as a reference to them exists).
[ ] In addition to this it seems useful to not override custom rcParams set by users via the mpl.style.use calls in cabinetry but instead only update values that correspond to the matplotlib default. Then users could do something like
```
import matplotlib as mpl
mpl.rcParams['axes.prop_cycle'] = mpl.cycler(color=["salmon", "tan", "mediumseagreen"])
```
to get a custom color scheme. (from #381)
- this may actually not override everything as initially expected, see https://github.com/iris-hep/analysis-grand-challenge/pull/117#discussion_r1165441247
[ ] Another idea: new setting style with default style="cabinetry" that will apply the mpl.style.use call, and the option style=None that will skip it. That allows users to set rcParams in any way they want. Some other styling operations like tick label design and such can probably also be factored from the code and put into a style sheet gathering everything. (from #381)
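
The two styling ideas above (only touching rcParams that are still at their matplotlib default, and a style setting that can be switched off) could be combined roughly as follows. This is a minimal sketch under assumptions: `apply_style`, `HYPOTHETICAL_STYLE` and the `keep_user_settings` argument are illustrative names that do not exist in cabinetry, and the style values are placeholders rather than the actual cabinetry style.

```python
import matplotlib as mpl

# Placeholder settings standing in for a cabinetry style sheet (assumption).
HYPOTHETICAL_STYLE = {
    "axes.linewidth": 1.2,
    "axes.prop_cycle": mpl.cycler(color=["black", "gray"]),
}


def apply_style(style="cabinetry", keep_user_settings=True):
    """Apply the placeholder style; style=None leaves rcParams untouched."""
    if style is None:
        return
    for key, value in HYPOTHETICAL_STYLE.items():
        # rcParamsDefault holds matplotlib's built-in defaults, so a differing
        # value is treated here as a deliberate user customization.
        user_changed = mpl.rcParams[key] != mpl.rcParamsDefault[key]
        if keep_user_settings and user_changed:
            continue
        mpl.rcParams[key] = value


# Usage: start from the defaults, customize the color cycle, then apply the style.
mpl.rcdefaults()
mpl.rcParams["axes.prop_cycle"] = mpl.cycler(color=["salmon", "tan", "mediumseagreen"])
apply_style()

user_colors = mpl.rcParams["axes.prop_cycle"].by_key()["color"]
line_width = mpl.rcParams["axes.linewidth"]
```

With this approach the user-defined color cycle survives, while `axes.linewidth` (left at its default) picks up the style value. One caveat of the sketch: values set through a matplotlibrc file also differ from `rcParamsDefault` and would be treated as user customizations.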

Aug 20 '21 16:08 alexander-held

As of 5ed199a, functions returning a single figure cause it to be rendered twice when called as the last line in a notebook cell. I believe the reason is the following:

The matplotlib inline backend looks for any figures that pyplot knows about (plt.get_fignums()), renders them to png, Base64 encodes them, puts that into the notebook (which makes the figure show up) and then closes all figures (presumably plt.close("all")). This rendering will always happen for any figures that are still open, which is the default behavior in cabinetry as of this commit.
The return value of the last line in a cell will also be shown as the result of the cell, and that happens to be a figure in the case of functions like visualize.pulls which produce a single figure.

Functions producing multiple figures are not affected by this duplication, since the return value is a list of dictionaries, and displaying that list does not render the figures it contains.
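
The bookkeeping described above can be checked outside a notebook. The following is a small sketch using only matplotlib (no cabinetry calls); the Agg backend is selected just so that it runs without a display. It shows that an open figure is known to pyplot, that a closed figure no longer is, and that the closed figure object can still be rendered as long as a reference to it exists.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend, only needed outside a notebook
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1], [0, 1])
num = fig.number  # remember the figure number before closing

# An open figure is registered with pyplot; figures registered like this are
# what an inline backend would pick up at the end of a cell.
tracked_while_open = num in plt.get_fignums()

plt.close(fig)

# After closing, pyplot no longer tracks the figure ...
tracked_after_close = num in plt.get_fignums()

# ... but the object is still alive and can be rendered explicitly.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
png_bytes = buffer.getvalue()
```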

To solve duplication, there are the following options:

Make figure closing default for all functions returning a single figure. A figure can then be shown in the following ways:
- Figure shows up because it is the return value of the last line in the cell:
```
...
visualize.pulls(...)
```
- Figure is produced earlier and assigned to object, and that is referenced again:
```
fig = visualize.pulls(...)
...
fig
```
- Default closing is disabled, but rendering of return value is avoided. The advantage of this is that multiple figures can be rendered and this can happen anywhere in the cell.
```
visualize.pulls(..., close_figure=False);
```
  or
```
_ = visualize.pulls(..., close_figure=False)
```
The downside of this approach is that visualize.data_mc and visualize.templates would still not have figure closing enabled by default and thereby behave differently. The advantage is that the easiest use case of just calling the single-figure-producing functions without thinking about return values or optional arguments "just works" correctly, and the multi-figure functions also work (via a different method).
Make figure closing default for all functions, including multi-figure functions. Rendering of multi-figure functions could then be achieved via a small helper function:
```
from IPython.display import display

def display_helper(fig_list_dict):
    for fig_dict in fig_list_dict:
        display(fig_dict["figure"])
```
This could be called on return values of visualize.data_mc and visualize.templates to show all figures at once, even if they already have been closed.
Return a class with _repr_html_ defined to manually handle things (see hist example). This is similar to the suggestion from #163.
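
As a rough illustration of the `_repr_html_` option, the sketch below wraps a list of figure dictionaries in a small class that renders every figure to PNG and embeds it in HTML. `FigureCollection` is a hypothetical name and not part of cabinetry; the only thing taken from the discussion above is the `"figure"` key of each dictionary. The Agg backend is selected just so the example runs outside a notebook.

```python
import base64
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend, only needed outside a notebook
import matplotlib.pyplot as plt


class FigureCollection:
    """Hypothetical wrapper around a list of {"figure": ...} dictionaries."""

    def __init__(self, fig_dict_list):
        self.fig_dict_list = fig_dict_list

    def _repr_html_(self):
        # Render each figure to PNG and embed it as a Base64 data URI, so the
        # notebook can show all figures even if they have been closed.
        parts = []
        for fig_dict in self.fig_dict_list:
            buffer = io.BytesIO()
            fig_dict["figure"].savefig(buffer, format="png")
            encoded = base64.b64encode(buffer.getvalue()).decode("ascii")
            parts.append(f'<img src="data:image/png;base64,{encoded}"/>')
        return "\n".join(parts)


# Usage: build two closed figures, mimicking a multi-figure return value.
figures = []
for _ in range(2):
    fig, ax = plt.subplots()
    ax.plot([0, 1], [1, 0])
    plt.close(fig)  # closed, so pyplot no longer tracks it
    figures.append({"figure": fig})

collection = FigureCollection(figures)
html = collection._repr_html_()
```

In a notebook, leaving `collection` as the last line of a cell would trigger `_repr_html_` and show each figure exactly once, since none of them is still open in pyplot.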

A reasonable short-term solution seems to be to close figures from single-figure functions by default. There are multiple ways for them to still be rendered. Figures from multi-figure functions can stay open by default, so they are all rendered as well. In the longer term, a more unified solution could be useful. Feedback from analyzers using cabinetry in notebooks is very welcome!

Aug 28 '21 12:08 alexander-held

Examples of editing a data/MC figure (experiment labels, axis labels, removing existing text on the figure and replacing it): https://gist.github.com/alexander-held/2ca63e4c4c3de2114bf8d903bf28bb4a

edit: now also includes an example for how to add a normalized signal (and re-do the legend)

Jul 17 '23 20:07 alexander-held
