cabinetry
Visualization API changes
This collects information regarding changes in the cabinetry
visualization API, and is a follow-up to #251.
- #250 made `matplotlib` a core dependency and refactored the plotting code.
- #264 made functions in the `visualize` module return figures (or a list of dictionaries with figures)
- #267 changed `visualize.data_mc` to take a model prediction object instead of a model, and added a new `channels` keyword argument
- #271 fixed duplicate display of figures from functions returning figures (instead of lists of dicts) in notebooks (see also https://github.com/alexander-held/cabinetry/issues/265#issuecomment-907621136 below)
- #399 added support for custom histogram colors for data/MC plots
Outstanding items and open questions (including pieces from #381):
- [ ] #142 (aiming at v0.4)
  - The natural target for this seems to be `visualize.plot_model` and `visualize.plot_result`, and those functions should then likely return artists. Calling these functions directly comes with a loss in convenience, e.g. the correlation matrix pruning threshold. Could consider factoring out the convenience functions? Handing axes to the `visualize`-level functions is more challenging, since several of these can return multiple figures (and the exact number is not easily known for `visualize.templates`).
- [ ] Consider supporting callbacks as suggested in https://github.com/alexander-held/cabinetry/issues/113#issuecomment-697846016.
- [ ] Consider making return of figures optional to avoid keeping too many figures in memory for `visualize.templates` (figures still kept around even with `close_figure=True` as long as reference to them exists).
- [ ] In addition to this it seems useful to not override custom rcParams set by users via the `mpl.style.use` calls in `cabinetry` but instead only update values that correspond to the matplotlib default. Then users could do something like
  ```python
  import matplotlib as mpl

  mpl.rcParams['axes.prop_cycle'] = mpl.cycler(color=["salmon", "tan", "mediumseagreen"])
  ```
  to get a custom color scheme. (from #381)
  - this may actually not override everything as initially expected, see https://github.com/iris-hep/analysis-grand-challenge/pull/117#discussion_r1165441247
- [ ] Another idea: new setting style with default style="cabinetry" that will apply the mpl.style.use call, and the option style=None that will skip it. That allows users to set rcParams in any way they want. Some other styling operations like tick label design and such can probably also be factored from the code and put into a style sheet gathering everything. (from #381)
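The "only update values that correspond to the matplotlib default" idea from the list above can be sketched without touching `cabinetry` itself. The helper name `style_updates` is hypothetical and not part of `cabinetry` or matplotlib, and the dictionaries below are plain illustration values standing in for rcParams, so this is a rough sketch of the selection logic rather than an implementation proposal.

```python
def style_updates(current, defaults, style):
    """Return the style entries whose keys are still at their default value.

    current:  the settings currently in effect (e.g. user-modified rcParams)
    defaults: the library defaults for the same keys
    style:    the values a style sheet would like to apply
    All keys in `style` are assumed to exist in `current` and `defaults`.
    """
    updates = {}
    for key, value in style.items():
        # Apply the style only where the user has not changed the default.
        if current[key] == defaults[key]:
            updates[key] = value
    return updates


# Illustration with plain dicts (values chosen for the example only).
defaults = {"lines.linewidth": 1.5, "axes.grid": False}
current = {"lines.linewidth": 3.0, "axes.grid": False}  # user changed linewidth
style = {"lines.linewidth": 2.0, "axes.grid": True}

updates = style_updates(current, defaults, style)
```

With matplotlib, the same helper could presumably be used as `mpl.rcParams.update(style_updates(mpl.rcParams, mpl.rcParamsDefault, style))`, with `style` being a dictionary of the values a style sheet would set. One limitation of this approach: a user who explicitly sets a value equal to the default cannot be distinguished from one who never touched it.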
As of 5ed199a, functions returning a single figure cause it to be rendered twice when called as the last line in a notebook cell. I believe the reason is the following:
- The `matplotlib` inline backend looks for any figures that `pyplot` knows about (`plt.get_fignums()`), renders them to png, Base64 encodes them, puts that into the notebook (which makes the figure show up) and then closes all figures (presumably `plt.close("all")`). This rendering will always happen for any figures that are still open, which is the default behavior in `cabinetry` as of this commit.
- The return value of the last line in a cell will also be shown as the result of the cell, and that happens to be a figure in the case of functions like `visualize.pulls` which produce a single figure.
Functions producing multiple figures are not affected by this duplication, since the return value is a list of dicts, and displaying that list will not render the figures it contains.
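The distinction between "figures `pyplot` knows about" and "figure objects that still exist" can be checked outside a notebook. The following is a minimal sketch using only standard matplotlib calls, with the `Agg` backend chosen here as an assumption so that it runs without a display. It does not reproduce the inline backend itself, only the bookkeeping the explanation above relies on.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so this runs outside a notebook
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1], [0, 1])
number = fig.number

# pyplot tracks the open figure; these are the figures the inline backend
# would pick up at the end of a cell.
open_before = plt.get_fignums()

# After closing, pyplot no longer tracks the figure ...
plt.close(fig)
open_after = plt.get_fignums()

# ... but the object still exists as long as a reference is held, and it can
# still be rendered explicitly.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
png_size = len(buffer.getvalue())
```

This is also why a closed figure can still show up as a cell's return value, and why holding on to many returned figures keeps them in memory even after they are closed.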
To solve duplication, there are the following options:
- Make figure closing default for all functions returning a single figure. A figure can then be shown in the following ways:
  - Figure shows up because it is the return value of the last line in the cell:
    ```python
    ...
    visualize.pulls(...)
    ```
  - Figure is produced earlier and assigned to object, and that is referenced again:
    ```python
    fig = visualize.pulls(...)
    ...
    fig
    ```
  - Default closing is disabled, but rendering of return value is avoided. The advantage of this is that multiple figures can be rendered and this can happen anywhere in the cell.
    ```python
    visualize.pulls(..., close_figure=False);
    ```
    or
    ```python
    _ = visualize.pulls(..., close_figure=False)
    ```

  `visualize.data_mc` and `visualize.templates` would still not have figure closing enabled by default and thereby behave differently. The advantage is that the easiest use case of just calling the single-figure-producing functions without thinking about return values or optional arguments "just works" correctly, and the multi-figure functions also work (via a different method).
- Make figure closing default for all functions, including multi-figure functions. Rendering of multi-figure functions could then be achieved via a small helper function:
  ```python
  from IPython.display import display

  def display_helper(fig_list_dict):
      for fig_dict in fig_list_dict:
          display(fig_dict["figure"])
  ```
  This could be called on return values of `visualize.data_mc` and `visualize.templates` to show all figures at once, even if they already have been closed.
- Return a class with `_repr_html_` defined to manually handle things (see `hist` example). This is similar to the suggestion from #163.
A reasonable short term solution seems to be to close figures from single-figure functions by default. There are multiple ways for them to still be rendered anyway. Multi-figure functions can stay open by default, so all figures are also rendered there. In the longer term a more unified solution could be useful. Feedback from analyzers using `cabinetry` in notebooks is very welcome!
Examples of editing a data/MC figure (experiment labels, axis labels, removing existing text on the figure and replacing it): https://gist.github.com/alexander-held/2ca63e4c4c3de2114bf8d903bf28bb4a
edit: now also includes an example for how to add a normalized signal (and re-do the legend)