matplotlib icon indicating copy to clipboard operation
matplotlib copied to clipboard

subplots() returns AxesArray

Open timhoffm opened this issue 2 years ago • 6 comments

subplots() used to return a numpy array of Axes, which has some drawbacks. The numpy array is mainly used as a 2D container structure that allows 2D indexing. Apart from that, it's not particularly well suited:

  • Many of the numpy functions do not work on Axes.
  • Some functions work, but have awkward semantics; e.g. len() gives the number of rows.
  • We can't add our own functionality.

AxesArray introduces a facade to the underlying array to allow us to customize the API. For the beginning, the API is 100% compatible with the previous numpy array behavior, but we deprecate everything except for a few reasonable methods.

timhoffm avatar May 21 '23 00:05 timhoffm

I think we need to decide whether we want to be normative ("there is one way to do it") or lax ("we support all ndarray methods that could make sense"). I feel like you prefer being normative here, but I'd rather be lax because AxesArray is a somewhat ad-hoc API, and I don't think we should ask anyone to actually learn it (e.g. learn that the only way to unravel it is .flat and not .ravel() or .flatten()); users should just be able treat it as a numpy array and use whatever numpy array methods they can think of as long as it makes sense.

As a more specific point, I also don't think that len() returning the number of rows is particularly bad; I've probably(?) relied on that in the past and again that's just normal numpy semantics (use .size for the total number of elements), which I'd rather not break.

anntzer avatar May 21 '23 16:05 anntzer

I‘m in between lax and normative. We should have clear recommendations (flat is preferred over flatten or ravel). We should continue to support „common“ current operations (e.g. flatten and ravel) because we don’t want to annoy users unnecessarily changing their working code. OTOH we should not feel obliged to log-term maintain every possible array operation (technically axis.roll() should work but it is somewhat arcane and IMHO not really useful). But since I can‘t oversee what is needed and reasonable for the whole array API, I plan to by default only keep API I and others discussing here find useful. Everything else gets a pending deprecation. This gives users time to object to the deprecation (and we should make this explicit in the message: „If you think this API should continue to be supported, please open an issue about it in the matplotlib bug tracker.“)

Overall the new API will be a relatively permissive opt-in: We more or less take the existing stuff that people (we and users) still want.

timhoffm avatar May 21 '23 17:05 timhoffm

Is the idea to support "broadcasting" of the Axes method? Like AxisArray().set_xlim etc?

oscargus avatar May 21 '23 18:05 oscargus

I'm not 100% clear on what the planned advantages of adding a new class are, so my vote is heavily on the lax side of things. If this has a proposed use, can that be devised at the same time?

jklymak avatar May 21 '23 20:05 jklymak

Plans / possibilities:

  • broadcasting methods (e.g. axs[:,0].set_ylabel(text))
  • remove the need for squeeze=False for Nx1 or 1xM subplots: We can support 1D and 2D indexing on them simultaneously.
  • Narrow the API to useful methods. Tab-completion currently gives a lot of non-applicable methods.
  • T.b.d.: long-term behavior changes. Iteration yields rows is a one-sided preference. We could have iterrows() / itercolumns(). And maybe make iteration go over all elements. I'm aware that'd be quite a migration, but worth discussing as a long-term strategy.

timhoffm avatar May 22 '23 02:05 timhoffm

Personally I'm +20 for some way of removing the need for squeeze, but see #19463 for a concrete alternative proposal that achieves the same thing with much less machinery needed (TLDR: I proposed to support subplots((m, n)) where passing a tuple as a single argument explicitly means "I absolutely want a 2D array back -- I know you already commented there).

broadcasting methods

I guess I'm +0 there; the loop is not too bad. Would be worth seeing some concrete examples where this helps. Note that usually you will have a loop somewhere over the axes anyways to do the plotting over each of them, so you can just put everything in the loop; for the axs[:, 0].set_ylabel(...) case there's the label_outer() helper (which is admittedly a bit of a kludge).

Iteration yields rows is a one-sided preference. We could have iterrows() / itercolumns(). And maybe make iteration go over all elements. I'm aware that'd be quite a migration, but worth discussing as a long-term strategy.

Let's not do that; ndarray semantics seem just fine here. (Numpy has been fine for years with for x in t.ravel(): ... / for x in t.ravel("F"): ...)

anntzer avatar May 22 '23 07:05 anntzer