[MNT]: Clarify whether the values of an AxesImage are known as "data" or as "array"
Summary
The docstring of AxesImage.set_array says "Retained for backwards compatibility - use set_data instead.", but on the getter side, only get_array (inherited from ScalarMappable) exists -- get_data doesn't even exist.
Proposed fix
Be consistent as to which name, "data" or "array", is used. I suspect they could even be made aliases of one another at the ScalarMappable level... (Perhaps also distantly related to the colorizer-related API changes.)
Can this be assigned to me?
@AbhinRustagi you are welcome to submit a pull request for any issue, but we do not assign them. Please see our contributors’ guide: https://matplotlib.org/devdocs/devel/index.html
Cool, thank you. I'm working on this
@rcomer I have added get_data and get_array to AxesImage. Should that be fine or should I add an alias to ScalarMappable?
Also, should I remove one of these terms altogether, or keep both for now?
First time contributor, would appreciate help and feedback!
The difficult part about this is to decide which API we actually want.
Just going by the context of the class, I think it should remain array
In my mind array is more specific than data: for a scalar mappable, each number in array is mapped to a color using your chosen colormap. The data in an image can be mapped like that, but you can also provide RGB(A) directly and ignore the colormap.
Having said that, QuadMesh just uses array (though that only started accepting RGB(A) relatively recently).
Actually it looks like it became data to support PIL images. In that case, we do not retain the original input so what you get out with get_array will not be the same as what you put in with set_data.
We have two families of ScalarMappable subclasses: Collection, and images (_ImageBase and its subclasses).
The array naming comes in at ScalarMappable to describe an abstract quantity that can be colormapped.
Collection
For Collection this also makes sense (doc):
Each Collection can optionally be used as its own ScalarMappable by passing the norm and cmap parameters to its constructor. If the Collection's ScalarMappable matrix _A has been set (via a call to Collection.set_array), then at draw time this internal scalar mappable will be used to set the facecolors and edgecolors, ignoring those that were manually passed in.
The colormapped quantity influences the style of the Collections, but I would not count it as core data; e.g. for an EllipseCollection, I would regard position, size and angle as data.
Images
The situation is different for images, where the color is the core data information.
How to proceed
IMHO, we don't want set_data on collections and thus not on ScalarMappable. Therefore, keep ScalarMappable as is. Since images derive from ScalarMappable, they technically keep the "array" notion and API. We should additionally have the "data" notion as a semantic high-level interface and to reflect the fact that colors are the core aspect of images.
T.b.d.: Should "data" just be an alias of "array", i.e. exist purely to communicate the "core data" aspect? Or is there a real difference between "data" and "array", e.g. https://github.com/matplotlib/matplotlib/issues/28929#issuecomment-2393850164? Currently we don't make that distinction: mapped and RGB(A) data are both stored in the array ScalarMappable._A. This means that ScalarMappable already supports RGBA arrays alongside scalar mapped arrays. So introducing this as a semantic difference for images ("data" vs "array") does not make sense without refactoring (which would be a different and larger topic, and I currently don't see the need). The second difference is that *Image.set_data was implemented to accept PIL images as input; *Image.set_array was reimplemented to also support this.
For now, I recommend regarding "data" and "array" as aliases in images.
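To make the recommendation concrete, here is a minimal sketch of "data" and "array" as aliases on the image side only, with the mappable base class left as is. `MappableSketch` and `ImageSketch` are hypothetical stand-ins written for illustration, not Matplotlib classes; only the `_A` attribute and the `set_array`/`get_array`/`set_data`/`get_data` names come from this thread.

```python
class MappableSketch:
    """Hypothetical stand-in for ScalarMappable: it only knows "array"."""

    def __init__(self):
        self._A = None  # single storage slot, named after ScalarMappable._A

    def set_array(self, A):
        self._A = A

    def get_array(self):
        return self._A


class ImageSketch(MappableSketch):
    """Hypothetical stand-in for an image class: adds the "data" names."""

    def set_data(self, A):
        """Set the image data; on this class an alias of set_array."""
        self.set_array(A)

    def get_data(self):
        """Return the image data; on this class an alias of get_array."""
        return self.get_array()


img = ImageSketch()
img.set_data([[0.0, 0.5], [0.5, 1.0]])
# Both getter names read the same stored object.
print(img.get_data() is img.get_array())
# The base class does not gain the "data" names.
print(hasattr(MappableSketch, "set_data"))
```

In this sketch the "data" names are a purely semantic layer: nothing about storage changes, and collections built on the base class never see `set_data`.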
@timhoffm Since your recommendation is to regard data and array as aliases in images and to leave ScalarMappable unchanged (separating the two concepts would be a bigger change), I believe the quick fix here is either to keep things as they are or to introduce a get_data in AxesImage (which I've done in the raised PR) or in any Image-based child classes of ScalarMappable, as originally suggested. Does that sound like the right way to go?
@timhoffm just following up
We actually have a class decorator that we use for defining aliases on the artists. For example, here it defines several on Line2D:
https://github.com/matplotlib/matplotlib/blob/55037744f53ed8dbebd778e9afb8244dc847f4b1/lib/matplotlib/lines.py#L217-L228
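For readers unfamiliar with the idea, this is roughly how a class decorator can generate getter/setter aliases. It is a simplified sketch, not the Matplotlib implementation linked above: `define_aliases_sketch` and `AliasedImageSketch` are hypothetical names, and the real decorator may differ in its details.

```python
def define_aliases_sketch(alias_map):
    """Hypothetical class decorator that adds get_/set_ aliases.

    alias_map maps a property name to a list of alias names, e.g.
    {"array": ["data"]} adds get_data/set_data that forward to
    get_array/set_array.
    """
    def make_alias(target_name):
        def method(self, *args, **kwargs):
            # Resolve the target at call time so subclass overrides are used.
            return getattr(self, target_name)(*args, **kwargs)
        return method

    def decorator(cls):
        for prop, aliases in alias_map.items():
            for prefix in ("get_", "set_"):
                if not hasattr(cls, prefix + prop):
                    continue  # nothing to alias for this prefix
                for alias in aliases:
                    method = make_alias(prefix + prop)
                    method.__name__ = prefix + alias
                    method.__doc__ = f"Alias for `{prefix + prop}`."
                    setattr(cls, prefix + alias, method)
        return cls

    return decorator


@define_aliases_sketch({"array": ["data"]})
class AliasedImageSketch:
    """Hypothetical stand-in for an image class, not a Matplotlib class."""

    def __init__(self):
        self._A = None

    def set_array(self, A):
        self._A = A

    def get_array(self):
        return self._A


img = AliasedImageSketch()
img.set_data([1, 2, 3])       # generated alias forwards to set_array
print(img.get_array())
print(AliasedImageSketch.get_data.__doc__)
```

The generated docstring is only a one-line pointer to the target method, so any longer explanation of why both names exist would have to live somewhere else.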
@AbhinRustagi sorry, I currently don't have time to look at this topic, and therefore cannot comment.
@rcomer Though not explicitly documented, I've always regarded aliases as shortcut names for Artist properties. It may be that it would also work for data, but I'd slightly prefer to write this out explicitly. That also opens up the possibility of adding a note on why this alias exists (coloring as "primary data" vs. coloring as style) and which name to prefer.