ndcube icon indicating copy to clipboard operation
ndcube copied to clipboard

Add option to instantiate an NDCube with array indices in pixel order

Open ayshih opened this issue 6 months ago • 13 comments

Describe the feature

I find it maddening that NDCube bakes in that the array indices are in the reverse order of the WCS pixel axes. This is a natural consequence of the natural (row-major) ordering of NumPy arrays, where the array indices start with the largest strides. However, that means a user will always have to think about whether the API is asking for input in array-index ordering or in WCS-pixel ordering.

Take this cube from the documentation:

import astropy.wcs
import matplotlib.pyplot as plt
import numpy as np

from ndcube import NDCube

# Define data array.
data = np.random.rand(3, 4, 5)
# Define WCS transformations in an astropy WCS object.
wcs = astropy.wcs.WCS(naxis=3)
wcs.wcs.ctype = 'WAVE', 'HPLT-TAN', 'HPLN-TAN'
wcs.wcs.cunit = 'Angstrom', 'deg', 'deg'
wcs.wcs.cdelt = 0.2, 0.5, 0.4
wcs.wcs.crpix = 0, 2, 2
wcs.wcs.crval = 10, 0.5, 1
wcs.wcs.cname = 'wavelength', 'HPC lat', 'HPC lon'

# Now instantiate the NDCube
my_cube = NDCube(data, wcs=wcs)

Wavelength is the first WCS pixel axis, and is indexed in the first position in the attached WCS:

>>> my_cube.wcs.axis_type_names[0]
'wavelength'

but for natural NumPy ordering, wavelength comes last, so it is indexed in the third position in NDCube's own API:

>>> my_cube.array_axis_physical_types[2]
('em.wl',)
>>> my_cube[:, :, 0]  # extract the 2D HPC map for the first wavelength
<ndcube.ndcube.NDCube object at 0x00000191F8D6D150>
NDCube
------
Shape: (3, 4)
Physical Types of Axes: [('custom:pos.helioprojective.lat', 'custom:pos.helioprojective.lon'), ('custom:pos.helioprojective.lat', 'custom:pos.helioprojective.lon')]
Unit: None
Data Type: float64
>>> my_cube[:, :, 0].plot()
>>> plt.show()

Image

Proposed solution

I propose that a keyword argument be added to NDCube instantiation that enables having the array indices be in the same order as the WCS pixel axes. For such a NDCube, the user doesn't have to think about which part of the API uses which ordering.

Here is an example using this proof-of-concept branch to create a new version of the above cube:

>>> new_cube = NDCube(data.T, wcs=wcs, index_order="pixel")  # new keyword argument
>>> new_cube.shape
(5, 4, 3)
>>> new_cube.array_axis_physical_types[0]  # wavelength is now indexed in the first position
('em.wl',)
>>> new_cube[0, :, :]
<ndcube.ndcube.NDCube object at 0x0000021C2FFECE90>
NDCube
------
Shape: (4, 3)
Physical Types of Axes: [('custom:pos.helioprojective.lat', 'custom:pos.helioprojective.lon'), ('custom:pos.helioprojective.lat', 'custom:pos.helioprojective.lon')]
Unit: None
Data Type: float64
>>> new_cube[0, :, :].plot()
>>> plt.show()

Image (This proof of concept is far from comprehensive, but it is able to do the above.)

ayshih avatar Jun 28 '25 05:06 ayshih

Copying the discussion from element that triggered this issue here for the record. The conversation started in the context of the MapCube refactor, specifically about replacing Map.superpixel with NDCube.rebin, and the change in axis ordering that entails:

@ayshih: It doesn't make sense for GenericMap.superpixel() to switch to (y, x) ordering. If it's not fully intentional for NDCube.rebin() to be use NumPy ordering, let's change the API there.

@DanRyanIrish: I think this will have to be a no on the NDCube side. Everything on NDCube is array order, unless it must be in world order because it's a direct input to WCS, e.g. the ordering of high-level coordinate objects to NDCube.crop. NDCube never takes pixel order. This is by design for several reasons, one being that an NDCube with array dimensions of (time, lat/lon, wavelength) is perfectly valid. Reversing the order in some case and not other would be a nightmare, and makes no sense unless you're only targeting a user base that only deals with 2-D images and thinks in terms of x-y. NDCube has a potentially much broader target market than that. (Not sure if that suggestion was serious or just trolling me :P I add the above in case the former.) If it's critical for Map to have an (x, y)-ordering API, then keep superpixel. But I think in the medium term it would be less confusing to all to just have rebin and help users adjust to the (y, x) ordering. They have to get used to that anyway if they want to use the slicing API: MapCube[y1:y2, x1:x2].

@Cadair: Completely agreed on the existing method not changing. Perhaps we just put superpixel on the extended deprecation list like everything else? After all .rebin will be a method now.

@DanRyanIrish: I think this is the best way forward.

@nabobalis: Ok, then I can update the tests and move superpixel to the deprecation list instead of deleting it.

@ayshih: Not trolling. I feel that it is an unfortunate design choice to have NDCube lean into NumPy ordering instead of abstracting it away as much as possible. It feels like we tried so hard to have GenericMap be consistent with (x, y) ordering. Some of this is out of our hands (WCS and WCSAxes). I am not looking forward to a future where users will have to think more about whether they need to use (x, y) or (y, x) ordering for a given action.

@nabobalis: thats a fair point, are there going to be other API places where this will just switch and do we know what ones?

@Cadair: The whole point is to standardise the API. I disagree [about abstracting numpy ordering away]. You can't abstract it away without making the inevitable breakdown of that abstraction even more confusing. I'd have to sit down and have a proper check of this, but almost all the ndcube api is array ordered. I'm struggling to think of too many places where we even use world order (particularly when using high level coordinate objects). Having rebin etc in pixel order would be very confusing to me. Everything should be (y,x) 😜 More seriously yes [the difference between wcs and numpy ordering] sucks. But we came to the conclusion a long time ago that trying to hide its suckiness was actually only going to make things worse. Because eventually you end up interacting with the array.

@ayshih: I don't understand why it would have necessarily made things worse. My immediate notion would be that the array itself would be transposed to how it currently is, so pixel ordering and array ordering are then the same. If someone needs to directly access the data with the classic ordering, they just need to make a view that is transposed back. From my perspective, the reason that there is lingering friction in GenericMap of (x, y) vs (y, x) is because we have kept .data as a ndarray that can be directly fed to imshow(). (And, this causes the occasional bug when .data.shape is used incorrectly.) As a data container, .data could have been stored as a transposed view, and left it to plot() to transpose the array back before feeding it to imshow(). Then it would have (x, y) all the way down. I clearly don't use NDCube in any significant way, but this strikes me as a clear negative for refactoring Map to be based on NDCube.

@AlisdairDavey: I think I agree but I'm not sure I understand completely. Can someone clarify. If I have a wcs that is defined y,x what order of axes would I put in to rebin Args. Likewise t,y,x.

@ayshih: To clear, this is pervasive throughout NDCube, not just .rebin(). That is, rebin() is consistent with everything else NDCube. For example, if the WCS pixel axes are (x, y, t), I believe slicing to the first time step would be my_cube[0, :, :].

@AlisdairDavey: How does it decide the order?

@ayshih: (t, y, x). reverse direction, per the normal consequence of row-major ordering. (essentially another battlefield of little-endian versus big-endian ordering)

@Cadair: I just don't see how hiding numpy ordering of the underlying array isn't going to just catch someone out massively. Yes you have to learn that in python array and wcs order are different and it's weird but once you have learnt it then it's not a surprise and it makes it easier to work with e.g. to use data.shape as a guide for inputs to rebin etc. We should probably take this discussion to an issue or something so it's archived for the future.

DanRyanIrish avatar Jun 29 '25 01:06 DanRyanIrish

Thanks @ayshih for raising this issue. It was addressed earlier in ndcube's development, but it's worth revisiting now.

In general, I agree with @Cadair's view. I can't see how we can avoid the fundamental contradiction in the ordering of numpy arrays and WCS seeping out into the user experience. Morevoer, I think attempts to abstract this a way will only make it more confusing to the user when it does emerge, and ultimately not fully protect the user from having to be aware of numpy vs wcs ordering.

There are several reasons why I think requiring the user to be aware of the different orderings is a non-ideal, but the least bad solution. Here are a couple:

  1. This standard, where arrays and WCS are reversed in order is also baked into parts of astropy and solar instrument-specific packages. For example:
  • As you already mentioned, WCSAxes. Users will often want to make custom plots without NDCube.plot for papers or talks. Once they extract the data array and WCS for this purpose, they will have to learn that the array needs to be transposed. So we are simply moving this confusion to less obvious parts of their workflow.
  • Other WCS-based data containers in the astropy/sunpy ecosystem are already (y, x)-ordered. So solar and astro Python users already have to be aware of the array vs wcs ordering, and workflows using Map and these other classes will not be compatible.
    • NDData and its mixins
    • Spectrum1D from specutils
    • SpectrogramCube from sunraster used in irispy-lmasl, sospice, and eispac.
    • dkist user tools.
  1. NDCube supports arithmetic operations with arrays, array-like Quantities, and very recently, wcs-less NDData objects. If NDCube ordering was changed to (x, y) this would mean that NDCubes and arrays with ostensibly broadcastable shapes would not be compatible. This in effect starts a slippery slope where we effectively take the responsibility to reverse engineer all Python array operations relating to NDCube into IDL/FORTRAN ordering.
  2. It will probably substantially complicate a number of places in the codebase, increasing maintenance and making enticing new maintainers more difficult.
  3. ...

I think users will find these and more consequences frustrating and confusing, and ultimately doesn't prevent users having to understand the difference between array and wcs ordering. Worse than this, the scenarios for when they need to think about those will seem arbitrary. As it stands, there is a simple, if unfortunate, rule to learn: NDCube is effectively a coordinate-aware array (with some extras) and so is always array order, while if you access the WCS directly, it's in WCS order. Moreover, this same rule applies throughout the astropy ecosystem, thereby harmonising sunpy and astropy.

DanRyanIrish avatar Jun 29 '25 01:06 DanRyanIrish

Off the top of my head, the only place where the NDCube API is not array order is in the ordering of the inputs to NDCube.crop (and crop_by_values). These are given in world order, since they are world coordinate inputs. However, I would be open to reversing this.

Pros

  • In most, but not all cases, reversing the order will make in the crop input align with the output of NDCube.array_axes_physical_types, which could be more intuitive.

Cons

  • Consider the case where an NDCube has axes of (y, t, x, lambda). In this case the input types to crop would be (SkyCoord, Time, SpectralCoord). Technically, this is neither array, pixel, nor world order, but a 4th type of ordering not supported anywhere else.

Nonetheless, reversing the world order may still be more intuitive, and reinforce the rule that the whole NDCube API is array order. I would guess that beginner users that will not want to know about the difference between array and WCS order will only want to use the NDCube API, without having to touch the WCS directly. In this case, NDCube will be more intuitive as is looks more like a NumPy array.

The above con could be mitigated if we added a property to tell users what high level coordinate object types are required by crop, and the order they must be given in, e.g. NDCube.crop_input_types_order (and NDCube.crop_by_values_input_units_order). While this information can be accessed through the WCS, it is not commonly known or easily discoverable for the typical user.

Irrespective of whether we change the crop inputs ordering, properties like the above could be very helpful. crop is not the most intuitive API, and occasionally I even get confused. Such a property would be it much easier to remember what you need to input.

DanRyanIrish avatar Jun 29 '25 01:06 DanRyanIrish

I would be interested to hear @hayesla's view on this issue as it often challenges my own :)

DanRyanIrish avatar Jun 30 '25 10:06 DanRyanIrish

the ordering of the inputs to NDCube.crop

The ordering of crop is weirder than just straight world order though, because of multiple axes possibly being in one (SkyCoord) object so it's not even straight world order.

I'm probably 👎 on making crop array order, but I haven't thought about it much.

Cadair avatar Jun 30 '25 10:06 Cadair

The ordering of crop is weirder than just straight world order though, because of multiple axes possibly being in one (SkyCoord) object so it's not even straight world order.

Agreed. I think this further justifies a NDCube.crop_input_types_order method/property.

I'm probably 👎 on making crop array order, but I haven't thought about it much.

I'm not sold on it, but it might be more intuitive for users in most cases. I think it's a valid discussion point, at any rate.

DanRyanIrish avatar Jun 30 '25 11:06 DanRyanIrish

I think my overall feeling is covered by the chat log and also @DanRyanIrish's comment above.

This is something that took me a long time to make my peace with. In the early days of my dkist work I made gWCS objects in array order (you can do it, but you probably shouldn't). In the end, I just don't feel like this is something that can be fixed in the API it's too baked in to both Python (C-major) and WCS (Cartesian order) to ever be able to change it, and hiding it from the user is a short-term win for long term pain, when it inevitably rears it's head and confuses you like all good abstractions.

[fwiw, I argued strongly around the APE-14 discussion and at other times that the best thing would be for WCS to support being the opposite order but was resoundingly defeated.]

whether the API is asking for input in array-index ordering or in WCS-pixel ordering.

I do just want to be a pedant here. The NDCube API never uses pixel ordering (it does occasionally use world ordering), the whole purpose of the NDCube API is to unify the array with the world axes provided by the WCS, there should be no need for pixel ordering ever. (Unless the user wants to get fractional coordinates in pixel space, when they will have use the WCS directly).

Cadair avatar Jun 30 '25 11:06 Cadair

@DanRyanIrish Thanks for copying over all of the chat discussion!

I'd like to point out that this feature request is not about changing the default indexing behavior of NDCube, but rather to allow a user the option to interact with an NDCube differently. I was partially inspired by the indexing keyword for numpy.meshgrid(). Whether such an option, even if available, is utilized for the re-implementation of sunpy.map.GenericMap is a highly related consideration, but will be a future battle.

What I hear in the reactions so far is many fears about users getting potentially tripped up, but I think those fears are largely overblown.

Other WCS-based data containers in the astropy/sunpy ecosystem are already (y, x)-ordered.

Not GenericMap! The ndarray it carries is of course (y, x) ordered, but everything in the outer interface is (x, y) ordered. Has this been a massive issue for people moving between GenericMap and other data containers?

Users will often want to make custom plots without NDCube.plot for papers or talks.

Weird, is this how users typically interact with NDCube? For the Map side, I would discourage users from ever making plots without calling GenericMap.plot(), especially now that autoalignment is the default behavior.

NDCube supports arithmetic operations with arrays, array-like Quantities, and very recently, wcs-less NDData objects. If NDCube ordering was changed to (x, y) this would mean that NDCubes and arrays with ostensibly broadcastable shapes would not be compatible.

I think the vast majority of use cases would have arrays that are already compatible, because those arrays would have been generated under the same indexing order. For example, you can subtract adjacent images for a running difference, or you can median across many images to produce a background array.

There will be use cases where you may have an externally generated array, but then you always have to be mindful of what the indices mean and the order they come in. For example, in the example NDCube that I pulled from the docs, the latitude comes before the longitude for some reason, so someone trying to blindly mathematically apply a longitude/latitude array would get tripped up regardless.

It will probably substantially complicate a number of places in the codebase

I suspect it would be largely straightforward, at least within ndcube itself, because we're just talking about indexing order. It's more of an open question of whether downstream packages may need to support NDCube instances that they didn't create themselves, and so the alternative indexing order might trip them up.

ayshih avatar Jun 30 '25 13:06 ayshih

The NDCube API never uses pixel ordering (it does occasionally use world ordering), the whole purpose of the NDCube API is to unify the array with the world axes provided by the WCS, there should be no need for pixel ordering ever. (Unless the user wants to get fractional coordinates in pixel space, when they will have use the WCS directly).

Yes, you're right. I have been implicitly including the WCS API as part of the NDCube API, so as you say, a point of confusion would be users calling my_cube.wcs.pixel_to_world().

ayshih avatar Jun 30 '25 13:06 ayshih

a point of confusion would be users calling my_cube.wcs.pixel_to_world().

This is why the NDCube API and also the documentation makes such a point of strictly using the terms pixel and array so it's clear there's a difference. I don't think we are ever going to be able to get rid of that distinction without major changes to WCS / APE 14 which aren't happening.

I know you want to keep the Map discussion to later, but I actually think it's the only thing left which uses pixel indices / Cartesian order anywhere. I remember back in the last map refactor that it was a sticking point then, with the use of namedtuples to try and bridge the gap and reduce the confusion (which I'm sure nobody uses). I feel like we took the decision in the Map design in a very different era with a lot less understanding of how the ecosystem as a whole worked.

If ndcube provides an API which lets you slice an array in Cartesian order, it's probably going to be the only thing in the whole scientific Python ecosystem which does.

Cadair avatar Jun 30 '25 14:06 Cadair

I'd like to point out that this feature request is not about changing the default indexing behavior of NDCube, but rather to allow a user the option to interact with an NDCube differently.

Yes, this is a good thing to keep in mind. I do sympathise with the frustration of array and WCS being in reverse order. And I also appreciate that those of our community coming from IDL are used to thinking in terms of (x, y). (I can hear @hayesla's voice in my head on this point saying users just want what they're used to ;) ) So I do sympathise where you're coming from on this.

Other WCS-based data containers in the astropy/sunpy ecosystem are already (y, x)-ordered.

Not GenericMap! The ndarray it carries is of course (y, x) ordered, but everything in the outer interface is (x, y) ordered. Has this been a massive issue for people moving between GenericMap and other data containers?

Yes indeed! Not GenericMap. But this is an opportunity to fix that and bring Map into a harmonised trading zone. The EU of Python data classes, if you will, where standardising regulations actually streamlines business! ;)

Joking aside though, if it hasn't been a massive issue, my reading of this would be that it's because GenericMap restricts users from interacting with Map in an array-like way. And this has led to more cumbersome, less intuitive APIs, at least for those whose first Python experience wasn't through Map (which is increasingly the case). For example, having to use submap instead of simply slicing the Map.

But even this does not fully solve the problem. For example, consider the following:

>>> from sunpy.map import Map
>>> import sunpy.data.sample
>>> smap_full = Map(sunpy.data.sample.AIA_171_IMAGE)
>>> smap_full.dimensions
PixelPair(x=<Quantity 1024. pix>, y=<Quantity 1024. pix>)
>>> smap = smap_full.submap([200, 700]*u.pix, width=500*u.pix, height=300*u.pix)  # Make a non-square submap.
>>> smap.dimensions
PixelPair(x=<Quantity 501. pix>, y=<Quantity 301. pix>)

Now, let's say sunpy does want users to think of Map as truly (x, y), and the user wants to perform a pixel-wise subtraction. In that case, this should work.

>>> arr = np.ones((501, 301)) * smap.unit
>>> smap - arr
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[64], line 1
----> 1 smap - arr * smap.unit

File ~/miniconda3/envs/default/lib/python3.11/site-packages/sunpy/map/mapbase.py:586, in GenericMap.__sub__(self, value)
    585 def __sub__(self, value):
--> 586     return self.__add__(-value)

File ~/miniconda3/envs/default/lib/python3.11/site-packages/sunpy/util/decorators.py:309, in check_arithmetic_compatibility.<locals>.inner(instance, value)
    307 except TypeError:
    308     return NotImplemented
--> 309 return func(instance, value)

File ~/miniconda3/envs/default/lib/python3.11/site-packages/sunpy/map/mapbase.py:579, in GenericMap.__add__(self, value)
    577 @check_arithmetic_compatibility
    578 def __add__(self, value):
--> 579     new_data = self.quantity + value
    580     return self._new_instance_from_op(new_data)

File ~/miniconda3/envs/default/lib/python3.11/site-packages/astropy/units/quantity.py:683, in Quantity.__array_ufunc__(self, function, method, *inputs, **kwargs)
    681     return NotImplemented
    682 else:
--> 683     raise e

File ~/miniconda3/envs/default/lib/python3.11/site-packages/astropy/units/quantity.py:658, in Quantity.__array_ufunc__(self, function, method, *inputs, **kwargs)
    655     arrays.append(converter(input_) if converter else input_)
    657 # Call our superclass's __array_ufunc__
--> 658 result = super().__array_ufunc__(function, method, *arrays, **kwargs)
    659 # If unit is None, a plain array is expected (e.g., comparisons), which
    660 # means we're done.
    661 # We're also done if the result was None (for method 'at') or
    662 # NotImplemented, which can happen if other inputs/outputs override
    663 # __array_ufunc__; hopefully, they can then deal with us.
    664 if unit is None or result is None or result is NotImplemented:

ValueError: operands could not be broadcast together with shapes (301,501) (501,301)

However, it clearly doesn't. Instead, users have to know that the Map is actually (y, x), and construct an array with this axis order in order to perform arithmetic operations.

smap - arr.T  # no error this time.

Moreover, if we then inspect the data and WCS attributes:

>>> smap.data.shape  # (y, x)
(301, 501)
>>> smap.wcs
WCS Keywords

Number of WCS axes: 2
CTYPE : 'HPLN-TAN' 'HPLT-TAN' 
CRVAL : 0.00089530541880571 0.00038493926472939 
CRPIX : 312.5 -187.5 
PC1_1 PC1_2  : 0.99999706448085 0.0024230207763071 
PC2_1 PC2_2  : -0.0024230207763071 0.99999706448085 
CDELT : 0.00066744222222222 0.00066744222222222 
NAXIS : 501  301

we see that the Map is actually in reverse axis order to the WCS after all. So Map's API does not spare users from having to know this. It simply delays the lesson, allowing users to get comfortable with a conflicting mental map of how the data is stored, which they then have to unlearn and relearn for more advanced operations. Therefore, introducing an indexing="xy" kwarg for use by Map will still induce a change in Map's behaviour. I don't think it's actually a solution to the problem you're highlighting.

Given this, we might as well force NDCube users to learn the (y, x) mental map upfront, especially since this will be the same for DKIST dataset, SpectrogramCube, EISCube, Spectrum1D, NDData, and in the case of non-coordinate aware objects, numpy and dask arrays, etc. I fear that giving this indexing option to NDCube, where no other array-like/array-ish objects in Python do the same, will make NDCube feel more complicated as users will have to mentally track the axis ordering of each individual cube. Otherwise, users will get caught out with one workflow working for one cube and not for another.

As for Map, there is a legacy of mixing the (x, y) and (y, x) APIs. While I personally find this limiting or even confusing, many users are used to it. Therefore, if we are going to support (x, y), I think it should be done on MapCube itself rather than NDCube, by continuing to support the old APIs like submap and superpixel in addition to the NDCube API. NDCube inheritance does not force us to get rid of the Map API, although I would argue that one of the several benefits of NDCube inheritance is that we CAN deprecate these.

That said, I would still advocate for MapCube making the transition to a full array-order API, and leave the attached WCS in wcs-order. While this is annoying, it does bring MapCube in-line with the rest of the sunpy/astropy data class ecosystem, and paves the way for functionalities on Map becoming transparently broadcastable to NDCube-based spectral-image cubes, time-image cubes, etc. While this appears to require a shift for users, Map already subtlely puts that requirement on users, as shown above. Moreover, fully embracing the array-order API opens a new era of interoperability between complex instruments like DKIST, and even FOXSI, which have 3-5 dimensions in their data. For whatever it's worth, I think this is the future of solar data analysis, and the old paradigm of separately manipulating single images with no other dimensions is already passing us by.

DanRyanIrish avatar Jun 30 '25 18:06 DanRyanIrish

For example, having to use submap instead of simply slicing the Map.

I'll note that the refusal to add slicing support in Map was partly due to wanting to encourage users to embrace coordinate-aware manipulations instead of thinking about the pixels themselves.

>>> arr = np.ones((501, 301)) * smap.unit

While you're not wrong that punching in the (x, y) dimensions by hand would lead to the shape mismatch, I'd argue that users should probably be discouraged from doing that anyway since that's a recipe for mistakes. Instead, they should abstractly pass along the required shape, e.g.:

arr = np.ones(smap.data.shape) * smap.unit
arr = np.ones_like(smap.data) * smap.unit
arr = np.ones(smap.wcs.array_shape) * smap.unit

It simply delays the lesson, allowing users to get comfortable with a conflicting mental map of how the data is stored, which they then have to unlearn and relearn for more advanced operations.

As far as Map is concerned, I'd be supportive of switching the data container to (x, y) ordering and thus be (x, y) everywhere. Because we largely discourage direct manipulation of .data, there isn't enough value in making such a change for it to be worth the trouble.

I'd argue that it's uncommon for Map users to have to think about array ordering, and can largely just think about pixel ordering. On the other hand, NDCube users have to be constantly aware of array ordering vs pixel ordering if they interact at all with the WCS object.

Moreover, fully embracing the array-order API opens a new era of interoperability between complex instruments like DKIST, and even FOXSI, which have 3-5 dimensions in their data. For whatever it's worth, I think this is the future of solar data analysis, and the old paradigm of separately manipulating single images with no other dimensions is already passing us by.

I feel these sentences are disingenuous. Nothing about multi-dimensional data analysis is fundamentally helped or hindered by the choice of indexing order. It's all about making a sensible implementation.


I see the insistence on row-major indexing in this ecosystem as a vestige of the C-based approach of allocating multi-dimensional arrays. NumPy arrays aren't even allocated that way. Languages built primarily for math use invariably have array ordering being the same as pixel ordering, partially due to lineage from Fortran, but also because it avoids a source of confusion for users. With NumPy views and strides, column-major indexing is merely a transpose away; there's no reason to be squeamish about it.

I envision a future where NDCube can make up for this past decision by providing the option now for the alternate decision. I see that as being better for users in the long run, if we do it right.

ayshih avatar Jul 01 '25 03:07 ayshih

I think I'm starting to understand the/a source of our difference in perspective. If I understand you right, you are thinking in terms of supporting (x, y) as being the easier for maintainers and users. I can see this view if coming predominantly from a Map perspective. To clarify this point, I see two slightly inconsistent motivations for this:

  • Maintain the Map API exactly as-in for backwards compatibility:
  • Make the Map API consistently (x,y) everywhere to remove inconsistencies e.g. submap is currently (x, y), and arithmetic operations are (y, x). The motivation for choosing (x, y) is so the WCS and arrays have consistent order.

If the former, this proposed change is not a solution as it will require a change to the current Map API, e.g. arithmetic operations will go from (y, x) to (x, y). Instead, we can leave NDCube as-is and reimplement the whole Map API. Then users can choose whether to interact with Map as (x, y) as now, or (y, x) through the inherited NDCube API, and be consistent with numpy, Spectrum1D, dkist user tools, etc. Personally, I see a shift complete shift of Map to the NDCube (y, x) paradigm (excluding the WCS or course) as bringing major advantages of interoperability between different data types. However, a decision on this beyond the scope of this issue and a question for the Map refactor.

If the latter, to my knowledge, Map is the only data class in the sunpy/astropy ecosystem that tries to give the appearance of (x, y) ordering. Therefore, I would argue that a full (x, y) paradigm (including storage of the underlying data array) should be implemented on Map first. If we find that this is successful and has applications throughout the sunpy/astropy ecosystem, we can always upstream this to NDCube at a later time without any friction for users.

Returning to the difference in our perspectives, from my predominantly NDCube-user perspective, I see supporting (x, y) as well as (y, x) as actually being more hassle from both maintenance and user points of view, not less. On the maintenance side, this obviously requires if statements and two versions of code snippets scattered throughout the codebase, makes development more error prone, and increases the required amount of testing. On the user side, I think there are fewer and fewer new sunpy users coming from IDL and learning Python for the first time through Map. Instead, I think people are more likely to first learn the (y, x) paradigm through numpy, which is such a foundational API to any scientific workflow. Therefore, in most cases, introducing the (x, y) paradigm actually gives users more to learn and track mentally (and hence making things more error prone). Moreover, as already stated, other data classes in the astropy/sunpy ecosystem already require users to understand that the difference between array and WCS ordering if they want to access the WCS directly. The only way to fully protect users from this knowledge would be to make NDCube only support (x, y). However, this still doesn't protect them from switching between paradigms because they then have to think in (y, x) as soon as they go back to numpy.

So in summary, I don't see the advantage of supporting (x, y) on NDCube because it doesn't prevent users having to think in terms of (y, x) if they use numpy arrays and other data classes (which they will). In this case, the benefits of interoperability of (y, x) are worth requiring users to learn this one rule about array and WCS ordering. However, if we do want the (x, y) paradigm for Map, I would argue that it be implemented on Map first as a trial, and then upstreamed to NDCube later if there is a demand beyond just Map.

Specific responses

While you're not wrong that punching in the (x, y) dimensions by hand would lead to the shape mismatch, I'd argue that users should probably be discouraged from doing that anyway since that's a recipe for mistakes.

I think this reveals an important drawback of the Map API.

As far as Map is concerned, I'd be supportive of switching the data container to (x, y) ordering and thus be (x, y) everywhere. Because we largely discourage direct manipulation of .data, there isn't enough value in making such a change for it to be worth the trouble.

As said above, this becomes a question of whether we want interoperability of Map with other data classes, or want to promote a walled garden where users don't stray beyond the confines of Map. Only then can they not have to worry about array and WCS ordering. I prefer the former, but this is a decision for the collective involved in the Map refactor.

I'd argue that it's uncommon for Map users to have to think about array ordering, and can largely just think about pixel ordering. On the other hand, NDCube users have to be constantly aware of array ordering vs pixel ordering if they interact at all with the WCS object.

Only if NDCube users want to access the WCS directly. And I think the majority of basic users of NDCube won't touch the WCS at all. If they want the coordinates of the pixel grid, they will use NDCube.axis_world_coords().

Direct access of the WCS aside, and with the exception of NDCube.crop, the NDCube API is always array order. I agree that array and wcs ordering being different is not ideal, but I also don't agree that Map users are currently fully protected from this either, e.g. arithmetic operations.

I feel these sentences are disingenuous. Nothing about multi-dimensional data analysis is fundamentally helped or hindered by the choice of indexing order. It's all about making a sensible implementation.

I didn't mean my point to be about the axis ordering in isolation. I agree with you that that would be disingenuous. My point was about the transparent interoperability between data classes in the sunpy/astropy ecosystem, and enabling workflows/functions to apply to Map and, e.g. SpectrumCube, or NDCube itself, without having to special case each different data class. Sorry if this wasn't clear.

I see the insistence on row-major indexing in this ecosystem as a vestige of the C-based approach of allocating multi-dimensional arrays. NumPy arrays aren't even allocated that way. Languages built primarily for math use invariably have array ordering being the same as pixel ordering, partially due to lineage from Fortran, but also because it avoids a source of confusion for users. With NumPy views and strides, column-major indexing is merely a transpose away; there's no reason to be squeamish about it.

I agree that we aren't discussing the underlying storage of the data in the memory. I think we agree that the key thing here is discussion of a dependable, intuitive API. Wrt Map I'm advocating that we embrace the (y, x) paradigm fully to enhance interoperability, simplifying workflows for users who use more data objects than just Map.

However, it's important to note that making Map inherit from NDCube does not force us to embrace this paradigm.

I envision a future where NDCube can make up for this past decision by providing the option now for the alternate decision. I see that as being better for users in the long run, if we do it right.

This is interesting. I will ponder this. My initial interpretation of your point is that NDCube can start to undo the foundational API decisions of Python (or at least numpy, dask etc.) which you see as a mistake. I fear this is too big a responsibility to put onto NDCube. However, as I said, I will ponder this further.

DanRyanIrish avatar Jul 02 '25 16:07 DanRyanIrish