array-api

Preferences regarding "core dimension"

Open 34j opened this issue 4 months ago • 3 comments

I always wonder how to design a vectorizable function when the shapes of the input and output arrays differ. In other words, I am never sure whether to shift the "core dimension" to the back:

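from array_api_compat import array_namespace

def polar_coordinates(r, theta):
    xp = array_namespace(r, theta)  # array API namespace shared by r and theta
    return xp.stack([r * xp.cos(theta), r * xp.sin(theta)], axis=-1)  # shape (..., 2)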

or to the front

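from array_api_compat import array_namespace

def polar_coordinates(r, theta):
    xp = array_namespace(r, theta)  # array API namespace shared by r and theta
    return xp.stack([r * xp.cos(theta), r * xp.sin(theta)], axis=0)  # shape (2, ...)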
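A minimal sketch of what the two variants return, assuming NumPy as the backend. The names `polar_back` and `polar_front` are hypothetical and only used here to tell the variants apart.

```python
import numpy as np

def polar_back(r, theta):
    # Core dimension (x, y) appended as the last axis.
    return np.stack([r * np.cos(theta), r * np.sin(theta)], axis=-1)

def polar_front(r, theta):
    # Core dimension (x, y) prepended as the first axis.
    return np.stack([r * np.cos(theta), r * np.sin(theta)], axis=0)

r = np.ones((3, 4))
theta = np.zeros((3, 4))

back = polar_back(r, theta)    # shape (3, 4, 2)
front = polar_front(r, theta)  # shape (2, 3, 4)

# Extracting the components: the front layout unpacks directly,
# the back layout needs indexing on the last axis.
x_f, y_f = front
x_b, y_b = back[..., 0], back[..., 1]
```

The two results hold the same values and differ only in where the length-2 axis sits, so `np.moveaxis(back, -1, 0)` converts one into the other.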

Are there any plans to add a recommendation on this to the array API standard? For reference:

  • NumPy's gufuncs put the "core dimensions" at the back: https://numpy.org/doc/stable/reference/c-api/generalized-ufuncs.html

  • The recently added scipy.special.sph_harm_y_all puts the "core dimensions" at the front: https://docs.scipy.org/doc/scipy/reference/generated/scipy.special.sph_harm_y_all.html

  • Broadcasting aligns shapes starting from the back, so with the "core dimension" at the back it is easier to combine the result with arrays that index the newly added dimension; with the "core dimension" at the front, it is easier to combine the result with arrays that have the input (batch) shape.

  • In terms of calculation speed, the choice should interact with C-style (row-major) versus Fortran-style (column-major) memory layout.
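The broadcasting trade-off in the third bullet can be sketched as follows, assuming NumPy. The arrays and names (`per_component`, `per_point`) are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
back = rng.standard_normal((3, 4, 2))  # batch shape (3, 4), core dimension last
front = np.moveaxis(back, -1, 0)       # same data, core dimension first: (2, 3, 4)

per_component = np.array([10.0, 100.0])  # one factor per core entry, shape (2,)
per_point = rng.standard_normal((3, 4))  # one factor per batch element, shape (3, 4)

# Core dimension last: per-component arrays broadcast as-is,
# per-point arrays need an explicit trailing axis.
a_back = back * per_component
b_back = back * per_point[..., None]

# Core dimension first: the situation is reversed.
a_front = front * per_component[:, None, None]
b_front = front * per_point
```

Either layout needs an explicit reshape for one of the two kinds of operand; the choice decides which one.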

34j • Aug 22 '25 05:08

Are there any plans to add recommendations for this to array API?

I don't think so. Given that we're rarely designing new APIs and usually adopting what is already common across array libraries, we haven't had to give this one any thought.

Both broadcasting and batching (linalg APIs, deep learning operators) grow new dimensions at the front, so that seems to be the prevalent pattern - gufuncs are niche in comparison. However, as long as you document it well, either choice seems reasonable and has precedent.
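For illustration, a small sketch of "new dimensions at the front" in broadcasting and in batched linear algebra, assuming NumPy. The shapes are arbitrary.

```python
import numpy as np

# Broadcasting pads the shorter shape with new axes on the left.
shape = np.broadcast_shapes((4,), (3, 1))  # (3, 4)

# Batched matmul: the last two axes are the matrix, everything before is batch.
a = np.ones((5, 2, 3))
b = np.ones((3, 4))
c = a @ b  # shape (5, 2, 4); b is reused for every batch entry

# Batched determinant: a stack of 7 matrices of shape (3, 3) gives 7 scalars.
stack = np.tile(np.eye(3), (7, 1, 1))  # shape (7, 3, 3)
dets = np.linalg.det(stack)            # shape (7,)
```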

rgommers • Aug 22 '25 07:08

I see, I'm starting to feel like either would be fine. Regarding the speed, considering that many libraries use C-style indexing, is it okay to say that ordinary vectorizable array API compatible code would be faster if it grows "new dimensions at the front"?
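On the speed question, here is a sketch of what the two layouts mean in memory, assuming NumPy and C-ordered (row-major) arrays. It only inspects contiguity and makes no timing claim; which layout is faster depends on how the surrounding code accesses the data.

```python
import numpy as np

r = np.ones((3, 4))
theta = np.zeros((3, 4))
x, y = r * np.cos(theta), r * np.sin(theta)

# Force C order so the layout assumption is explicit.
back = np.ascontiguousarray(np.stack([x, y], axis=-1))  # shape (3, 4, 2)
front = np.ascontiguousarray(np.stack([x, y], axis=0))  # shape (2, 3, 4)

# Taking one component over the whole batch:
x_from_front = front[0]     # one contiguous block
x_from_back = back[..., 0]  # strided view, every second element

# Taking both components of a single point:
p_from_back = back[1, 2]       # two adjacent elements
p_from_front = front[:, 1, 2]  # two elements one full batch apart

component_contiguity = (x_from_front.flags.c_contiguous, x_from_back.flags.c_contiguous)
point_contiguity = (p_from_back.flags.c_contiguous, p_from_front.flags.c_contiguous)
```

In C order, the front layout keeps each component contiguous, while the back layout keeps each point contiguous.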

34j • Aug 22 '25 08:08

As a data point, scipy.interpolate typically adds batch dimensions to the end. This did cause some mild inconveniences, mostly because it's not what usual broadcasting does.
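One way to work around trailing batch dimensions is to move them to the front before combining the result with other arrays. The sketch below assumes NumPy; `batch_last_to_first` is a hypothetical helper, and `values` is a stand-in array rather than actual scipy.interpolate output.

```python
import numpy as np

def batch_last_to_first(values, n_batch):
    """Move the trailing n_batch axes of values to the front (returns a view)."""
    src = list(range(-n_batch, 0))  # e.g. [-2, -1]
    dst = list(range(n_batch))      # e.g. [0, 1]
    return np.moveaxis(values, src, dst)

# Stand-in for a result laid out as (n_points, *batch_shape).
values = np.zeros((5, 3, 4))              # 5 evaluation points, batch shape (3, 4)
leading = batch_last_to_first(values, 2)  # shape (3, 4, 5)

# One weight per evaluation point now broadcasts without reshaping.
weights = np.ones(5)
weighted = leading * weights  # shape (3, 4, 5)
```

With the original trailing-batch layout, `values * weights` would raise a broadcasting error, because the last axes (4 and 5) do not match.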

ev-br • Sep 15 '25 07:09