RFC: allow scalars and 0D arrays in `concat`
The following patterns are quite common [1]: x = np.r_[x[0], x, x[-1]] and x = np.r_[0, x]. Neither of these can be directly replaced by xp.concat, because the latter requires that "the arrays must have the same shape, except in the dimension specified by axis".
The most common case IME is that x is a 1D array, which gets appended or prepended by a scalar.
An Array API replacement is something along the lines of
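import array_api_extra as xpx

def npr(xp, *arys):
    # Convert Python scalars and array-likes to arrays of the given namespace.
    arys = [xp.asarray(a) for a in arys]
    # Promote 0-D arrays to 1-D so that concat accepts them.
    arys = [xpx.atleast_nd(a, ndim=1, xp=xp) for a in arys]
    return xp.concat(arys)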
which requires array_api_extra and is generally a bit clunky. There was at least one case where a scipy change which was missing atleast_1d broke jax.scipy.
Allowing 0D arrays and python scalars in concat would obviate the need for these sorts of helpers.
[1] At least in scipy,
$ git grep "np.r_" |wc -l
169
Why does this function not allow broadcasting? That seems like a more general solution, doesn't it?
The current guidance originates from NumPy (see https://numpy.org/doc/1.26/reference/generated/numpy.concatenate.html).
The arrays must have the same shape, except in the dimension corresponding to axis (the first, by default).
Yes, numpy's concatenate is limited by what is in this guidance. Numpy, however, has np.r_, which, for all its sins, allows extending arrays with scalars or 0D arrays. And this is what's missing in the array API land.
I think np.concat is not limited by this guidance, it was always limited (check np.concatenate in older versions). For NumPy, it's hstack that does the right thing here (perhaps by accident).
>>> import numpy as np
>>> x = np.arange(5)
>>> np.hstack((x[0], x, x[-1]))
array([0, 0, 1, 2, 3, 4, 4])
>>> # concat is more fiddly:
>>> np.concat((x[0], x, x[-1]))
...
ValueError: zero-dimensional arrays cannot be concatenated
>>> np.concat((np.expand_dims(x[0], 0), x, x[-1]))
...
ValueError: all the input arrays must have same number of dimensions, but the array at index 0 has 1 dimension(s) and the array at index 2 has 0 dimension(s)
>>> np.concat((np.expand_dims(x[0], 0), x, np.expand_dims(x[-1], 0)))
array([0, 0, 1, 2, 3, 4, 4])
I'd say just go with the last line here. More verbose, but definitely an improvement over r_ (all the *_ constructors are unreadable).
This led me to rediscover gh-494, may be worth revisiting perhaps.
np.concat is not limited by this guidance, it was always limited (check np.concatenate in older versions)
This is exactly what I am saying: np.concatenate was always limited, and np.r_ was a way around the limitation.
definitely an improvement over r_ (all the *_ constructors are unreadable).
TBH, I fail to see how xp.concat((xp.expand_dims(x[0], 0), x)) is more readable than np.r_[x[0], x]. And it does not of course work for np.r_[0, x], which needs something like xp.concat((xp.zeros(1, dtype=x.dtype), x)) instead.
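To make the scalar case concrete, here is a minimal NumPy sketch of what prepending a single zero with concatenation involves: the scalar first has to become a length-1, 1-D array with a matching dtype. It uses np.concatenate so that it also runs on NumPy versions without np.concat; the variable names are only illustrative.

```python
import numpy as np

x = np.arange(1, 6)

# A one-element 1-D array holding the scalar, with the same dtype as `x`.
zero = np.zeros(1, dtype=x.dtype)

# Shapes (1,) and (5,) differ only along the concatenation axis, so this works.
prepended = np.concatenate((zero, x))

# Same values as the np.r_ spelling.
assert np.array_equal(prepended, np.r_[0, x])
```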
gh-494, may be worth revisiting perhaps.
This is almost xpx.atleast_nd.
TBH, I fail to see how xp.concat((xp.expand_dims(x[0], 0), x)) is more readable
Perhaps write a little helper function for SciPy then?
def concat_1d(*arrays, xp):
    """Like `concat`, except (a) for 1-D only, and (b) also accepts scalars and 0-D arrays"""
Then you can write it as concat_1d(x[0], x, x[-1], xp=xp), which is about as good as it gets until there's a function in the standard that does this.
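One way such a helper might be filled in is sketched below. This is an assumption about a possible body, not SciPy's actual helper: it relies only on asarray, reshape and concat from the namespace, and falls back to concatenate for NumPy versions that predate np.concat. It also leaves dtype handling to the namespace, which may be stricter than NumPy when a Python int is mixed with a floating-point array.

```python
import numpy as np


def concat_1d(*arrays, xp):
    """Like `concat`, for 1-D only, also accepting scalars and 0-D arrays."""
    converted = []
    for a in arrays:
        a = xp.asarray(a)
        if a.ndim == 0:
            # Promote scalars and 0-D arrays to shape (1,).
            a = xp.reshape(a, (1,))
        elif a.ndim != 1:
            raise ValueError(f"expected a scalar, 0-D or 1-D input, got ndim={a.ndim}")
        converted.append(a)
    # `concat` is the Array API name; older NumPy only has `concatenate`.
    concat = getattr(xp, "concat", None) or xp.concatenate
    return concat(converted)


x = np.arange(5)
padded = concat_1d(x[0], x, x[-1], xp=np)   # edge-padded copy of x
with_zero = concat_1d(0, x, xp=np)          # Python scalar prepended
```

With NumPy as the namespace, padded holds the same values as np.r_[x[0], x, x[-1]] and with_zero the same values as np.r_[0, x].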
Exactly. I've a scipy helper, and this issue is to gauge interest/possibility to make it work with xp.concat (or xp.stack) and drop the helper.
FWIW, I have no strong opinion either way. Although, if you allow 0-D, the question is why not allow any broadcasting (except along the concatenated dimension), and it may be nice to have a NumPy PR to see what others think. I could also see allowing optional broadcasting.
What has come up in NumPy before (I think Matt Haberland for example liked to have it), is a broadcast_arrays(*arrs, omit_axis=...). Doesn't quite make this particular use-case nice, but has some overlap (and makes writing the helper clean for N-D inputs).
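For illustration, a rough sketch of what such a function could look like on top of existing NumPy calls follows. The name broadcast_arrays_omit_axis, its signature and the choice to pad lower-dimensional inputs with leading length-1 axes are all assumptions; nothing like it is confirmed to exist in NumPy. It uses np.broadcast_shapes and np.broadcast_to, which are existing functions.

```python
import numpy as np


def broadcast_arrays_omit_axis(*arrays, omit_axis=0):
    """Broadcast inputs against each other in every dimension except `omit_axis`.

    Hypothetical helper: lower-dimensional inputs are first padded with
    leading length-1 axes, as in ordinary broadcasting.
    """
    arrays = [np.asarray(a) for a in arrays]
    ndim = max(max(a.ndim for a in arrays), 1)
    if not -ndim <= omit_axis < ndim:
        raise ValueError(f"omit_axis={omit_axis} is out of bounds for ndim={ndim}")
    ax = omit_axis % ndim
    padded = [a.reshape((1,) * (ndim - a.ndim) + a.shape) for a in arrays]
    # Broadcast the shapes with the omitted axis removed.
    rest = np.broadcast_shapes(*(p.shape[:ax] + p.shape[ax + 1:] for p in padded))
    # Re-insert each input's own length along the omitted axis.
    return [np.broadcast_to(p, rest[:ax] + (p.shape[ax],) + rest[ax:]) for p in padded]


a, b = broadcast_arrays_omit_axis(np.zeros((2, 3)), np.ones((1, 1)), omit_axis=1)
joined = np.concatenate((a, b), axis=1)  # shapes (2, 3) and (2, 1) -> (2, 4)
```

Under these padding assumptions a 0-D input becomes shape (1,) when combined with 1-D inputs, so the result can be passed straight to concatenation, at the cost of an extra call at every use site.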
Are there any upstream numpy issues about adding 0-D or broadcasting support to np.concatenate?
https://github.com/numpy/numpy/issues/28549 but I am not aware of old discussions (which doesn't mean they don't exist).
The stackoverflow references on that issue certainly seem to imply that full broadcasting would be useful, not just support for 0-D concatenation.