numpy icon indicating copy to clipboard operation
numpy copied to clipboard

ENH: Add a "broadcast" option to numpy.concatenate

Open carlosgmartin opened this issue 8 months ago • 5 comments

Proposed new feature or change:

Mailing list post

Often, I've wanted to concatenate arrays with different ndims along a particular axis, broadcasting the other axes as needed. Others have sought this functionality as well:

  • https://github.com/numpy/numpy/issues/2115
  • https://stackoverflow.com/questions/56357047/concatenate-with-broadcast
  • https://stackoverflow.com/questions/52733240/concatenate-1d-array-to-a-3d-array
  • https://stackoverflow.com/questions/46700081/concat-two-arrays-of-different-dimensions-numpy
  • https://stackoverflow.com/questions/55879664/how-to-concatenate-a-2d-array-into-every-3d-array
  • https://stackoverflow.com/questions/63453495/combining-two-numpy-arrays

However, numpy.concatenate raises an error if the ndims don't match. For example:

import numpy as np

a = np.full([2], 0.1)
b = np.full([3, 2], 0.2)
c = np.full([5, 3, 2], 0.3)

arrays = [a, b, c]
axis = -1

try:
    np.concatenate(arrays, axis)
except ValueError as e:
    print(repr(e))
ValueError('all the input arrays must have same number of dimensions, but the array at index 0 has 1 dimension(s) and the array at index 1 has 2 dimension(s)')

It would be convenient to add to numpy.concatenate an optional boolean argument called broadcast that broadcasts the input arrays along the axes that are not the concatenation axis, before concatenating them. Its default value can be False, which is the current behavior.

Below is an example implementation:

def tuple_replace(tupl, index, item):
    return tupl[:index] + (item,) + tupl[index:][1:]

def broadcast_concat(arrays, axis):
    shape = np.broadcast_shapes(*(tuple_replace(a.shape, axis, 0) for a in arrays))
    bcast_arrays = [
        np.broadcast_to(a, tuple_replace(shape, axis, a.shape[axis])) for a in arrays
    ]
    return np.concatenate(bcast_arrays, axis)

output = broadcast_concat(arrays, axis)
assert output.shape[axis] == sum(a.shape[axis] for a in arrays)

If desired, I can submit a PR for this.

carlosgmartin avatar Mar 18 '25 02:03 carlosgmartin

I like this idea. Note that this would have to be done in C, and a bit of thought is needed on how/whether the argument should also be available for functions that use concatenate, like *stack, block, etc. (and possibly unstack should have an "unbroadcast" argument).

mhvk avatar Mar 18 '25 13:03 mhvk

Note: an alternative would be to add an exclude_axis argument to broadcast_arrays.

Also note that axis should always be negative so it counts from the back.

mhvk avatar Mar 18 '25 13:03 mhvk

@mhvk Good point. That would be more general.

We could also allow multiple axes to be excluded.

carlosgmartin avatar Mar 18 '25 18:03 carlosgmartin

I don't mind it, although I don't love that the concate signature would get very long, which does make a broadcast_arrays approach also appealing (but less obvious, but I am not sure that this type of concatenation is common?).

The fact that concatenate is in C doesn't change: the concatenate code should already support it, you just have to skip raising that specific error.

Also note that axis should always be negative so it counts from the back.

Yeah good point, you could allow positive, but only if all dimensions match (not sure that is useful).

seberg avatar Mar 18 '25 18:03 seberg

Came here from a recent discussion in the Array API community meeting, https://github.com/data-apis/array-api/issues/946

The original motivation of the array API discussion was a desire to replace uses of np._r: np.r_[0, x] and np.r_[x[0], x, x[-1]]. These constructs appear some ~150 times in SciPy, and I believe the vast majority of cases have x a 1D array, which gets extended by a scalar or two. Currently, replacing these kinds of things requires an explicit atleast_1d, which is both ugly and easy to miss (we broke the Jax CI at least once). From this POV, having concat handle scalars and 0D arrays would be a very nice change.

Signature-wise, doesconcat really need an extra keyword? It already has all the information to convert scalars into arrays of the proper shape--- this would of course complicate the implementation, as it'll need to make a separate pass across its arguments to figure out the shape of the result. And it'll make it slower but concat is unlikely to be a performance bottleneck anywhere, I'd think.

API-wise, a minimal great change would be if concat required its arguments to be either arrays of the same shape or scalars or 0D arrays (which are internally broadcast to the array shape). As a more expansive change, it could follow the np.diff-like API, where append and prepend need to be the same (or broadcastable?) in all dimensions apart from the concatenation axis. That would make the result shape data-dependent though.

And FWIW, it would be perfectly fine if this is confined to np.concat / np.concatenate only, while stack functions continue to require exactly matching shapes.

ev-br avatar May 31 '25 17:05 ev-br