Broadcast behavior in `linalg.cross`
The purpose of this issue is to determine how, or if, broadcasting behavior should be specified for `linalg.cross`.
Overview
At the moment, `linalg.cross` is specified to only compute the cross product between two arrays `x` and `y` having the same shape. Meaning, the current `linalg.cross` behavior is specified to not support broadcasting.
The origin of the specified behavior comes from PyTorch (<= v1.10.x) and TensorFlow, which required (at the time of specification) input arrays of the same shape. Not supporting broadcasting was thus the lowest common denominator across array libraries.
TensorFlow has not changed its behavior; PyTorch has.
NumPy, in its `np.cross` API, supports a limited form of broadcasting, as well as 2-element vectors. The broadcasting is limited in that the computation dimension must have 2 or 3 elements and is not allowed to broadcast. E.g.,
```
In [1]: import numpy as np

In [2]: x = np.random.randn(4,3)

In [3]: y = np.random.randn(1,3)

In [4]: z = np.random.randn(4,1)

In [5]: np.cross(x,y)
Out[5]:
array([[ 0.04541174,  0.07097804, -0.53846464],
       [ 0.01457143, -0.74247471, -0.1795357 ],
       [ 2.4163107 , -0.72177202, -5.98228811],
       [ 1.14862403, -1.86202792,  0.32601926]])

In [6]: np.cross(x,z)
ValueError: incompatible dimensions for cross product
(dimension must be 2 or 3)

In [7]: np.add(x,z)
Out[7]:
array([[ 1.66159175,  0.92220278,  0.93708491],
       [ 0.26326781, -0.37688777,  1.26610177],
       [ 1.34535177,  1.13112439,  0.38157179],
       [-1.78861678, -1.34595513, -0.08110636]])
```
Even though `x` and `z` are broadcast compatible, `np.cross` raises an exception.
Starting in PyTorch v1.11.0, `linalg.cross` supports broadcasting. From the docs:

> Also supports batches of vectors, for which it computes the product along the dimension `dim`. In this case, the output has the same batch dimensions as the inputs broadcast to a common shape.

> If after broadcasting `input.size(dim) != 3` or `other.size(dim) != 3`.

Thus implying full broadcast support.
Questions
- Should the spec add support for broadcasting to `linalg.cross`? If so, full broadcasting or NumPy-style (i.e., must have matching dimension sizes, similar to how `vecdot` works)?
- Should the spec add support for size-2 vectors similar to NumPy?
Hmm, for other functions where NumPy and other libraries did not support the regular broadcasting or batch dimensions, we opted for consistency in the design and worked to add features to the libraries (I remember `numpy.linalg.qr`, for example, needed extending). Is there a reason not to add full broadcasting support other than library support?
This indeed does not raise for PyTorch:

```
In [1]: import torch

In [2]: x = torch.randn(4,3)

In [3]: y = torch.randn(1,3)

In [4]: z = torch.randn(4,1)

In [5]: torch.cross(x,y)
Out[5]:
tensor([[-1.1252, -2.0054, -0.0414],
        [ 1.3710,  4.7376, -0.3430],
        [-1.0305, -0.8099, -0.2140],
        [ 1.4742, -1.7227,  0.8003]])

In [6]: torch.cross(x,z)
Out[6]:
tensor([[ 1.1523, -0.2521, -0.9002],
        [ 0.0178, -3.8786,  3.8608],
        [ 0.1672, -1.3077,  1.1405],
        [ 1.0044, -1.3618,  0.3574]])
```
I'm not sure if that is on purpose, given that these two fragments from the PyTorch docs give a slightly contradictory message:

- In this case, the output has the same batch dimensions as the inputs broadcast to a common shape.
- If after broadcasting `input.size(dim) != 3` or `other.size(dim) != 3`.
I'd expect the batch dimensions to broadcast, but I agree it's odd for the size-3 vector dimension to broadcast. This was discussed on https://github.com/pytorch/pytorch/pull/63285, but I see @mruberry expressed concern about broadcasting and that discussion wasn't resolved as far as I can tell. So it may have been overlooked in the final merge. @Lezcano @AnirudhDagar any thoughts here?
As discussed internally: I suggested that `linalg.cross` supports broadcasting because in most (all?) linalg functions in PyTorch we allow broadcasting (e.g., in the solvers). Now, I don't feel strongly about this, so either behaviour seems fine to me.
Re (2): https://github.com/numpy/numpy/issues/13718 addresses why adding length-2 vector support to `cross` is a poor idea; a JIT compiler not being able to implement that behavior is a good enough reason to leave it out.
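To make the JIT concern concrete, here is a small illustration (a hedged sketch, not taken from the linked issue): the shape of `np.cross`'s result depends on the size of the inputs' last axis, so the operation cannot be described by a single fixed gufunc-style signature such as `(3),(3)->(3)`.

```python
import numpy as np

# The result of np.cross changes shape depending on the size of the last axis,
# so no single signature covers both cases.
print(np.cross([1.0, 2.0], [3.0, 4.0]))            # -2.0 (z-component only, 0-d result)
print(np.cross([1.0, 2.0, 0.0], [3.0, 4.0, 0.0]))  # [ 0.  0. -2.] (shape (3,))
```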
Re (1): thanks @Lezcano for the context.
I see no requests on the NumPy issue tracker for that kind of broadcasting. And it's conceptually odd, as @asmeurer argued yesterday (I can't reproduce his eloquent argument exactly, but these are not just independent numbers, and size-1 broadcasting would mean all vectors point in the same direction in 3-D space, namely along `(1,1,1)`). So +1 for leaving it out.

Guess it's not a clear bug, just a slight oddity, so I'm happy to not follow up further on the PyTorch side and leave that as it is now.
Re: broadcasting. My sense is that it would be best for PyTorch to consider its current behavior a "bug" and align with NumPy et al. in terms of broadcasting. As discussed in the meeting, `vecdot` is another linear algebra function in the spec which broadcasts only in the batch dimensions, meaning there's already existing precedent in the standard and in PyTorch for this sort of broadcasting behavior.

Better to align behavior now than to need to revisit in the future. `1.11.0` is recent enough that any downstream expectation of broadcasting behavior is negligible. And by aligning, we can move forward with documenting broadcasting behavior in the specification now so that future array implementors can be aligned going forward.
Re: length 2. Agreed. This should be excluded from the spec. And as such, NumPy's behavior would need to be considered backward incompatible within the context of the NumPy-Array API compatibility document.
Makes sense.
My argument went something like this. Other linalg functions like `matmul` and `tensordot` do not broadcast in the contracted dimensions. `cross` doesn't do contraction, but it's a similar thing in the sense that the arrays combine with each other in a way that isn't just element-by-corresponding-element. If you have a `(4, 3)` array cross a `(1, 3)` array, what that means is that the first array is a stack of four 3-vectors, and the second is a single 3-vector. Broadcasting implicitly treats the latter as a stack of the same 3-vector four times, so that it can be applied to the first. On the other hand, in the simplest case, if you were to take a `(1,)` array and cross it with a `(3,)` array and these were to broadcast, it would mean treating the single-element array as a 3-vector with the same number for x, y, and z (e.g., `[2] x [3, 4, 5]` would mean `[2, 2, 2] x [3, 4, 5]`). This is morally quite a different thing from broadcasting the "stacking" dimensions.

Personally, when I think of broadcasting, I think of it in the stacking sense. An array is a stack of smaller arrays in each dimension, and broadcasting lets you implicitly treat a single-array "stack" (in a dimension) as repeated as many times as necessary. It's a generalization of combining a single scalar with an array by repeating the scalar across every element of the array. You could always manually stack to make the dimensions match, but it's inefficient to repeat the data.
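A small NumPy sketch of the two senses of broadcasting described above (shapes and values chosen purely for illustration):

```python
import numpy as np

# Stacking-sense broadcasting: the single 3-vector y is implicitly reused
# for each of the four 3-vectors in x.
x = np.random.randn(4, 3)
y = np.random.randn(1, 3)
print(np.cross(x, y).shape)  # (4, 3)

# What crossed-dimension broadcasting would mean: a 1-element array is
# treated as the 3-vector [a, a, a]. NumPy refuses this...
a = np.array([2.0])
b = np.array([3.0, 4.0, 5.0])
# np.cross(a, b)  # ValueError: incompatible dimensions for cross product

# ...but doing the broadcast by hand shows the implied (and rarely wanted)
# semantics: [2, 2, 2] x [3, 4, 5].
print(np.cross(np.broadcast_to(a, (3,)), b))
```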
But if you were to broadcast contracted or crossed dimensions, this is no longer simplified stacking. Instead you would be saying that, e.g., an `n x 1` column vector is the same thing as any `n x k` matrix with that column repeated `k` times, where `k` is whatever happens to match up with what you are multiplying it with, or you would be saying that a single number `a` is the same as the 3-vector `<a, a, a>`. You'd be treating a column vector as something that can be stacked into a matrix, or a scalar as something that can be stacked into a vector. But it's rare to have a matrix with a column that is repeated, or a vector that is just a multiple of `<1, 1, 1>`. A much more likely scenario is that this represents a mistake by the user.
That's why I suggested at the meeting that this behavior is likely to be intentional in NumPy, although we can't know that for sure unless we dig into the original NumPy discussions, or ask people who might remember. I do know that the matmul case, which I see as being very similar, appears to be very intentional and is even mentioned in PEP 465: "For inputs with more than 2 dimensions, we treat the last two dimensions as being the dimensions of the matrices to multiply, and ‘broadcast’ across the other dimensions" (emphasis mine).
In fact, we had a very similar discussion about `tensordot` broadcasting, where PyTorch also broadcasts in the contracted dimensions, but we decided for the spec not to (similar to NumPy). See https://github.com/data-apis/array-api/issues/294.
Corresponded with @Lezcano and the plan is for PyTorch to patch its current broadcast behavior in `linalg.cross` to match NumPy's `cross` behavior. Namely,

- the dimension over which the cross product is computed must be equal to 3 and must be the same size for both `x1` and `x2` before broadcasting (i.e., the compute dimension does not broadcast);
- the remaining (non-compute) dimensions must be broadcast compatible.

This aligns `linalg.cross` broadcast behavior with `linalg.vecdot`.
Accordingly, given array library alignment (sans TensorFlow), we can move to update the specification with respect to broadcasting for `linalg.cross`.
For what it's worth, PyTorch now implements this behaviour. See https://github.com/pytorch/pytorch/pull/83798
This was addressed in https://github.com/data-apis/array-api/pull/417. Closing.