
Broadcast behavior in `linalg.cross`

kgryte opened this issue 2 years ago • 8 comments

The purpose of this issue is to determine how, or if, broadcasting behavior should be specified for linalg.cross.

Overview

At the moment, linalg.cross is specified to compute the cross product only between two arrays x and y having the same shape. In other words, the current linalg.cross behavior is specified to not support broadcasting.

The specified behavior originates from PyTorch (<= v1.10.x) and TensorFlow, which required (at the time of specification) input arrays of the same shape. Not supporting broadcasting was thus the lowest common denominator across array libraries.

TensorFlow has not changed its behavior; PyTorch has.

NumPy, in its np.cross API, supports a limited form of broadcasting, as well as 2-element vectors. The broadcasting is limited in that the computation dimension must have 2 or 3 elements and is not allowed to broadcast. E.g.,

In [1]: x = np.random.randn(4,3)                                                                                       

In [2]: y = np.random.randn(1,3)                                                                                      

In [3]: z = np.random.randn(4,1)                                                                                       

In [4]: np.cross(x,y)
Out[4]: 
array([[ 0.04541174,  0.07097804, -0.53846464],
       [ 0.01457143, -0.74247471, -0.1795357 ],
       [ 2.4163107 , -0.72177202, -5.98228811],
       [ 1.14862403, -1.86202792,  0.32601926]])

In [5]: np.cross(x,z)
ValueError: incompatible dimensions for cross product
(dimension must be 2 or 3)

In [6]: np.add(x,z)
Out[6]: 
array([[ 1.66159175,  0.92220278,  0.93708491],
       [ 0.26326781, -0.37688777,  1.26610177],
       [ 1.34535177,  1.13112439,  0.38157179],
       [-1.78861678, -1.34595513, -0.08110636]])

Even though x and z are broadcast compatible, np.cross raises an exception.
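
A quick illustrative check (not part of the original session): only the batch dimension broadcasts, so broadcasting it by hand changes nothing.

import numpy as np

x = np.random.randn(4, 3)
y = np.random.randn(1, 3)

# Only the batch dimension broadcasts, so broadcasting it explicitly is a no-op:
assert np.allclose(np.cross(x, y), np.cross(x, np.broadcast_to(y, (4, 3))))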

Starting in PyTorch v1.11.0, linalg.cross supports broadcasting. From the docs,

Also supports batches of vectors, for which it computes the product along the dimension dim. In this case, the output has the same batch dimensions as the inputs broadcast to a common shape.

If after broadcasting input.size(dim) != 3 or other.size(dim) != 3.

Thus implying full broadcast support.

Questions

  1. Should the spec add support for broadcasting to linalg.cross? If so, full broadcasting or NumPy-style (i.e., must have matching dimension sizes, similar to how vecdot works)?
  2. Should the spec add support for size 2 vectors similar to NumPy?

kgryte · Apr 07 '22 07:04

Hmm, for other functions where NumPy and other libraries did not support the regular broadcasting or batch dimensions, we opted for consistency in the design. And worked to add features to the libraries (I remember numpy.linalg.qr for example needed extending). Is there a reason not to add full broadcasting support other than library support?

rgommers · Apr 07 '22 15:04

This indeed does not raise for PyTorch:

In [1]: import torch

In [2]: x = torch.randn(4,3)

In [3]: y = torch.randn(1,3)

In [4]: z = torch.randn(4,1)

In [5]: torch.cross(x,y)
Out[5]: 
tensor([[-1.1252, -2.0054, -0.0414],
        [ 1.3710,  4.7376, -0.3430],
        [-1.0305, -0.8099, -0.2140],
        [ 1.4742, -1.7227,  0.8003]])

In [6]: torch.cross(x,z)
Out[6]: 
tensor([[ 1.1523, -0.2521, -0.9002],
        [ 0.0178, -3.8786,  3.8608],
        [ 0.1672, -1.3077,  1.1405],
        [ 1.0044, -1.3618,  0.3574]])

I'm not sure if that is on purpose, given that these two fragments from the PyTorch docs give a slightly contradictory message:

  • In this case, the output has the same batch dimensions as the inputs broadcast to a common shape.
  • If after broadcasting input.size(dim) != 3 or other.size(dim) != 3.

I'd expect the batch dimensions to broadcast, but I agree it's odd for the size-3 vector dimension to broadcast. This was discussed on https://github.com/pytorch/pytorch/pull/63285, but I see @mruberry expressed concern about broadcasting and that discussion wasn't resolved as far as I can tell. So it may have been overlooked in the final merge. @Lezcano @AnirudhDagar any thoughts here?

rgommers · Apr 07 '22 20:04

As discussed internally:

I suggested that linalg.cross support broadcasting because most (all?) linalg functions in PyTorch allow broadcasting (e.g., the solvers).

Now, I don't feel strongly about this, so either behaviour seems fine to me.

lezcano · Apr 07 '22 20:04

Re (2): https://github.com/numpy/numpy/issues/13718 addresses why adding length-2 vector support to cross is a poor idea; a JIT compiler not being able to implement that behavior is a good enough reason to leave it out.
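
To make the irregularity concrete, a small NumPy example (illustrative, not from the linked issue):

import numpy as np

# Two 2-vectors: np.cross returns just the scalar z-component (shape ()).
print(np.cross([1.0, 2.0], [3.0, 4.0]))             # -2.0
# The same vectors padded with a zero z-component: a full 3-vector result (shape (3,)).
print(np.cross([1.0, 2.0, 0.0], [3.0, 4.0, 0.0]))   # [ 0.  0. -2.]
# The output shape changes with the input size; newer NumPy releases deprecate the 2-element form.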

Re (1): thanks @Lezcano for the context. I see no requests on the NumPy issue tracker for that kind of broadcasting. And it's conceptually odd, as @asmeurer argued yesterday (I can't reproduce his eloquent argument exactly, but these are not just independent numbers, and size-1 broadcasting would mean all vectors point in the same direction in 3-D space, i.e., along (1, 1, 1)). So +1 for leaving it out. I guess it's not a clear bug, just a slight oddity, so I'm happy not to follow up further on the PyTorch side and leave that as it is now.

rgommers · Apr 08 '22 15:04

Re: broadcasting. My sense is that it would be best for PyTorch to consider its current behavior a "bug" and align with NumPy et al. in terms of broadcasting. As discussed in the meeting, vecdot is another linear algebra function in the spec which broadcasts only in the batch dimensions, meaning there's already existing precedent in the standard and in PyTorch for this sort of broadcasting behavior.

Better to align behavior now than to need to revisit in the future. 1.11.0 is recent enough that any downstream expectation of broadcasting behavior is negligible.

And by aligning, we can move forward with documenting broadcasting behavior in the specification now, so that future array implementors are aligned going forward.

Re: length 2. Agreed. This should be excluded from the spec. And as such, NumPy's behavior would need to be considered backward incompatible within the context of the NumPy-Array API compatibility document.

kgryte · Apr 08 '22 19:04

> Re: length 2. Agreed. This should be excluded from the spec. And as such, NumPy's behavior would need to be considered backward incompatible within the context of the NumPy-Array API compatibility document.

Makes sense.

rgommers · Apr 08 '22 19:04

My argument went something like this. Other linalg functions like matmul and tensordot do not broadcast in the contracted dimensions. cross doesn't do contraction, but it's a similar thing in the sense that the arrays combine with each other in a way that isn't just element-by-corresponding-element.

If you have a (4, 3) array cross a (1, 3) array, what that means is that the first array is a stack of four 3-vectors, and the second is a single 3-vector. Broadcasting implicitly treats the latter as a stack of the same 3-vector four times, so that it can be applied to the first. On the other hand, in the simplest case, if you were to take a (1,) array and cross it with a (3,) array, if these were to broadcast, this would mean treating the single-element array as a 3-vector with the same number for x, y, and z (e.g., [2] x [3, 4, 5] would mean [2, 2, 2] x [3, 4, 5]). This is morally quite a different thing from broadcasting the "stacking" dimensions.

Personally, when I think of broadcasting, I think of it in the stacking sense. An array is a stack of smaller arrays in each dimension, and broadcasting lets you implicitly treat a single array "stack" (in a dimension) as repeated as many times as necessary. It's a generalization of combining a single scalar with an array by repeating the scalar across every element of the array. You could always manually stack to make the dimensions match, but it's inefficient to repeat the data.

But if you were to broadcast contracted or crossed dimensions, this is no longer simplified stacking. Instead you would be saying that, e.g., an n x 1 column vector is the same thing as any n x k matrix with that column repeated k times, where k is whatever happens to match up with what you are multiplying it with, or that a single number a is the same as the 3-vector <a, a, a>. You'd be treating a column vector as something that can be stacked into a matrix, or a scalar as something that can be stacked into a vector. But it's rare to have a matrix with a column that is repeated, or a vector that is just a multiple of <1, 1, 1>. A much more likely scenario is that this represents a mistake by the user.
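
A toy illustration of what broadcasting the crossed dimension would mean (not from the thread; plain NumPy used for concreteness):

import numpy as np

a = np.array([2.0])              # shape (1,)
b = np.array([3.0, 4.0, 5.0])    # shape (3,)

# If the crossed dimension itself broadcast, `a` would silently become [2, 2, 2]:
print(np.cross(np.broadcast_to(a, b.shape), b))   # [ 2. -4.  2.]
# np.cross(a, b) itself raises instead, which is almost certainly the better outcome.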

That's why I suggested at the meeting that this behavior is likely to be intentional in NumPy, although we can't know that for sure unless we dig into the original NumPy discussions, or ask people who might remember. I do know that the matmul case, which I see as being very similar, appears to be very intentional and is even mentioned in PEP 465: "For inputs with more than 2 dimensions, we treat the last two dimensions as being the dimensions of the matrices to multiply, and ‘broadcast’ across the other dimensions" (emphasis mine).

In fact, we had a very similar discussion about tensordot broadcasting, where PyTorch also broadcasts the contracted dimensions, but we decided that the spec should not (similar to NumPy). See https://github.com/data-apis/array-api/issues/294.

asmeurer · Apr 08 '22 20:04

Corresponded with @Lezcano and the plan is for PyTorch to patch its current broadcast behavior in linalg.cross to match NumPy's cross behavior. Namely,

  • the dimension over which the cross product is computed must have size 3 in both x1 and x2 before broadcasting (i.e., the compute dimension does not broadcast).
  • the remaining (batch) dimensions must be broadcast compatible.

This aligns linalg.cross broadcast behavior with linalg.vecdot.
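
For concreteness, a minimal sketch of the shape rule this implies (illustrative only, not spec text; cross_result_shape is a hypothetical helper, written here with NumPy):

import numpy as np

def cross_result_shape(shape1, shape2, axis=-1):
    """Result shape for linalg.cross under the rule above (sketch, not spec text)."""
    if shape1[axis] != 3 or shape2[axis] != 3:
        raise ValueError("compute dimension must have size 3 and does not broadcast")
    batch1 = list(shape1)
    batch2 = list(shape2)
    del batch1[axis], batch2[axis]
    # Standard broadcasting applies only to the remaining (batch) dimensions.
    batch = list(np.broadcast_shapes(tuple(batch1), tuple(batch2)))
    insert_at = axis if axis >= 0 else len(batch) + 1 + axis
    batch.insert(insert_at, 3)
    return tuple(batch)

print(cross_result_shape((4, 3), (1, 3)))   # (4, 3)
cross_result_shape((4, 3), (4, 1))          # raises ValueError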

Accordingly, given array library alignment (sans TensorFlow), we can move to update the specification wrt broadcasting for linalg.cross.

kgryte · Apr 18 '22 06:04

For what it's worth, PyTorch now implements this behaviour. See https://github.com/pytorch/pytorch/pull/83798

lezcano · Aug 26 '22 17:08

This was addressed in https://github.com/data-apis/array-api/pull/417. Closing.

kgryte · Sep 19 '22 07:09