RFC: update `vecdot` to accept `keepdims` and `axis` to support tuples
I'm working on a PR for SciPy that replaces the `sum(x * y)` pattern with `vecdot(x, y)`, since there are substantial performance advantages. There is at least one place where the code currently relies on the `keepdims` argument of `sum`. Should `vecdot` accept that, too?
How about `axis` tuples? I understand this does not look ideal, and I would also be happy if `axis` tuples were never allowed. But the motivation is the same: there are existing functions that use `sum(x * y)` and support `axis` tuples. If `vecdot` doesn't allow them, these functions all have to manually ravel the axes of `x` and `y` separately. (That operation can be moved into a helper function if need be, but hopefully it need not be.)
@mdhaber Your `keepdims` argument would also apply more generally to other contractions (e.g., `tensordot`).
For multiple axes, could you not use a combination of permuting dimensions and/or reshaping so that your computation axis is a single flattened dimension? For arrays supporting views, this could be workable, although `reshape` may force a copy.
However, one thing here is that the single-axis restriction may have its origins in BLAS, where acceleration is achieved via calls to `ddot`, `sdot`, and friends. If libraries implement `vecdot` by calling into BLAS, then in order to support multiple axes, they'd likely need to copy values into a contiguous buffer.
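To illustrate the contiguity point, here is a small sketch using SciPy's low-level BLAS wrappers (assuming `scipy.linalg.blas.ddot` is available); the explicit copy mimics what a library would have to do internally:

```python
import numpy as np
from scipy.linalg.blas import ddot

x = np.arange(12.0).reshape(3, 4)
y = np.ones((3, 4))

# Reducing over the last axis of a C-contiguous array lines up with what
# ddot wants: each row is already a contiguous 1-D buffer.
row = ddot(x[0], y[0])

# Reducing over axis 0 walks a strided column, so a contiguous copy is
# made first (here explicitly; a library would do the equivalent).
col = ddot(np.ascontiguousarray(x[:, 0]), np.ascontiguousarray(y[:, 0]))
```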
> For multiple axes, could you not use a combination of permuting dimensions and/or reshaping so that your computation axis is a single flattened dimension?
Of course. For NumPy at least, yes, always. Often we even do this with a decorator. This particular code just wasn't already handled that way.
And for `keepdims`, we can keep track of the axes where the reduction started and be sure to add them all back. A loop may be required if `axis` is a tuple, because the standard only supports an int for `axis`, but it can be done. We often do this with a decorator, too.
So it can all be done. But was it intentional to leave these out of the standard? The features are included for other reducing functions[^1], so I thought I'd ask about `vecdot`. (It only came up in `vecdot` for me, but yes, this question could be extended to other functions.)
> ... in order to support multiple axes, they'd likely need to copy values into a contiguous buffer.
Sure, copies may be required if the data is not already laid out the way BLAS needs it. But users are not required to use multiple axes, and they can be informed of the performance consequences if they choose to do so. The same goes for other reducing functions, too, right?
[^1]: I understand why `axis` can only be negative for `vecdot`, BTW. I'm OK working around that, because the behavior for positive axes would be less obvious when the arrays don't have the same number of dimensions.
This issue was discussed in the most recent public workgroup meeting (24/7/2025). The consensus there was that this proposal should not move forward, for a few reasons:
- The machinery in NumPy is not well-suited for supporting multiple axes in `vecdot`, as it would need to copy to contiguous memory, and there is not likely to be much appetite for adding support.
- We'd likely face an uphill battle getting this upstreamed in PyTorch.
- A more general solution, which isn't likely to get the hardware-accelerated performance of BLAS and friends, would be to use `einsum` for computing over multiple axes.
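The `einsum` route mentioned in the last bullet can be sketched like this; `vecdot_einsum` is a hypothetical name, and the subscript-building logic assumes arrays of at most 26 dimensions:

```python
import numpy as np

def vecdot_einsum(x, y, axes):
    """Sketch of the einsum alternative: contract x and y over the
    given axes by repeating their labels in both operands and
    dropping them from the output (assumes x and y share a shape)."""
    axes = tuple(ax % x.ndim for ax in axes)
    labels = [chr(ord('a') + i) for i in range(x.ndim)]
    sub = ''.join(labels)
    out = ''.join(l for i, l in enumerate(labels) if i not in axes)
    return np.einsum(f'{sub},{sub}->{out}', x, y)

x = np.arange(24.0).reshape(2, 3, 4)
y = np.ones((2, 3, 4))
res = vecdot_einsum(x, y, (1, 2))  # same values as np.sum(x * y, axis=(1, 2))
```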
I'll go ahead and close this one out.