array-api

RFC: add `copy`

Open • jakirkham opened this issue 2 years ago • 5 comments

Some array libraries implement a .copy() method (NumPy, for example). There are some indirect ways to get a copy today (asarray, reshape, etc.), but the API currently lacks a way to do this directly and without other potential side effects. Adding a method (rather than a function) would also be a simple way to ensure the new array has the original array's type. Curious if there is appetite for including this in the API.
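One way to read the point about preserving the array's type is through NumPy subclasses. The sketch below is an illustration under that assumption, not something stated in the issue: it uses a masked array, and np.array(m, copy=True) as the closest function spelling that works across NumPy versions.

```python
import numpy as np

# A masked array is an ndarray subclass that carries extra state (the mask).
m = np.ma.masked_array([1, 2, 3], mask=[False, True, False])

# Function route: subok defaults to False, so the result is a plain ndarray.
via_function = np.array(m, copy=True)

# Method route: the copy keeps the original type and its mask.
via_method = m.copy()

assert type(via_function) is np.ndarray
assert type(via_method) is np.ma.MaskedArray
assert via_method.mask.tolist() == [False, True, False]
```

Whether this loss of type counts as a "side effect" in the sense meant above is a judgment call; the example only shows that the two routes can differ.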

Related is a question of what interplay there is with __copy__ and/or __deepcopy__ (if any).

jakirkham • Oct 12 '22 01:10

Thanks @jakirkham. A couple of thoughts:

  • NumPy has both a function and a method. The function has an extra keyword (subok).
  • Either way, the extra keywords don't seem appropriate, so it would simply be copy(x) or x.copy().
  • At that point, it's the same as copy.copy and copy.deepcopy, I'd think (at least for NumPy).
  • PyTorch calls it clone; it doesn't have copy.
  • Is there a reason to add it, perhaps related to autograd or compilers, that would make it better than the stdlib function?
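For NumPy specifically, the equivalence suggested in the list above can be checked with a small sketch. It assumes a plain numeric ndarray (object arrays behave differently under deepcopy) and uses np.shares_memory as the test for independence.

```python
import copy

import numpy as np

x = np.arange(6).reshape(2, 3)

# Four spellings that, for a numeric ndarray, each produce an independent copy.
candidates = {
    "np.copy(x)": np.copy(x),
    "x.copy()": x.copy(),
    "copy.copy(x)": copy.copy(x),
    "copy.deepcopy(x)": copy.deepcopy(x),
}

for label, y in candidates.items():
    # Same values, but no overlap with the memory backing x.
    assert np.array_equal(x, y), label
    assert not np.shares_memory(x, y), label
```

This says nothing about other libraries, where the stdlib copy protocol may do something else.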

rgommers • Oct 12 '22 17:10

Isn't this asarray(x, copy=True)? Does that have side-effects?

asmeurer • Oct 12 '22 20:10

To summarize the ask: library functions that handle general arrays may want to copy their input so that they have an array they can mutate safely without affecting user-provided data.

We concluded that we want a function (not a method), and that one needs to do a namespace lookup (x.__array_namespace__()) to find it (as the library likely does not know the array's type).
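The namespace lookup described above could look roughly like the following sketch. ToyArray, toy_namespace, and zero_first_element are hypothetical names invented for this illustration so that it runs without any particular array library; the toy asarray only mirrors the standard's asarray(obj, copy=...) in spirit.

```python
from types import SimpleNamespace


class ToyArray:
    """Hypothetical stand-in for an Array API array (not a real library)."""

    def __init__(self, data):
        self.data = list(data)  # always store a fresh list

    def __array_namespace__(self, api_version=None):
        return toy_namespace

    def __setitem__(self, index, value):
        self.data[index] = value


def _toy_asarray(obj, copy=None):
    # Reuse the input unless a copy was explicitly requested.
    if isinstance(obj, ToyArray) and not copy:
        return obj
    data = obj.data if isinstance(obj, ToyArray) else obj
    return ToyArray(data)


toy_namespace = SimpleNamespace(asarray=_toy_asarray)


def zero_first_element(x):
    """Library-style function: mutate a private copy, never the input."""
    xp = x.__array_namespace__()    # look up the namespace from the array
    out = xp.asarray(x, copy=True)  # request an independent copy
    out[0] = 0
    return out


user = ToyArray([1, 2, 3])
result = zero_first_element(user)
assert user.data == [1, 2, 3]    # caller's data is untouched
assert result.data == [0, 2, 3]
```

The library function never names the array type; it only relies on the array providing __array_namespace__ and on the namespace providing the copying function, whatever it ends up being called.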

There is a separate question of what we call it: either a new function (copy, clone) or an existing one (asarray(x, copy=True)). We would also want to spell out that some libraries (Dask, JAX, maybe others) may not actually copy the underlying data.

__copy__ and __deepcopy__ are different enough (Dask would copy graphs, JAX something similar, etc.) that it is worth specifying that this path may not copy the array data (if that is the user's concern) and that the function above (name to be decided) is preferred for copying data.

jakirkham • Oct 20 '22 17:10

Right now asarray says that copy=True MUST copy the data, but maybe it should say that a library may skip the copy if it knows it solely owns the data and disallows mutation. Or can there be situations where a library thinks it solely owns the data but actually doesn't, so it really has to do a real memory copy?

asmeurer • Oct 20 '22 21:10

@asmeurer good point. Maybe something like "must ensure that the returned array does not share data with another array, either by copying the data to a new memory location or in some other way (e.g., this property is guaranteed by design)".

rgommers • Nov 04 '22 21:11