array-api
RFC: add `copy`
Some array libraries implement a `.copy()` method (like NumPy). While there are some indirect ways to get at this now (`asarray`, `reshape`, etc.), the API currently lacks a way to do this directly and without other potential side effects. Adding a method (rather than a function) would be a simple way to ensure the new array has the original array's type. Curious if there is appetite for including this in the API.
Related is a question of what interplay there is with `__copy__` and/or `__deepcopy__` (if any).
Thanks @jakirkham. A couple of thoughts:
- NumPy has both a function and a method. The function has an extra keyword (`subok`)
- Either way, the new keywords don't seem appropriate, so it would simply be `copy(x)` or `x.copy()`
- At that point, it's the same as `copy.copy` and `copy.deepcopy` I'd think (at least for NumPy)
- PyTorch calls it `clone`, it doesn't have `copy`
- Is there a reason to add it? Autograd or compilers related perhaps - better than the stdlib function?
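As a rough illustration of the NumPy points above, the following sketch compares the function form, the method form, and the two stdlib functions on a plain `ndarray`. It assumes NumPy is installed and only checks that each result owns its own buffer; it says nothing about how other libraries behave.

```python
import copy

import numpy as np

x = np.arange(5)

candidates = {
    "np.copy(x)": np.copy(x),              # function form (also accepts subok=)
    "x.copy()": x.copy(),                  # method form
    "copy.copy(x)": copy.copy(x),          # stdlib, dispatches to ndarray.__copy__
    "copy.deepcopy(x)": copy.deepcopy(x),  # stdlib, dispatches to ndarray.__deepcopy__
}

for name, y in candidates.items():
    # Each result should hold equal values without aliasing x's memory.
    assert np.array_equal(x, y), name
    assert not np.shares_memory(x, y), name
```

For a numeric `ndarray` these four spellings appear interchangeable, which is the basis for the "same as `copy.copy`" remark; that equivalence is a NumPy property, not something the standard guarantees.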
Isn't this `asarray(x, copy=True)`? Does that have side-effects?
To summarize the ask: library functions handling general arrays may want to copy their inputs so that they have an array they can mutate safely without affecting user-provided data.
We concluded that we want a function (not a method) and one needs to do a namespace lookup (`x.__array_namespace__`) to figure it out (as the type is likely not known by the library).
There is a separate question of what we call it. Either a new function (`copy`, `clone`) or use an existing one (`asarray(x, copy=True)`). We would also want to spell out that some libraries (Dask, JAX, maybe others) may not actually copy the underlying data.
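The namespace lookup described above could be sketched roughly as follows, using the existing `asarray(x, copy=True)` spelling, which is only one of the candidates under discussion. `ToyArray`, `toy_namespace`, and `zero_first` are hypothetical stand-ins written for this illustration so that it runs with the standard library alone; they are not part of any real array library.

```python
from types import SimpleNamespace


class ToyArray:
    """Hypothetical stand-in for an array type; not a real library class."""

    def __init__(self, data):
        self.data = list(data)  # list(...) always builds a fresh buffer

    def __array_namespace__(self, api_version=None):
        # Hand back the namespace holding this array type's functions.
        return toy_namespace

    def __setitem__(self, index, value):
        self.data[index] = value


def _toy_asarray(obj, *, copy=None):
    # Toy rule: reuse an existing ToyArray unless a copy is requested.
    if isinstance(obj, ToyArray) and not copy:
        return obj
    data = obj.data if isinstance(obj, ToyArray) else obj
    return ToyArray(data)


toy_namespace = SimpleNamespace(asarray=_toy_asarray)


def zero_first(x):
    """Library code: copy first, then mutate, without knowing x's type."""
    xp = x.__array_namespace__()
    out = xp.asarray(x, copy=True)
    out[0] = 0
    return out


user_array = ToyArray([1, 2, 3])
result = zero_first(user_array)
```

The library function never names the array type; it only relies on the namespace returned by the input, so the caller's data stays untouched while `result` can be mutated freely.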
`__copy__` and `__deepcopy__` are different enough (Dask would copy graphs, JAX something similar, etc.) that it is worth specifying this path may not copy the array data (if that is the user's concern) and that using the function above (name to be decided) would be preferred for data copying.
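To make that distinction concrete, here is a toy model of a lazy array whose `__copy__` duplicates the wrapper and graph but not the data. `LazyToyArray` and `copy_data` are hypothetical names invented for this sketch; it does not claim to reproduce what Dask or JAX actually do.

```python
import copy


class LazyToyArray:
    """Hypothetical lazy array: a task graph plus a reference to a buffer."""

    def __init__(self, buffer, graph=()):
        self.buffer = buffer        # underlying data, possibly shared
        self.graph = tuple(graph)   # pending operations (names only)

    def __copy__(self):
        # Copies the wrapper and its graph, but aliases the same buffer.
        return LazyToyArray(self.buffer, self.graph)


def copy_data(x):
    """Hypothetical explicit data copy (the function whose name is undecided)."""
    return LazyToyArray(list(x.buffer), x.graph)


x = LazyToyArray([1, 2, 3], graph=("add_one",))
shallow = copy.copy(x)
fresh = copy_data(x)

# Mutating through the shallow copy is visible through x (shared buffer),
# while the explicit data copy is unaffected.
shallow.buffer[0] = 99
```

In this model a caller who needs an independent buffer has to use the explicit function; `copy.copy` alone gives no such guarantee.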
Right now `asarray` says that `copy=True` MUST copy the data, but maybe it should say that a library may skip the copy if it knows it solely owns the data and disallows mutation. Or can there be situations where a library thinks it solely owns the data but actually doesn't, so it really has to do a real memory copy?
@asmeurer good point. Maybe something like "must ensure that the returned array does not share data with another array, either by copying the data to a new memory location or in some other way (e.g., this property is guaranteed by design)".