
API: `asarray` API for `copy=...` kwarg

Open seberg opened this issue 2 years ago • 7 comments

There was a plan to update the `copy` kwarg of NumPy's `asarray`; however, the result would not match the array API's definition. NumPy was aiming for an enum, or maybe `copy="never"`, but not a three-way False/True/None flag.

I am a bit unclear about the desired API right now, since the True/False/None choice was explicitly argued against on the NumPy side.

@charris has also brought up the idea to add a "please do not copy this array" flag to the array object itself to prevent accidental copy of large arrays.

seberg avatar Jul 20 '22 18:07 seberg

@shoyer do you have an opinion here? IIRC you were one of those arguing most strongly against a 3-way True/False/None switch.

As to Chuck's proposal, the question is how we expect users to use it; I am not sure it makes much sense for np.reshape(arr, copy=False) (flagging arr ahead of time). OTOH, that guaranteed view is probably not very interesting from an array-API perspective (in NumPy it is useful to know that the result is a view, but the array API does not formalize the view concept).

seberg avatar Jul 21 '22 17:07 seberg

I think 3-way True/False/None is a totally reasonable solution if we are designing this API from scratch.

The problem is that right now copy=False means "copy if needed" for NumPy, which is not what we would want copy=False to mean in the array API.

If we want to change what copy=False means, we'll need a deprecation cycle in NumPy. That would be somewhat disruptive, but on the whole, it might be a good thing -- it would turn up lots of cases where users are probably mistaken about what copy=False means. Also, this would be a relatively smooth deprecation, because there's an easy backwards-compatible work-around (just set copy=None instead) and we would not be changing the behavior of existing code beyond starting to raise an error in some cases.
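For readers skimming the thread, here is a minimal sketch of the three-way semantics under discussion. It deliberately uses the standard library's `array` type as a stand-in for an array library so that it does not depend on any particular NumPy version; `asarray_sketch` and its `typecode` parameter are hypothetical names for illustration, not NumPy's or the array API's implementation.

```python
from array import array


def asarray_sketch(obj, typecode="d", copy=None):
    """Hypothetical three-way copy switch (illustration only)."""
    # The input can be returned as-is only if it already has the requested type.
    reusable = isinstance(obj, array) and obj.typecode == typecode

    if copy is True:
        # Always hand back a fresh object, even when reuse would be possible.
        return array(typecode, obj)
    if reusable:
        # copy=None and copy=False both avoid the copy when they can.
        return obj
    if copy is False:
        # A copy would be required, so refuse instead of copying silently.
        raise ValueError("copy=False was requested, but a copy is required")
    # copy=None: "copy if needed".
    return array(typecode, obj)


floats = array("d", [1.0, 2.0])
assert asarray_sketch(floats) is floats                 # no copy needed
assert asarray_sketch(floats, copy=True) is not floats  # forced copy
assert asarray_sketch([1, 2], copy=None) == floats      # list input forces a copy
```

In this sketch, code that relied on the old "copy if needed" reading of copy=False would switch to copy=None, which is the work-around mentioned above.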

shoyer avatar Jul 21 '22 18:07 shoyer

As a tangent (sorry, I know this is not what this thread was intended for, but it fits the title's scope...), I don't recall why we explicitly asked to raise ValueError here:

> If False, the function must never copy for input which supports the buffer protocol and must raise a ValueError in case a copy would be necessary.

But IIRC we don't usually force library providers onto specific error types in the standard (@kgryte reminded me about this before), so perhaps we should just say "raise an error" here? (I'd further argue that ValueError should be replaced by RuntimeError or something else, because a runtime check must be done here; copy=False is a valid input.)
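To illustrate why the exact error type matters to callers, here is a rough sketch of consumer code that does not rely on a single exception class. Everything in it is hypothetical: `CopyRequiredError`, `view_or_fail`, and `view_or_copy` are made-up names, and the standard library's `array` type again stands in for a real array library.

```python
from array import array


class CopyRequiredError(RuntimeError):
    """Hypothetical error type; a library might raise ValueError instead."""


def view_or_fail(obj, typecode="d"):
    # Hypothetical stand-in for asarray(obj, copy=False).
    if isinstance(obj, array) and obj.typecode == typecode:
        return obj
    raise CopyRequiredError("copy=False was requested, but a copy is required")


def view_or_copy(obj, typecode="d"):
    """Return (result, copied), falling back to an explicit copy."""
    try:
        return view_or_fail(obj, typecode), False
    except (ValueError, RuntimeError):
        # Catch both candidate types, since libraries may differ on this point.
        return array(typecode, obj), True
```

If the standard only said "raise an error", portable callers would need a pattern like the `except` clause above, or an even broader one.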

leofang avatar Jul 21 '22 18:07 leofang

> it would turn up lots of cases where users are probably mistaken about what copy=False means.

Couldn't agree more. I was one of the victims mistakenly thinking copy=False means erroring out when a copy is needed.

leofang avatar Jul 21 '22 18:07 leofang

@shoyer could you post that opinion on the NumPy side? Last time around, I feel you were one of the people arguing strongly against a 3-way "boolean" switch here.

seberg avatar Jul 21 '22 20:07 seberg

Yes, I was advocating for enums last time 🤦

I commented on the mailing list discussion advocating for True/False/None.

shoyer avatar Jul 21 '22 21:07 shoyer

Thanks all. I think we're all good here, this issue can be closed?

rgommers avatar Aug 04 '22 11:08 rgommers

Yeah, the next step would be updating NumPy and getting the deprecation going.

seberg avatar Sep 05 '22 17:09 seberg