array-api icon indicating copy to clipboard operation
array-api copied to clipboard

Add `clip`

Open jakirkham opened this issue 2 years ago • 1 comments

The clip function is quite handy for constraining data to a certain range. Would be nice to include this in the spec

jakirkham avatar Sep 23 '22 08:09 jakirkham

gh-187 says clip is universally implemented by libraries. This looks to me like a useful function to add. IIRC we had some issues with clip behavior in numpy, we should dig those up to check for possible semantics issues.

rgommers avatar Sep 23 '22 09:09 rgommers

IIRC we had some issues with clip behavior in numpy, we should dig those up to check for possible semantics issues.

This wasn't too bad actually, a lot of it was segfaults and whether it was a ufunc or not. The relevant ones:

  • Semantics when the max value is smaller than the min value: https://github.com/numpy/numpy/issues/18782. Probably not very interesting, just undefined behavior,
  • Whether or not the output dtype should depend on the dtype of the min/max values: https://github.com/numpy/numpy/issues/16609. Not immediately clear to me what the right answer is here.

And of course we need to choose keyword names, always a hard problem. Existing signatures don't match:

  • np.clip(a, a_min, a_max, out=None, **kwargs)
  • torch.clip(input, min=None, max=None, *, out=None)

No name is ideal - min/max are nicer names, but not everyone may be happy that it conflicts with builtin names. a_min and a_max are bad though, given that we consistently name our input arrays x and not a. So how about:

def clip(x, min, max, /):  # all positional-only args to avoid incompatibilities

rgommers avatar Oct 05 '22 21:10 rgommers

No name is ideal - min/max are nicer names, but not everyone may be happy that it conflicts with builtin names. a_min and a_max are bad though, given that we consistently name our input arrays x and not a. So how about:

def clip(x, min, max, /):  # all positional-only args to avoid incompatibilities

I would lean towardsclip(x, /, min=None, max=None). The naming conflict is only inside the array library, not in user code, which can call these variables whatever it likes.

Side note: this is yet another example (like unstack and moveaxis) of functionality that could perhaps have a universal default implementation terms of other array API functionality (specifically where).

shoyer avatar Oct 05 '22 21:10 shoyer

The naming conflict is only inside the array library, not in user code, which can call these variables whatever it likes.

Only if they're positional, right?

I actually do like your signature better - clip(x, max=3) is nicer than having to write clip(x, None, 3). It's just another thing then that we have to deal with when we aim to converge NumPy's main namespace to the array API standard. But I guess it's not too bad, there's already **kwargs in np.clip, so using aliases is easy.

rgommers avatar Oct 05 '22 22:10 rgommers

The naming conflict is only inside the array library, not in user code, which can call these variables whatever it likes.

Only if they're positional, right?

I only get lint errors for overriding builtins if I define variables with the same name as a builtin. Something like clip(x, max=3) is fine -- the lint error is defining a variable named max, not for using a keyword argument.

clip(x, max=max) would be problematic, but there's nothing in the interface that requires calling the variable max. Users can write something like clip(x, max=a_max) if desired.

shoyer avatar Oct 05 '22 22:10 shoyer

cc @seberg (in case you have thoughts here)

jakirkham avatar Oct 05 '22 22:10 jakirkham

I agree that clip is clear enough that it seems fine to allow passing positional and min, max seems OK (no strong opinions on it though). I would lean against a_min, etc. even if that is what NumPy has. The alternative I may seriously consider is lower and upper (I have not quite made up my mind whether that is the better or worse name for bounds – mathematically speaking).

seberg avatar Oct 06 '22 07:10 seberg

Questions that were brought up yesterday:

  • Should the min and max inputs participate in determining the output dtype, or should the output dtype equal the input dtype?
    • Tentative answer: == input_dtype
  • Are min and max allowed to be arrays, or only scalars?
    • Tentative answer: yes, because otherwise there's no way to clip along an axis.

rgommers avatar Oct 07 '22 07:10 rgommers

if min and max do not participate in the type promotion, this makes clip not equivalent to xp.maximum(xp.minimum(x, min), max) in functional behavior.

In NumPy both clip's min and max arguments participate in the type promotion:

In [3]: np.clip(np.arange(10, dtype=np.int8), 0, np.inf)
Out[3]: array([0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])

oleksandr-pavlyk avatar Oct 25 '23 11:10 oleksandr-pavlyk

I don't want to volunteer it, but it shouldn't be hard to change that in NumPy or fix in a compat namespace. And yes, it will mean that it isn't the same as a chained minimum/maximum call.

seberg avatar Oct 25 '23 11:10 seberg

At this point, I'd find it a bit inconsistent if clip does not follow type promotion rules.

We explicitly changed, e.g., the bitwise_left|right_shift specifications to ensure consistent type promotion rules throughout.

At the time of original API addition, I had advocated for returning an array having a dtype of the same type, but objections were raised (in PyTorch) about having certain APIs exempt from normal type promotion rules.

kgryte avatar Oct 25 '23 18:10 kgryte