array-api Add `clip`

The clip function is quite handy for constraining data to a certain range. Would be nice to include this in the spec

Sep 23 '22 08:09 jakirkham

gh-187 says clip is universally implemented by libraries. This looks to me like a useful function to add. IIRC we had some issues with clip behavior in numpy, we should dig those up to check for possible semantics issues.

Sep 23 '22 09:09 rgommers

IIRC we had some issues with clip behavior in numpy, we should dig those up to check for possible semantics issues.

This wasn't too bad actually, a lot of it was segfaults and whether it was a ufunc or not. The relevant ones:

Semantics when the max value is smaller than the min value: https://github.com/numpy/numpy/issues/18782. Probably not very interesting, just undefined behavior,
Whether or not the output dtype should depend on the dtype of the min/max values: https://github.com/numpy/numpy/issues/16609. Not immediately clear to me what the right answer is here.

And of course we need to choose keyword names, always a hard problem. Existing signatures don't match:

np.clip(a, a_min, a_max, out=None, **kwargs)
torch.clip(input, min=None, max=None, *, out=None)

No name is ideal - min/max are nicer names, but not everyone may be happy that it conflicts with builtin names. a_min and a_max are bad though, given that we consistently name our input arrays x and not a. So how about:

def clip(x, min, max, /):  # all positional-only args to avoid incompatibilities

Oct 05 '22 21:10 rgommers

No name is ideal - min/max are nicer names, but not everyone may be happy that it conflicts with builtin names. a_min and a_max are bad though, given that we consistently name our input arrays x and not a. So how about:
def clip(x, min, max, /):  # all positional-only args to avoid incompatibilities

I would lean towardsclip(x, /, min=None, max=None). The naming conflict is only inside the array library, not in user code, which can call these variables whatever it likes.

Side note: this is yet another example (like unstack and moveaxis) of functionality that could perhaps have a universal default implementation terms of other array API functionality (specifically where).

Oct 05 '22 21:10 shoyer

The naming conflict is only inside the array library, not in user code, which can call these variables whatever it likes.

Only if they're positional, right?

I actually do like your signature better - clip(x, max=3) is nicer than having to write clip(x, None, 3). It's just another thing then that we have to deal with when we aim to converge NumPy's main namespace to the array API standard. But I guess it's not too bad, there's already **kwargs in np.clip, so using aliases is easy.

Oct 05 '22 22:10 rgommers

The naming conflict is only inside the array library, not in user code, which can call these variables whatever it likes.

Only if they're positional, right?

I only get lint errors for overriding builtins if I define variables with the same name as a builtin. Something like clip(x, max=3) is fine -- the lint error is defining a variable named max, not for using a keyword argument.

clip(x, max=max) would be problematic, but there's nothing in the interface that requires calling the variable max. Users can write something like clip(x, max=a_max) if desired.

Oct 05 '22 22:10 shoyer

cc @seberg (in case you have thoughts here)

Oct 05 '22 22:10 jakirkham

I agree that clip is clear enough that it seems fine to allow passing positional and min, max seems OK (no strong opinions on it though). I would lean against a_min, etc. even if that is what NumPy has. The alternative I may seriously consider is lower and upper (I have not quite made up my mind whether that is the better or worse name for bounds – mathematically speaking).

Oct 06 '22 07:10 seberg

Questions that were brought up yesterday:

Should the min and max inputs participate in determining the output dtype, or should the output dtype equal the input dtype?
- Tentative answer: == input_dtype
Are min and max allowed to be arrays, or only scalars?
- Tentative answer: yes, because otherwise there's no way to clip along an axis.

Oct 07 '22 07:10 rgommers

if min and max do not participate in the type promotion, this makes clip not equivalent to xp.maximum(xp.minimum(x, min), max) in functional behavior.

In NumPy both clip's min and max arguments participate in the type promotion:

In [3]: np.clip(np.arange(10, dtype=np.int8), 0, np.inf)
Out[3]: array([0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])

Oct 25 '23 11:10 oleksandr-pavlyk

I don't want to volunteer it, but it shouldn't be hard to change that in NumPy or fix in a compat namespace. And yes, it will mean that it isn't the same as a chained minimum/maximum call.

Oct 25 '23 11:10 seberg

At this point, I'd find it a bit inconsistent if clip does not follow type promotion rules.

We explicitly changed, e.g., the bitwise_left|right_shift specifications to ensure consistent type promotion rules throughout.

At the time of original API addition, I had advocated for returning an array having a dtype of the same type, but objections were raised (in PyTorch) about having certain APIs exempt from normal type promotion rules.

Oct 25 '23 18:10 kgryte

array-api array-api copied to clipboard

Add `clip`

array-api
array-api copied to clipboard