array-api
array-api copied to clipboard
Add `clip`
The clip
function is quite handy for constraining data to a certain range. Would be nice to include this in the spec
gh-187 says clip
is universally implemented by libraries. This looks to me like a useful function to add. IIRC we had some issues with clip
behavior in numpy
, we should dig those up to check for possible semantics issues.
IIRC we had some issues with
clip
behavior innumpy
, we should dig those up to check for possible semantics issues.
This wasn't too bad actually, a lot of it was segfaults and whether it was a ufunc or not. The relevant ones:
- Semantics when the max value is smaller than the min value: https://github.com/numpy/numpy/issues/18782. Probably not very interesting, just undefined behavior,
- Whether or not the output dtype should depend on the dtype of the min/max values: https://github.com/numpy/numpy/issues/16609. Not immediately clear to me what the right answer is here.
And of course we need to choose keyword names, always a hard problem. Existing signatures don't match:
-
np.clip(a, a_min, a_max, out=None, **kwargs)
-
torch.clip(input, min=None, max=None, *, out=None)
No name is ideal - min
/max
are nicer names, but not everyone may be happy that it conflicts with builtin names. a_min
and a_max
are bad though, given that we consistently name our input arrays x
and not a
. So how about:
def clip(x, min, max, /): # all positional-only args to avoid incompatibilities
No name is ideal -
min
/max
are nicer names, but not everyone may be happy that it conflicts with builtin names.a_min
anda_max
are bad though, given that we consistently name our input arraysx
and nota
. So how about:def clip(x, min, max, /): # all positional-only args to avoid incompatibilities
I would lean towardsclip(x, /, min=None, max=None)
. The naming conflict is only inside the array library, not in user code, which can call these variables whatever it likes.
Side note: this is yet another example (like unstack
and moveaxis
) of functionality that could perhaps have a universal default implementation terms of other array API functionality (specifically where
).
The naming conflict is only inside the array library, not in user code, which can call these variables whatever it likes.
Only if they're positional, right?
I actually do like your signature better - clip(x, max=3)
is nicer than having to write clip(x, None, 3)
. It's just another thing then that we have to deal with when we aim to converge NumPy's main namespace to the array API standard. But I guess it's not too bad, there's already **kwargs
in np.clip
, so using aliases is easy.
The naming conflict is only inside the array library, not in user code, which can call these variables whatever it likes.
Only if they're positional, right?
I only get lint errors for overriding builtins if I define variables with the same name as a builtin. Something like clip(x, max=3)
is fine -- the lint error is defining a variable named max
, not for using a keyword argument.
clip(x, max=max)
would be problematic, but there's nothing in the interface that requires calling the variable max
. Users can write something like clip(x, max=a_max)
if desired.
cc @seberg (in case you have thoughts here)
I agree that clip is clear enough that it seems fine to allow passing positional and min
, max
seems OK (no strong opinions on it though).
I would lean against a_min
, etc. even if that is what NumPy has. The alternative I may seriously consider is lower
and upper
(I have not quite made up my mind whether that is the better or worse name for bounds – mathematically speaking).
Questions that were brought up yesterday:
- Should the
min
andmax
inputs participate in determining the output dtype, or should the output dtype equal the input dtype?- Tentative answer:
== input_dtype
- Tentative answer:
- Are
min
andmax
allowed to be arrays, or only scalars?- Tentative answer: yes, because otherwise there's no way to clip along an axis.
if min
and max
do not participate in the type promotion, this makes clip
not equivalent to xp.maximum(xp.minimum(x, min), max)
in functional behavior.
In NumPy both clip
's min
and max
arguments participate in the type promotion:
In [3]: np.clip(np.arange(10, dtype=np.int8), 0, np.inf)
Out[3]: array([0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])
I don't want to volunteer it, but it shouldn't be hard to change that in NumPy or fix in a compat namespace. And yes, it will mean that it isn't the same as a chained minimum/maximum call.
At this point, I'd find it a bit inconsistent if clip
does not follow type promotion rules.
We explicitly changed, e.g., the bitwise_left|right_shift
specifications to ensure consistent type promotion rules throughout.
At the time of original API addition, I had advocated for returning an array having a dtype of the same type, but objections were raised (in PyTorch) about having certain APIs exempt from normal type promotion rules.