circlecore
circlecore copied to clipboard
fix: return entire array if no outliers are found
fix in the filter_outliers
method, with previous behavior if no outliers were found, an empty array would have been returned. this results in adjusted ur calculation being None
in some replays (while normal ur calculation would be fine)
this conflicts with the usage and documentation of the method, which states that it'd return the array with the outliers removed.
to fix this, we simply return the original array if no outliers were found.
am I misreading or doesn't this return the entire array if everything is an outlier? if nothing if an outlier, then this function still returns the entire array.
I think it's actually somewhat valid to return an invalid adjusted ur when everything is an outlier (because it's very unreliable if everything is an outlier), but open to opinions.
if everything is an outlier?
i don't see how that's possible with how interquartile range was implemented, could you provide an example?
I mean, I'm just reading the code as written:
arr_without_outliers = [x for x in arr if lower_limit < x < upper_limit]
return arr if not arr_without_outliers else arr_without_outliers
this returns the original array arr
if arr_without_outliers
is empty (i.e. the array was empty to begin with, or everything in the array is an outlier)
put another way: this function in master already does what this pr says, which is return the entire array when there are no outliers. so something is being lost in translation here
skipping semantics, as it is in master an .ur(adjusted=True)
calculation will return None if there are no outliers
if that's an intended outcome then you can close the pr, per personal preference i need it to return the same value as adjusted=False
instead of None
and merged downstream incase it wasnt intended (wasnt clear to me)
skipping semantics, as it is in master an .ur(adjusted=True) calculation will return None if there are no outliers
do you have an example? I can't reproduce this. If this is the case, that's definitely a bug, like you mentioned.
do you have an example? I can't reproduce this. If this is the case, that's definitely a bug, like you mentioned.
it usually happens at very low cvUR, one example would be this replay
it has lower limit of 2 and upper limit of 2 so [x for x in arr if lower_limit < x < upper_limit]
would return an empty array (and thus UR calc would return None)
another option could be including the boundary with lower_limit <= x <= upper_limit
That's a case where everything is an outlier, not nothing.
fwiw, I get nan
and an error for normal and adjusted ur respectively for that replay:
r = cg.ReplayMap(1711983, 33839341)
print(cg.ur(r)) # nan
print(cg.ur(r, adjusted=True)) # error
...
File "/opt/homebrew/lib/python3.12/site-packages/numpy/lib/function_base.py", line 4830, in _quantile
slices_having_nans = np.isnan(arr[-1, ...])
~~~^^^^^^^^^
IndexError: index -1 is out of bounds for axis 0 with size 0
I'm fine with including the endpoints for outlier filtering to avoid this in the case of 0 iqr. We also probably want to raise an error when calculating the ur of a replay with < 3 hits (I thought we already did, to be honest).
fwiw, I get
nan
and an error for normal and adjusted ur respectively for that replay:
Huh, weird. I ran your script and could not reproduce the results on circleguard 5.4.1
, I get an UR calculation of 4.705853985027059
and the adjusted UR d oes return nan with the following error
venv/lib/python3.11/site-packages/numpy/core/_methods.py:206: RuntimeWarning: Degrees of freedom <= 0 for slice
ret = _var(a, axis=axis, dtype=dtype, out=out, ddof=ddof,
venv/lib/python3.11/site-packages/numpy/core/_methods.py:163: RuntimeWarning: invalid value encountered in divide
arrmean = um.true_divide(arrmean, div, out=arrmean,
venv/lib/python3.11/site-packages/numpy/core/_methods.py:198: RuntimeWarning: invalid value encountered in scalar divide
ret = ret.dtype.type(ret / rcount)
nan