aeon
aeon copied to clipboard
[BUG] lcss distance producing negative values in pairwise dist matrix
Describe the bug
When running tests on distance module locally, noticed a random fail (not always) on the lcss distance in the test_sklearn_compatibility:test_clusterer. Apparently, after i printed the pairwise dist matrix, its producing a very small negative number on some random data, the value is in order of 10 to the power -17 but its still negative.
Steps/Code to reproduce the bug
Run the mentioned test many times to see it.
Expected results
it should be all positive
Actual results
positive values only
Versions
No response
I can reproduce this issue in the current version (and it is annoying because it breaks tests occasionally). I suggest adding a fixed random_state=1
and random_state=2
to the example data creation:
https://github.com/aeon-toolkit/aeon/blob/93726dc04caa2a83a61553100ac62c9cc8db9104/aeon/distances/tests/test_sklearn_compatibility.py#L32-L37
Here is a test to isolate and reproduce the issue:
# you might want to add more seeds, but seed=1 already produces the bug for dist["name"] == "lcss"
@pytest.mark.parametrize("seed", [1, 10, 42, 52, 100])
@pytest.mark.parametrize("dist", DISTANCES)
def test_distance_non_negative(dist, seed):
"""Most estimators require distances to be non-negative."""
X = make_example_3d_numpy(
n_cases=5, n_channels=1, n_timepoints=10, regression_target=True, random_state=seed, return_y=False
)
X2 = make_example_3d_numpy(
n_cases=10, n_channels=1, n_timepoints=10, regression_target=True, random_state=seed + 1, return_y=False
)
pairwise = dist["pairwise_distance"]
Xt2 = pairwise(X2, X)
assert Xt2.min() >= 0, f"Distance {dist['name']} is negative"
The other distances are not affected.
Thanks for the insight @CodeLionX ! It seems like a valid solution, would very much like the input of @chrisholder on this as well
hi, thanks @CodeLionX I'm fine with that solution, but would also like to know why it does it, bet its a numba thing :)
random_state=1
and random_state=2
are not a solution. With those seeds, the tests reliably fail. For me, random_state=42
and random_state=43
worked successfully, tho.
My guess is Numba's fastmath=True
that we use in all distance implementations.
@CodeLionX is the difference in the values compared to 0 significant or as mentioned in this issue's comment above ? in the case of seed 1 and 2
@hadifawaz1999
The negative value is actually always the same (-5.551115123125783e-17
), independent of the seed (as long as it is negative for this seed). I could also find out that it just happens for small n_timepoints
and not all shapes:
Just for 5, 10, 11, 13, and 22 out of list(range(3, 50)) + [100, 1000]
(with some of the following seeds failing: [1, 10, 42, 52, 100, 110, 120, 150, 1018964]
)
Might be due to this line in the _lcss_distance
and maybe numba
issues
return 1 - (float(distance / min(x.shape[1], y.shape[1])))
Would it make sense to try clipping it to zero and try again? Something like this:
result = 1 - (float(distance / min(x.shape[1], y.shape[1])))
return max(result, 0.0)
@aadya940 I think your solution is the simplest and most practical. This is assuming there is no logic in the operation of the distance function, but tbh LCSS is not very effective. @chrisholder any thoughts?
I cant actually reproduce this with the above test, but will put in the fix nevertheless, cant hurt