aeon icon indicating copy to clipboard operation
aeon copied to clipboard

[BUG] lcss distance producing negative values in pairwise dist matrix

Open hadifawaz1999 opened this issue 1 year ago • 8 comments

Describe the bug

When running tests on distance module locally, noticed a random fail (not always) on the lcss distance in the test_sklearn_compatibility:test_clusterer. Apparently, after i printed the pairwise dist matrix, its producing a very small negative number on some random data, the value is in order of 10 to the power -17 but its still negative.

Steps/Code to reproduce the bug

Run the mentioned test many times to see it.

Expected results

it should be all positive

Actual results

positive values only

Versions

No response

hadifawaz1999 avatar Sep 24 '23 05:09 hadifawaz1999

I can reproduce this issue in the current version (and it is annoying because it breaks tests occasionally). I suggest adding a fixed random_state=1 and random_state=2 to the example data creation:

https://github.com/aeon-toolkit/aeon/blob/93726dc04caa2a83a61553100ac62c9cc8db9104/aeon/distances/tests/test_sklearn_compatibility.py#L32-L37

Here is a test to isolate and reproduce the issue:

# you might want to add more seeds, but seed=1 already produces the bug for dist["name"] == "lcss"
@pytest.mark.parametrize("seed", [1, 10, 42, 52, 100])
@pytest.mark.parametrize("dist", DISTANCES)
def test_distance_non_negative(dist, seed):
    """Most estimators require distances to be non-negative."""
    X = make_example_3d_numpy(
        n_cases=5, n_channels=1, n_timepoints=10, regression_target=True, random_state=seed, return_y=False
    )
    X2 = make_example_3d_numpy(
        n_cases=10, n_channels=1, n_timepoints=10, regression_target=True, random_state=seed + 1, return_y=False
    )
    pairwise = dist["pairwise_distance"]
    Xt2 = pairwise(X2, X)
    assert Xt2.min() >= 0, f"Distance {dist['name']} is negative"

The other distances are not affected.

SebastianSchmidl avatar Mar 11 '24 13:03 SebastianSchmidl

Thanks for the insight @CodeLionX ! It seems like a valid solution, would very much like the input of @chrisholder on this as well

hadifawaz1999 avatar Mar 11 '24 13:03 hadifawaz1999

hi, thanks @CodeLionX I'm fine with that solution, but would also like to know why it does it, bet its a numba thing :)

TonyBagnall avatar Mar 11 '24 13:03 TonyBagnall

random_state=1 and random_state=2 are not a solution. With those seeds, the tests reliably fail. For me, random_state=42 and random_state=43 worked successfully, tho.

My guess is Numba's fastmath=True that we use in all distance implementations.

SebastianSchmidl avatar Mar 11 '24 13:03 SebastianSchmidl

@CodeLionX is the difference in the values compared to 0 significant or as mentioned in this issue's comment above ? in the case of seed 1 and 2

hadifawaz1999 avatar Mar 11 '24 14:03 hadifawaz1999

@hadifawaz1999 The negative value is actually always the same (-5.551115123125783e-17), independent of the seed (as long as it is negative for this seed). I could also find out that it just happens for small n_timepoints and not all shapes: Just for 5, 10, 11, 13, and 22 out of list(range(3, 50)) + [100, 1000] (with some of the following seeds failing: [1, 10, 42, 52, 100, 110, 120, 150, 1018964])

SebastianSchmidl avatar Mar 11 '24 14:03 SebastianSchmidl

Might be due to this line in the _lcss_distance and maybe numba issues

return 1 - (float(distance / min(x.shape[1], y.shape[1])))

Would it make sense to try clipping it to zero and try again? Something like this:

result = 1 - (float(distance / min(x.shape[1], y.shape[1])))
return max(result, 0.0)

aadya940 avatar Mar 15 '24 09:03 aadya940

@aadya940 I think your solution is the simplest and most practical. This is assuming there is no logic in the operation of the distance function, but tbh LCSS is not very effective. @chrisholder any thoughts?

TonyBagnall avatar May 11 '24 16:05 TonyBagnall

I cant actually reproduce this with the above test, but will put in the fix nevertheless, cant hurt

TonyBagnall avatar Jun 05 '24 21:06 TonyBagnall