matrixprofile-ts

Stomp calculates wrong MP vectors for two time series comparison

Open: MichaelDoron opened this issue 5 years ago • 7 comments

Reproduction:

```python
import numpy as np
from matrixprofile import *

a = np.random.rand(500)
b = np.random.rand(500)
mp_a_1 = matrixProfile.stomp(a, 10, a)[0]  # tsB passed explicitly as tsA
mp_a_2 = matrixProfile.stomp(a, 10)[0]     # tsB omitted (self-join)
mp_a_b = matrixProfile.stomp(a, 10, b)[0]  # two different time series

assert np.max(np.abs(mp_a_b[0])) > 0, 'stomp returns 0-filled vectors when tsB != tsA'
assert (mp_a_1[0] == mp_a_2[0]).all(), 'stomp returns different vectors when tsB = tsA and when tsB = None'
```

MichaelDoron, Apr 17 '19 13:04

I'm not sure what your first assertion is checking: it only looks at the first element of `mp_a_b`. `mp_a_b` is not actually 0-filled (you can test it with `mp_a_b.any()`).
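Because `matrixProfile.stomp(...)[0]` already selects the profile array, the extra `[0]` in the first assertion reduces it to a single number. A minimal numpy-only sketch of the difference, using made-up values rather than real STOMP output:

```python
import numpy as np

# Stand-in for a matrix profile (made-up values): the first entry is 0,
# but the array as a whole is not 0-filled.
mp = np.array([0.0, 1.2, 0.7, 2.5])

# mp[0] is a single float, so this inspects only one entry.
first_only = bool(np.max(np.abs(mp[0])) > 0)

# These inspect the whole vector.
any_nonzero = bool(mp.any())
num_zeros = int(np.count_nonzero(mp == 0))
```

Here `first_only` is `False` even though the vector is clearly not 0-filled, while `mp.any()` reports `True`.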

But I do see the issue where STOMP(A) != STOMP(A,A).

As far as I can tell, the issue is within `STOMPDistanceProfile` (I haven't tried to check this with STAMP), in the `selfJoin` check:

```python
if selfJoin:
    trivialMatchRange = (int(max(0, idx - np.round(m/2, 0))), int(min(idx + np.round(m/2 + 1, 0), n)))
    distanceProfile[trivialMatchRange[0]:trivialMatchRange[1]] = np.inf
```
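One way to make STOMP(A) (no `tsB`) and STOMP(A, A) take the same branch is to decide "self-join" by comparing the data rather than only testing whether `tsB` was passed. The sketch below assumes that approach; `detect_self_join` and `apply_exclusion_zone` are hypothetical helpers, not matrixprofile-ts functions, and the half-width of roughly m/2 follows the snippet above (it matches exactly for even `m`).

```python
import numpy as np

def detect_self_join(ts_a, ts_b):
    """Hypothetical helper: a missing ts_b, or one identical to ts_a, counts as a self-join."""
    return ts_b is None or np.array_equal(ts_a, ts_b)

def apply_exclusion_zone(distance_profile, idx, m, self_join):
    """Hypothetical helper: set trivial matches around idx to inf, only when self-joining."""
    if self_join:
        half = int(np.round(m / 2))
        start = max(0, idx - half)
        end = min(idx + half + 1, len(distance_profile))
        distance_profile[start:end] = np.inf
    return distance_profile
```

With this, `detect_self_join(a, None)` and `detect_self_join(a, a)` agree, so both calls mask the same trivial-match region; for two different series no region is masked.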

Ofer-Idan, Apr 17 '19 18:04

Put in a simple fix for STOMP(A) != STOMP(A,A). I'm looking into the bigger issue where STOMP(A,B) produces weird results.

Ofer-Idan, Apr 17 '19 18:04

@Ofer-Idan any update on the above?

vanbenschoten, Jun 13 '19 14:06

Never compare floating-point numbers for exact equality; use `numpy.allclose()` instead.
https://docs.scipy.org/doc/numpy/reference/generated/numpy.allclose.html
https://floating-point-gui.de/errors/comparison/
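A small numpy illustration of why exact equality is fragile for floats, and what `numpy.allclose()` does instead (default tolerances `rtol=1e-05`, `atol=1e-08`):

```python
import numpy as np

x = np.array([0.1 + 0.2, 1.0])
y = np.array([0.3, 1.0])

# False: in float64, 0.1 + 0.2 is 0.30000000000000004, not 0.3.
exact = bool((x == y).all())

# True: every pair is equal within the default tolerances.
close = np.allclose(x, y)

# allclose treats infinities as equal when they have the same sign and position,
# which matters if a profile contains inf entries.
with_inf = np.allclose(np.array([np.inf, 1.0]), np.array([np.inf, 1.0 + 1e-12]))
```

Applied to the reproduction above, the second check could read `assert np.allclose(mp_a_1, mp_a_2)`, which also compares the whole profiles rather than their first elements.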

JaKasb, Jun 15 '19 15:06

@vanbenschoten the PR for STOMP(A) != STOMP(A,A) was merged, so @MichaelDoron's original issue should be resolved and this issue can be closed. @MichaelDoron, can you please pull the latest version and check whether this can be closed?

(I haven't had a chance to look into my other STOMP issues for a while since I moved to a different project, but that should be a separate issue altogether.)

Thanks!

Ofer-Idan, Jun 15 '19 20:06

Hey, thanks for looking into this. The second assertion now passes: `mp_a_1` is equal to `mp_a_2`. However, the first assertion (the one checking whether stomp returns 0-filled vectors when tsB != tsA) is still failing: `mp_a_b` seems to be filled with zeros.

MichaelDoron, Jun 15 '19 20:06

I believe this line is the culprit: https://github.com/target/matrixprofile-ts/blob/bcba7dc741d254435a72a60d6e014bf563de7d5a/matrixprofile/utils.py#L154

Using the precomputed cache to populate the 0th index of the next row's distance profile assumes that we're doing a self-join. When we compare two time series that are not the same, this assumption no longer holds. What worked for me was to recompute the dot product between the query and the 0th subsequence of the b time series.
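To make the self-join assumption concrete: STOMP derives each row's sliding dot products (QT) from the previous row with an O(1) update per entry, except entry 0, which has no predecessor. In a self-join, dot(A[i:i+m], A[0:m]) is the same number as entry i of the first row's QT, so it can be read from a cache; in an AB-join the first row holds dot(A[0:m], B[i:i+m]) instead, which is a different quantity. The sketch below is a standalone numpy illustration, not the matrixprofile-ts code; `sliding_dot` and `next_row_dot` are hypothetical names, and the naive `sliding_dot` stands in for an FFT-based version.

```python
import numpy as np

def sliding_dot(query, ts):
    """Naive QT: dot product of query with every length-m window of ts."""
    m = len(query)
    return np.array([np.dot(query, ts[j:j + m]) for j in range(len(ts) - m + 1)])

def next_row_dot(ts_a, ts_b, m, i, dot_prev):
    """QT for query ts_a[i:i+m] against ts_b, given dot_prev = QT for ts_a[i-1:i-1+m]."""
    l = len(ts_b) - m + 1
    dot = np.empty(l)
    # O(1) update per window j >= 1: drop the oldest product, add the newest.
    dot[1:] = (dot_prev[:-1]
               - ts_a[i - 1] * ts_b[:l - 1]
               + ts_a[i + m - 1] * ts_b[m:m + l - 1])
    # Window 0 has no predecessor, so recompute it directly. Reusing the first
    # row's QT here is only valid when ts_b is ts_a (a self-join).
    dot[0] = np.dot(ts_a[i:i + m], ts_b[:m])
    return dot
```

Starting from `sliding_dot(a[:m], b)` and calling `next_row_dot` for i = 1, 2, ... reproduces every row of the naive computation for both self-joins and AB-joins.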

aouyang1, Jun 25 '19 08:06