matrixprofile-ts icon indicating copy to clipboard operation
matrixprofile-ts copied to clipboard

Complexity Measure confuses mean and sum

Open JaKasb opened this issue 4 years ago • 1 comments

The complexity measure for the annotation vector is defined as: sqrt(sum(diff(subsequence).^2))

CID (Batista 2013) https://doi.org/10.1007/s10618-013-0312-3 Matrix Profile V (Dau 2017) http://dx.doi.org/10.1145/3097983.3097993

However, the implementation uses the mean instead of the sum. https://github.com/target/matrixprofile-ts/blob/207aa94cfff143dee824c46c5bad7444a806088c/matrixprofile/annotation_vector.py#L14

A possible fix is to use a moving sum. The moving sum is already contained in movmeanstd https://github.com/target/matrixprofile-ts/blob/207aa94cfff143dee824c46c5bad7444a806088c/matrixprofile/utils.py#L47

See MASS by Mueen: https://www.cs.unm.edu/~mueen/findNN.html

Fun Fact: mean(x) = sum(x) / len(x) In the sliding window setting, len(x) is a constant integer. The min-max normalization of the Annotation Vector removes the constant factor.

Debugging Implementation (probably contains off-by-one error):

def CE(x: np.ndarray) -> float:
    assert x.ndim == 1, x.shape
    _ce = np.sqrt(np.sum(np.ediff1d(x)**2))
    assert _ce >= 0, (_ce, x)
    return _ce

@peterdhansen

JaKasb avatar Nov 08 '19 17:11 JaKasb

Vectorized Numpy Implementation:

from skimage.util import view_as_windows

sliding_windows = view_as_windows(data, subsequenceLength)
assert sliding_windows.shape[0] == len(data)-subsequenceLength+1
complexity = np.sqrt(np.sum(np.square(np.diff(sliding_windows, axis=1)), axis=1))
assert sliding_windows.shape[0] == complexity.size

JaKasb avatar Nov 11 '19 18:11 JaKasb