matrixprofile-ts
matrixprofile-ts copied to clipboard
Complexity Measure confuses mean and sum
The complexity measure for the annotation vector is defined as:
sqrt(sum(diff(subsequence).^2))
CID (Batista 2013) https://doi.org/10.1007/s10618-013-0312-3 Matrix Profile V (Dau 2017) http://dx.doi.org/10.1145/3097983.3097993
However, the implementation uses the mean instead of the sum. https://github.com/target/matrixprofile-ts/blob/207aa94cfff143dee824c46c5bad7444a806088c/matrixprofile/annotation_vector.py#L14
A possible fix is to use a moving sum. The moving sum is already contained in movmeanstd https://github.com/target/matrixprofile-ts/blob/207aa94cfff143dee824c46c5bad7444a806088c/matrixprofile/utils.py#L47
See MASS by Mueen: https://www.cs.unm.edu/~mueen/findNN.html
Fun Fact: mean(x) = sum(x) / len(x) In the sliding window setting, len(x) is a constant integer. The min-max normalization of the Annotation Vector removes the constant factor.
Debugging Implementation (probably contains off-by-one error):
def CE(x: np.ndarray) -> float:
assert x.ndim == 1, x.shape
_ce = np.sqrt(np.sum(np.ediff1d(x)**2))
assert _ce >= 0, (_ce, x)
return _ce
@peterdhansen
Vectorized Numpy Implementation:
from skimage.util import view_as_windows
sliding_windows = view_as_windows(data, subsequenceLength)
assert sliding_windows.shape[0] == len(data)-subsequenceLength+1
complexity = np.sqrt(np.sum(np.square(np.diff(sliding_windows, axis=1)), axis=1))
assert sliding_windows.shape[0] == complexity.size