dtw-python Warped series is not of the same length as the reference and query

trafficstars

Python port of R's Comprehensive Dynamic Time Warp algorithm package version:
Python version: 3.7.3
Operating System: OS X

Description

If the query and reference series are of size N, then the warped series is always of size N-1. This can be explained by how the warp function is set up:

jset = alignment.index2 jmax = numpy.max(jset) ii = function(numpy.arange(jmax)) return ii

As Python starts counting from 0, jmax is always N-1. I was wondering whether this is the desired output or whether the length of the warped series should also be N.

What I Did

alignment = dtw(
+        y_pred,
+        y_true,
+        keep_internals=True,
+        step_pattern=step_pattern,
+        window_type="sakoechiba",
+        window_args={"window_size": 4},
+    )
+    warped_indices = warp(alignment, index_reference=False)
+    aligned_series = y_pred[warped_indices]

Nov 10 '20 16:11 JulietteVTrigt

It is intended. jmax is the index to the element containing the last aligned point. It may differ from N-1 for open_end=True alignments.

Nov 10 '20 17:11 tonigi

I realise this issue has been closed, however I've been experiencing the same problem. There is an off-by-one difference, where the warped series is always one sample short of the length of the series it is being warped to.

I can see that jmax should be the index of the last element, but I was wondering if the issue is more in the generation of the indices to be fed to the interpolation: ii = ifun(numpy.arange(jmax)) . arange will generate values up to but not including the given value, so the length will be one short of jmax. For example, if jmax = 9 (corresponding to the 10th position), then arange(jmax) will only return 9 numbers (0 through 8 inclusive), and so will miss the final index of jmax. Might it make more sense to do arange(jmax + 1) instead, which would give the full length sequence ending on the value of jmax?

I've tried making that change in my own local copy, and it doesn't seem to have broken anything. If nothing else, maybe an extra argument could be added to the function giving the user the option to choose?

Feb 10 '21 18:02 davidmwatson

I think you rather want either the .index1 or .index2 properties in the result. jmax is just an index. They will be of the same length and be correct for any step pattern chosen.

Feb 11 '21 11:02 tonigi

Okay, here's an example adapted from the docs to show what I mean. First, modify the warp function in warp.py to allow incrementing jmax if needed.

def warp(d, index_reference=False, full_length=False):
    if not index_reference:
        iset = d.index1
        jset = d.index2
    else:
        iset = d.index2
        jset = d.index1

    jmax = numpy.max(jset)

    # Scipy interp1d is buggy. it does not deal with leading
    # duplicated values of x. It returns different values
    # depending on the dtypes of arguments.
    ifun = _interp(x=jset, y=iset)

    ## Change here ##
    n = jmax + 1 if full_length else jmax 
    ii = ifun(numpy.arange(n))
    ## End change ##

    # Quick fix for bug
    if numpy.isnan(ii[0]):
        ii[numpy.isnan(ii)] = iset[0]
    return ii.astype(int)

Then run the example from the docs

idx = np.linspace(0, 2 * np.pi, num=100)
query = np.sin(idx) + np.random.uniform(size=100)/10.0
reference = np.cos(idx)

print(len(reference))  # -> 100

alignment = dtw(query, reference)

# Existing behaviour
wq1 = warp(alignment)
warped_query1 = query[wq1]
print(len(wq1), len(warped_query1))  # -> 99

# Modified behaviour
wq2 = warp(alignment, full_length=True)
warped_query2 = query[wq2]
print(len(wq2), len(warped_query2))  # -> 100

The existing warp behaviour makes wq (and hence also warped_query) one short of the length of the reference. I was expecting that if I warp something to a given reference, the warped series should be the same length as that reference, not one short of it. Using jmax+1 produces the behaviour I was expecting, at least. The .index1/2 attributes are both the same length as each other, but not the same length as the reference, so wouldn't really do what I want either.

Using the R implementation, the same example produces the behaviour I was expecting (the warped query is the same length as the reference):

idx <- seq(0, 2 * pi, length.out=100)
query <- sin(idx) + runif(100)/10.0
reference <- cos(idx)

print(length(reference))  # -> 100

alignment <- dtw(query, reference)

wq <- warp(alignment)
warped_query <- query[wq]
print(c(length(wq), length(warped_query)))  # -> 100

So the python and R implementations are slightly inconsistent here. The difference arises because python and R use different indexing (numpy's arange goes up to but not including jmax, while R's colon operator goes up to and including jmax). If I understand correctly, using arange(jmax+1) in python should reproduce the behaviour of the R implementation.

Feb 11 '21 18:02 davidmwatson

Thanks for the added details. I'll have a look. The R warp function should be correct.

Feb 11 '21 21:02 tonigi

Sorry for the delay. I still haven't had the time to carefully check the fix and think about the corner cases, but in the meantime, it sounds good for people stumbling upon the bug.

Jun 30 '21 09:06 tonigi

I have also recently come across this issue and decided on the same fix for the code. Is there any confirmation that this change will then provide the same output as the R version?

Sep 16 '22 00:09 steelec

Python version's warp function relies on scipy interpolation, which is different from R's. This should be fixed in the future. So, even with the fix there still can be discrepancies.

Sep 16 '22 06:09 tonigi

I incorporated the change in 9ad9bf953976eaaf0c16ae76a099cec6a2ecc165 and released 1.4.0 . Sorry for taking it so long.

Mar 18 '24 16:03 tonigi

dtw-python dtw-python copied to clipboard

Warped series is not of the same length as the reference and query

Description

What I Did

dtw-python
dtw-python copied to clipboard