dtw-python
dtw-python copied to clipboard
Warped series is not of the same length as the reference and query
- Python port of R's Comprehensive Dynamic Time Warp algorithm package version:
- Python version: 3.7.3
- Operating System: OS X
Description
If the query and reference series are of size N, then the warped series is always of size N-1. This can be explained by how the warp function is set up:
jset = alignment.index2 jmax = numpy.max(jset) ii = function(numpy.arange(jmax)) return ii
As Python starts counting from 0, jmax is always N-1. I was wondering whether this is the desired output or whether the length of the warped series should also be N.
What I Did
alignment = dtw(
+ y_pred,
+ y_true,
+ keep_internals=True,
+ step_pattern=step_pattern,
+ window_type="sakoechiba",
+ window_args={"window_size": 4},
+ )
+ warped_indices = warp(alignment, index_reference=False)
+ aligned_series = y_pred[warped_indices]
It is intended. jmax is the index to the element containing the last aligned point. It may differ from N-1 for open_end=True alignments.
I realise this issue has been closed, however I've been experiencing the same problem. There is an off-by-one difference, where the warped series is always one sample short of the length of the series it is being warped to.
I can see that jmax should be the index of the last element, but I was wondering if the issue is more in the generation of the indices to be fed to the interpolation: ii = ifun(numpy.arange(jmax)) . arange will generate values up to but not including the given value, so the length will be one short of jmax. For example, if jmax = 9 (corresponding to the 10th position), then arange(jmax) will only return 9 numbers (0 through 8 inclusive), and so will miss the final index of jmax. Might it make more sense to do arange(jmax + 1) instead, which would give the full length sequence ending on the value of jmax?
I've tried making that change in my own local copy, and it doesn't seem to have broken anything. If nothing else, maybe an extra argument could be added to the function giving the user the option to choose?
I think you rather want either the .index1 or .index2 properties in the result. jmax is just an index.
They will be of the same length and be correct for any step pattern chosen.
Okay, here's an example adapted from the docs to show what I mean. First, modify the warp function in warp.py to allow incrementing jmax if needed.
def warp(d, index_reference=False, full_length=False):
if not index_reference:
iset = d.index1
jset = d.index2
else:
iset = d.index2
jset = d.index1
jmax = numpy.max(jset)
# Scipy interp1d is buggy. it does not deal with leading
# duplicated values of x. It returns different values
# depending on the dtypes of arguments.
ifun = _interp(x=jset, y=iset)
## Change here ##
n = jmax + 1 if full_length else jmax
ii = ifun(numpy.arange(n))
## End change ##
# Quick fix for bug
if numpy.isnan(ii[0]):
ii[numpy.isnan(ii)] = iset[0]
return ii.astype(int)
Then run the example from the docs
idx = np.linspace(0, 2 * np.pi, num=100)
query = np.sin(idx) + np.random.uniform(size=100)/10.0
reference = np.cos(idx)
print(len(reference)) # -> 100
alignment = dtw(query, reference)
# Existing behaviour
wq1 = warp(alignment)
warped_query1 = query[wq1]
print(len(wq1), len(warped_query1)) # -> 99
# Modified behaviour
wq2 = warp(alignment, full_length=True)
warped_query2 = query[wq2]
print(len(wq2), len(warped_query2)) # -> 100
The existing warp behaviour makes wq (and hence also warped_query) one short of the length of the reference. I was expecting that if I warp something to a given reference, the warped series should be the same length as that reference, not one short of it. Using jmax+1 produces the behaviour I was expecting, at least. The .index1/2 attributes are both the same length as each other, but not the same length as the reference, so wouldn't really do what I want either.
Using the R implementation, the same example produces the behaviour I was expecting (the warped query is the same length as the reference):
idx <- seq(0, 2 * pi, length.out=100)
query <- sin(idx) + runif(100)/10.0
reference <- cos(idx)
print(length(reference)) # -> 100
alignment <- dtw(query, reference)
wq <- warp(alignment)
warped_query <- query[wq]
print(c(length(wq), length(warped_query))) # -> 100
So the python and R implementations are slightly inconsistent here. The difference arises because python and R use different indexing (numpy's arange goes up to but not including jmax, while R's colon operator goes up to and including jmax). If I understand correctly, using arange(jmax+1) in python should reproduce the behaviour of the R implementation.
Thanks for the added details. I'll have a look. The R warp function should be correct.
Sorry for the delay. I still haven't had the time to carefully check the fix and think about the corner cases, but in the meantime, it sounds good for people stumbling upon the bug.
I have also recently come across this issue and decided on the same fix for the code. Is there any confirmation that this change will then provide the same output as the R version?
Python version's warp function relies on scipy interpolation, which is different from R's. This should be fixed in the future. So, even with the fix there still can be discrepancies.
I incorporated the change in 9ad9bf953976eaaf0c16ae76a099cec6a2ecc165 and released 1.4.0 . Sorry for taking it so long.