
Faster kNN search with constrained DTW

Open rtavenar opened this issue 8 years ago • 5 comments

It would be nice to make kNN searches faster when Sakoe-Chiba-constrained DTW is used, by relying on LB_Keogh-based pre-filtering.

This should be implemented in the kneighbors method of class KNeighborsTimeSeriesMixin from module neighbors.
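A minimal sketch of what such pre-filtering could look like for a single query (the helper name and signature below are purely illustrative, not existing tslearn API): compute LB_Keogh for every candidate, visit candidates in increasing lower-bound order, and only compute the constrained DTW when the bound does not already exceed the current k-th best distance.

import numpy as np
from tslearn.metrics import dtw, lb_keogh

def knn_with_lb_keogh(query, candidates, k, radius):
    # k nearest neighbors of `query` among `candidates`,
    # using LB_Keogh to skip full constrained-DTW computations
    lbs = np.array([lb_keogh(query, c, radius=radius) for c in candidates])
    order = np.argsort(lbs)          # visit most promising candidates first
    best_d = np.full(k, np.inf)      # sorted distances of the current best k
    best_i = np.full(k, -1, dtype=np.int64)
    for idx in order:
        if lbs[idx] >= best_d[-1]:   # this bound (and all remaining ones) is too large
            break
        d = dtw(query, candidates[idx],
                global_constraint="sakoe_chiba", sakoe_chiba_radius=radius)
        if d < best_d[-1]:
            pos = np.searchsorted(best_d, d)
            best_d = np.insert(best_d, pos, d)[:k]
            best_i = np.insert(best_i, pos, idx)[:k]
    return best_d, best_i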

rtavenar avatar Aug 12 '17 13:08 rtavenar

In addition to the constraint and the LB_Keogh prune, I'd suggest adding the cascading pruning methods described in the UCR suite. It looks like they could be extended to kNN.
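A rough sketch of what such a cascade could look like for one pair of univariate series (1-D arrays). The function name is illustrative, the cheap first/last-point bound is a simplified stand-in for LB_Kim (not provided by tslearn), and only lb_keogh and dtw below are actual tslearn calls:

import numpy as np
from tslearn.metrics import dtw, lb_keogh

def cascaded_dtw(q, c, r, best_so_far):
    # cheap bound first: every DTW path matches the first and the last points,
    # so their squared differences already contribute to the (squared) DTW cost
    lb_fl = np.sqrt((q[0] - c[0]) ** 2 + (q[-1] - c[-1]) ** 2)
    if lb_fl >= best_so_far:
        return np.inf
    # tighter but more expensive LB_Keogh, tried in both directions since it is asymmetric
    if lb_keogh(q, c, radius=r) >= best_so_far:
        return np.inf
    if lb_keogh(c, q, radius=r) >= best_so_far:
        return np.inf
    # only now pay for the full constrained DTW
    return dtw(q, c, global_constraint="sakoe_chiba", sakoe_chiba_radius=r)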

unnamedplay-r avatar Apr 07 '18 18:04 unnamedplay-r

@rtavenar (@GillesVandewiele: Would you mind sharing your thoughts on this?)

Do you mind if I work on this? I saw issue #407, and I had an idea. Then, I searched for LB_Keogh and found this issue as well!

So, I think something like this can work:

import numpy as np

from tslearn.metrics import lb_keogh
from tslearn.metrics import dtw

# T: dataset of n time series, r: Sakoe-Chiba radius (assumed to be defined)

# calculate LB_Keogh lower bounds; DTW is symmetric, so the bound is valid in both directions
LB = np.zeros((n, n), dtype=np.float64)
for i in range(n - 1):
    for j in range(i + 1, n):
        d = lb_keogh(T[i], T[j], radius=r)
        LB[i, j] = d
        LB[j, i] = d


kmax = 20  # maximum number of neighbors

# to keep track of nearest neighbors: D holds sorted distances, I the corresponding indices
D = np.full((n, kmax), np.inf, dtype=np.float64)
I = np.full((n, kmax), -1, dtype=np.int64)


def insert_neighbor(i, j, d):
    # insert neighbor j at distance d into the sorted neighbor list of observation i
    pos = np.searchsorted(D[i], d, side='right')
    D[i, pos + 1:] = D[i, pos:-1]
    D[i, pos] = d
    I[i, pos + 1:] = I[i, pos:-1]
    I[i, pos] = j


for i in range(n - 1):
    for j in range(i + 1, n):
        # only pay for constrained DTW if the lower bound cannot prune the pair
        # (each pair is checked against both neighbor lists)
        if LB[i, j] < D[i, -1] or LB[i, j] < D[j, -1]:
            d = dtw(T[i], T[j], global_constraint="sakoe_chiba", sakoe_chiba_radius=r)
            if d < D[i, -1]:
                insert_neighbor(i, j, d)
            if d < D[j, -1]:
                insert_neighbor(j, i, d)

# fill distance matrix (it is exact at the indices stored in `I`)
Dist_matrix = np.full((n, n), np.inf, dtype=np.float64)
for i in range(n):
    for j, nn_i in enumerate(I[i]):
        if nn_i >= 0:  # skip unfilled slots
            Dist_matrix[i, nn_i] = D[i, j]

np.fill_diagonal(Dist_matrix, 0)

We can use Dist_matrix for KNN (with k <= kmax).
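For example, extracting the k nearest neighbors of one observation from the partial matrix could look like this (a rough usage sketch, assuming k <= kmax):

k = 5  # must satisfy k <= kmax
nn_of_i = np.argsort(Dist_matrix[i])[1:k + 1]  # position 0 is i itself (diagonal is 0)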

This is not bad. It shows some improvement. I think the paper mentioned by @unnamedplay-r might be useful as well.

NimaSarajpoor avatar Jun 28 '22 10:06 NimaSarajpoor

@NimaSarajpoor that would be great!

Something that would probably be useful if you start working on this is some kind of benchmark to highlight the improvement in running times when using pruning, as well as tests ensuring that the results match those obtained when DTW with a Sakoe-Chiba band is used without pruning.
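Something along these lines could serve as a starting point. The dataset choice, radius, k and the pruned_knn function are placeholders for whatever ends up being implemented; only CachedDatasets and cdist_dtw are actual tslearn API:

import timeit

import numpy as np

from tslearn.datasets import CachedDatasets
from tslearn.metrics import cdist_dtw

X, _, _, _ = CachedDatasets().load_dataset("Trace")
r, k = 5, 3

# reference: full pairwise constrained DTW, then take the k smallest per row
start = timeit.default_timer()
full = cdist_dtw(X, global_constraint="sakoe_chiba", sakoe_chiba_radius=r)
ref_nn = np.argsort(full, axis=1)[:, 1:k + 1]  # column 0 is the series itself
t_full = timeit.default_timer() - start

# candidate: the LB_Keogh-pruned search (hypothetical function, to be implemented)
start = timeit.default_timer()
pruned_nn = pruned_knn(X, k=k, radius=r)
t_pruned = timeit.default_timer() - start

# same neighbors expected (up to ties in distances)
np.testing.assert_array_equal(ref_nn, pruned_nn)
print(f"full: {t_full:.2f}s, pruned: {t_pruned:.2f}s")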

rtavenar avatar Jun 28 '22 10:06 rtavenar

@rtavenar I got your point! Sure. I will keep you posted if I get to a good stage.

NimaSarajpoor avatar Jun 28 '22 10:06 NimaSarajpoor

We may take advantage of an upper bound (UB, e.g. the Euclidean distance, which upper-bounds DTW for equal-length series) as follows:

  • Step 1: calculate LB and UB vectors for observation i: LB[j] is a lower bound of the DTW distance between i and j, and UB[j] an upper bound
  • Step 2: sort np.concatenate((LB, UB))
  • Step 3: remove candidate j if LB[j] is greater than the UB of at least kmax other candidates (j can then not be among the kmax nearest neighbors, so its exact DTW distance never needs to be computed)

I think step 2, over all observations, has O(N^2 log N) time complexity. Maybe we should first check whether it is worth it (this may depend on kmax). A rough sketch of the pruning step is given below.
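Sketch of the pruning step, reusing T, n, r and kmax from the code above. Using np.partition instead of sorting the concatenated vector is an assumption of mine; it avoids the full sort and keeps this part roughly linear per observation (so about O(N^2) overall):

import numpy as np
from tslearn.metrics import lb_keogh

for i in range(n):
    # UB: the Euclidean distance is a valid upper bound of DTW for equal-length series
    UB = np.array([np.linalg.norm(T[i] - T[j]) if j != i else np.inf
                   for j in range(n)])
    LB = np.array([lb_keogh(T[i], T[j], radius=r) if j != i else np.inf
                   for j in range(n)])
    # kmax-th smallest upper bound: every true kmax-nearest neighbor is at most this far
    ub_kth = np.partition(UB, kmax - 1)[kmax - 1]
    # only these candidates need an exact constrained-DTW computation
    candidates = np.flatnonzero(LB <= ub_kth)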

NimaSarajpoor avatar Jun 28 '22 11:06 NimaSarajpoor