
benchmark miss: dynamic time warping (dtw)

Open m-r-munroe opened this issue 3 years ago • 3 comments

Describe the issue This is a gap in your benchmarking. I think you should look at the "time series bakeoff" papers for other candidates like it. DTW is a pain to compute, and it is a very well known and widely used method.

The speedup you can get there is going to speak to signal analysts in several areas.

References for DTW:

  • http://www.cs.ucr.edu/~eamonn/DTW_myths.pdf <-- look at how DTW is described by Anthony Bagnall here
  • https://arxiv.org/abs/1602.01711

Citation:
Berndt, D. J., & Clifford, J. (1994, July). Using dynamic time warping to find patterns in time series. In KDD workshop (Vol. 10, No. 16, pp. 359-370).
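For readers unfamiliar with the method, the classic DTW distance (as described in Berndt & Clifford, 1994) fills a cumulative-cost matrix by dynamic programming. The following is a minimal illustrative sketch in pure Python, not the implementation from any of the libraries mentioned here; optimized versions add windowing constraints and lower bounds:

```python
def dtw_distance(x, y):
    """Classic DTW with a squared-difference local cost and no
    warping-window constraint (full n-by-m dynamic program)."""
    n, m = len(x), len(y)
    inf = float("inf")
    # cost[i][j]: cumulative cost of the cheapest warping path
    # aligning the first i points of x with the first j points of y
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (x[i - 1] - y[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j],      # step in x only
                                 cost[i][j - 1],      # step in y only
                                 cost[i - 1][j - 1])  # step in both
    return cost[n][m] ** 0.5

# A time-shifted copy of a series aligns at zero cost, which is
# exactly what Euclidean distance cannot do:
print(dtw_distance([0, 0, 1, 1], [0, 1, 1, 1]))  # 0.0
```

The O(n·m) inner loop is why DTW is expensive on long series and a natural target for acceleration.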

m-r-munroe avatar Jul 22 '21 15:07 m-r-munroe

@m-r-munroe, that's interesting. Which implementation of DTW do you (or others) use? A particular library?

SmirnovEgorRu avatar Jul 22 '21 18:07 SmirnovEgorRu

pyts is decent.
https://github.com/johannfaouzi/pyts

sktime is also solid. https://github.com/alan-turing-institute/sktime

m-r-munroe avatar Jul 22 '21 18:07 m-r-munroe

Along with random forests, you should also have done a grid search using gradient boosted machines. They typically need about 10x more trees in the ensemble, and because the trees are trained sequentially (each one depends on the previous one's residuals), they take a lot longer to run. They used to be the winningest algorithm on Kaggle.

  • https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingClassifier.html
  • https://www.kaggle.com/msjgriffiths/r-what-algorithms-are-most-successful-on-kaggle/notebook
  • https://bradleyboehmke.github.io/HOML/gbm.html
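The kind of benchmark being suggested might look like the sketch below: a `GridSearchCV` over scikit-learn's `GradientBoostingClassifier`. The dataset and parameter grid are illustrative assumptions, not values from this thread:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# Small synthetic problem just to make the sketch runnable.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Boosting stages are trained sequentially, so individual fits don't
# parallelize the way a random forest does -- but the CV folds and
# grid points can still run in parallel via n_jobs.
grid = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_grid={
        "n_estimators": [100, 300],
        "learning_rate": [0.05, 0.1],
        "max_depth": [2, 3],
    },
    cv=3,
    n_jobs=-1,
)
grid.fit(X, y)
print(grid.best_params_, round(grid.best_score_, 3))
```

The sequential dependence between stages is exactly why a GBM grid search would stress the benchmark more than random forests with a comparable tree count.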

m-r-munroe avatar Jul 30 '21 16:07 m-r-munroe