
Pure PyTorch implementation

Open · jdb78 opened this issue 4 years ago · 9 comments

I love the idea behind DILATE and would like to include it in pytorch-forecasting. However, an implementation that runs entirely on the GPU is probably needed for wider adoption. Do you plan a CUDA kernel or a performant pure PyTorch implementation?

jdb78 avatar Nov 03 '20 12:11 jdb78

Hi Jan @jdb78! Thanks for your interest in DILATE. I ran a run-time comparison with a pure GPU implementation of the soft-DTW part. However, unless my implementation is suboptimal, I found it slower than the CPU version (the CPU approach is also what the soft-DTW of M. Cuturi et al. uses: https://github.com/mblondel/soft-dtw). I guess this is because the double loop of the dynamic-programming recursion runs faster on CPU.

vincent-leguen avatar Dec 17 '20 16:12 vincent-leguen
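
For context, here is a minimal sketch (not the authors' code) of the soft-DTW forward recursion under discussion. Every cell of the dynamic-programming table depends on its three already-computed neighbours, so the double loop is inherently sequential, which is why a naive GPU port gains little:

```python
import math
import torch

def soft_dtw_naive(D: torch.Tensor, gamma: float = 1.0) -> torch.Tensor:
    # D: (n, m) pairwise cost matrix between two series (e.g. squared distances)
    n, m = D.shape
    R = torch.full((n + 1, m + 1), math.inf, dtype=D.dtype, device=D.device)
    R[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # each cell needs three predecessors, so the loops run sequentially
            r = torch.stack((R[i - 1, j - 1], R[i - 1, j], R[i, j - 1]))
            # soft-min via log-sum-exp, smoothed by gamma
            R[i, j] = D[i - 1, j - 1] - gamma * torch.logsumexp(-r / gamma, dim=0)
    return R[n, m]
```

Run on a GPU, each iteration launches tiny kernels with no parallel work in them, so the CPU typically wins on this recursion.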

I wonder what happens to the benchmark once you not only calculate the loss but also backpropagate through the entire network. If that is the main computational burden, I might not care so much about the performance of the loss function itself (sure, you could move data to the CPU, calculate, and move back to the GPU, but I guess the copying comes at a significant cost). Could you point me to the GPU implementation?

jdb78 avatar Dec 17 '20 17:12 jdb78
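
A hedged sketch of the measurement suggested above: time a full training step (forward, loss, backward, optimizer) with the loss computed either on the GPU or after a round-trip to the CPU. `model` and `loss_fn` are hypothetical stand-ins, not part of DILATE's API, and a CUDA device is assumed:

```python
import time
import torch

def time_training_step(model, loss_fn, x, y, loss_on_cpu=False, iters=50):
    # assumes a CUDA device; loss_fn is any differentiable loss (e.g. DILATE)
    model = model.to("cuda")
    x, y = x.to("cuda"), y.to("cuda")
    opt = torch.optim.Adam(model.parameters())
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        opt.zero_grad()
        pred = model(x)
        if loss_on_cpu:
            # round-trip: compute the loss on CPU; autograd still backpropagates
            # through the device copy, but each copy costs transfer time
            loss = loss_fn(pred.cpu(), y.cpu())
        else:
            loss = loss_fn(pred, y)
        loss.backward()
        opt.step()
    torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters
```

Comparing the two settings shows whether the loss (and the copying) is a significant fraction of the step, or whether the backward pass through the network dominates.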

Yes, of course. I have sent you an email with the GPU implementation.

vincent-leguen avatar Dec 17 '20 20:12 vincent-leguen
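
The emailed implementation is not public in this thread, but the usual way to make soft-DTW GPU-friendly in pure PyTorch is to sweep the table one anti-diagonal at a time: cells on the same anti-diagonal are mutually independent and can be computed in one vectorized step across the diagonal and the batch. A minimal sketch under that assumption, taking a batch `D` of precomputed pairwise cost matrices:

```python
import math
import torch

def soft_dtw_antidiag(D: torch.Tensor, gamma: float = 1.0) -> torch.Tensor:
    # D: (B, N, M) batch of pairwise cost matrices
    B, N, M = D.shape
    R = torch.full((B, N + 1, M + 1), math.inf, dtype=D.dtype, device=D.device)
    R[:, 0, 0] = 0.0
    for p in range(2, N + M + 1):
        # all cells (i, j) with i + j == p are independent of one another
        i = torch.arange(max(1, p - M), min(N, p - 1) + 1, device=D.device)
        j = p - i
        r = torch.stack((R[:, i - 1, j - 1], R[:, i - 1, j], R[:, i, j - 1]))
        softmin = -gamma * torch.logsumexp(-r / gamma, dim=0)
        R[:, i, j] = D[:, i - 1, j - 1] + softmin
    return R[:, N, M]
```

This cuts the number of sequential steps from N·M to N+M−1, which is where a GPU version typically starts to pay off for large batches and long series.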

Thanks a lot! Will try to integrate it.

jdb78 avatar Dec 17 '20 21:12 jdb78

Hi @jdb78, I was wondering whether you have tried integrating it with pytorch-forecasting? pytorch-forecasting has made my life so much easier, and the idea behind DILATE is really interesting; I would love to try out a GPU implementation with pytorch-forecasting. Could you please point me towards it?

kamal-nain avatar May 06 '22 08:05 kamal-nain

Hello, could you please share the GPU implementation with me? Thank you in advance!

wang-zm18 avatar Sep 10 '22 10:09 wang-zm18

> Hi Jan @jdb78! Thanks for your interest in DILATE. I ran a run-time comparison with a pure GPU implementation of the soft-DTW part. However, unless my implementation is suboptimal, I found it slower than the CPU version (the CPU approach is also what the soft-DTW of M. Cuturi et al. uses: https://github.com/mblondel/soft-dtw). I guess this is because the double loop of the dynamic-programming recursion runs faster on CPU.

Hi, thanks so much for sharing. Could you send me the GPU implementation as well?

ly1112 avatar Sep 15 '22 17:09 ly1112

> Yes, of course. I have sent you an email with the GPU implementation.

Hi @vincent-leguen, could you please send me the GPU version? [email protected]

dongxinyu1030 avatar Nov 17 '22 14:11 dongxinyu1030

Hello, could you please send me the GPU version as well? [email protected]

cz1999316 avatar Oct 17 '23 06:10 cz1999316