
Pure PyTorch implementation

Open · jdb78 opened this issue 4 years ago · 9 comments

I love the idea behind DILATE and would like to include it in pytorch-forecasting. However, an implementation that runs entirely on the GPU is probably needed for wider adoption. Do you plan a CUDA kernel or a performant pure PyTorch implementation?

jdb78 avatar Nov 03 '20 12:11 jdb78

Hi Jan @jdb78! Thanks for your interest in DILATE. I ran a run-time comparison with a pure GPU implementation of the soft-DTW part. However, unless my implementation is suboptimal, I found it slower than the CPU version (the CPU approach is also what the soft-DTW of M. Cuturi et al. uses: https://github.com/mblondel/soft-dtw). I guess this is because the double loop of the dynamic-programming recursion runs faster on CPU.

vincent-leguen avatar Dec 17 '20 16:12 vincent-leguen
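
For context, here is a minimal sketch (not the authors' code) of the soft-DTW forward recursion under discussion. Every cell of the dynamic-programming table depends on its three already-computed neighbours, so the double loop is inherently sequential, which is why a naive GPU port gains little:

```python
import math
import torch

def soft_dtw_naive(D: torch.Tensor, gamma: float = 1.0) -> torch.Tensor:
    # D: (n, m) pairwise cost matrix between two series (e.g. squared distances)
    n, m = D.shape
    R = torch.full((n + 1, m + 1), math.inf, dtype=D.dtype, device=D.device)
    R[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # each cell needs three predecessors, so the loops run sequentially
            r = torch.stack((R[i - 1, j - 1], R[i - 1, j], R[i, j - 1]))
            # soft-min via log-sum-exp, smoothed by gamma
            R[i, j] = D[i - 1, j - 1] - gamma * torch.logsumexp(-r / gamma, dim=0)
    return R[n, m]
```

Run on a GPU, each iteration launches tiny kernels with no parallel work in them, so the CPU typically wins on this recursion.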

I wonder what happens to the benchmark once you not only calculate the loss but also backpropagate through the entire network. If that is the main computational burden, I might not care so much about the performance of the loss function itself (sure, you could move data to the CPU, calculate, and move back to the GPU, but I guess the copying comes at a significant cost). Could you point me to the GPU implementation?

jdb78 avatar Dec 17 '20 17:12 jdb78
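
A hedged sketch of the measurement suggested above: time a full training step (forward, loss, backward, optimizer) with the loss computed either on the GPU or after a round-trip to the CPU. `model` and `loss_fn` are hypothetical stand-ins, not part of DILATE's API, and a CUDA device is assumed:

```python
import time
import torch

def time_training_step(model, loss_fn, x, y, loss_on_cpu=False, iters=50):
    # assumes a CUDA device; loss_fn is any differentiable loss (e.g. DILATE)
    model = model.to("cuda")
    x, y = x.to("cuda"), y.to("cuda")
    opt = torch.optim.Adam(model.parameters())
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        opt.zero_grad()
        pred = model(x)
        if loss_on_cpu:
            # round-trip: compute the loss on CPU; autograd still backpropagates
            # through the device copy, but each copy costs transfer time
            loss = loss_fn(pred.cpu(), y.cpu())
        else:
            loss = loss_fn(pred, y)
        loss.backward()
        opt.step()
    torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters
```

Comparing the two settings shows whether the loss (and the copying) is a significant fraction of the step, or whether the backward pass through the network dominates.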

Yes, of course. I have sent you an email with the GPU implementation.

vincent-leguen avatar Dec 17 '20 20:12 vincent-leguen
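
The emailed implementation is not public in this thread, but the usual way to make soft-DTW GPU-friendly in pure PyTorch is to sweep the table one anti-diagonal at a time: cells on the same anti-diagonal are mutually independent and can be computed in one vectorized step across the diagonal and the batch. A minimal sketch under that assumption, taking a batch `D` of precomputed pairwise cost matrices:

```python
import math
import torch

def soft_dtw_antidiag(D: torch.Tensor, gamma: float = 1.0) -> torch.Tensor:
    # D: (B, N, M) batch of pairwise cost matrices
    B, N, M = D.shape
    R = torch.full((B, N + 1, M + 1), math.inf, dtype=D.dtype, device=D.device)
    R[:, 0, 0] = 0.0
    for p in range(2, N + M + 1):
        # all cells (i, j) with i + j == p are independent of one another
        i = torch.arange(max(1, p - M), min(N, p - 1) + 1, device=D.device)
        j = p - i
        r = torch.stack((R[:, i - 1, j - 1], R[:, i - 1, j], R[:, i, j - 1]))
        softmin = -gamma * torch.logsumexp(-r / gamma, dim=0)
        R[:, i, j] = D[:, i - 1, j - 1] + softmin
    return R[:, N, M]
```

This cuts the number of sequential steps from N·M to N+M−1, which is where a GPU version typically starts to pay off for large batches and long series.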

Thanks a lot! Will try to integrate it.

jdb78 avatar Dec 17 '20 21:12 jdb78

Hi @jdb78, I was wondering whether you have tried integrating it with pytorch-forecasting? pytorch-forecasting has made my life so much easier, and the idea behind DILATE is really interesting; I would love to try out a GPU implementation with pytorch-forecasting. Could you please point me towards it?

kamal-nain avatar May 06 '22 08:05 kamal-nain

Hello, could you please share the GPU implementation with me? Thank you in advance!

wang-zm18 avatar Sep 10 '22 10:09 wang-zm18

> Hi Jan @jdb78! Thanks for your interest in DILATE. I ran a run-time comparison with a pure GPU implementation of the soft-DTW part. However, unless my implementation is suboptimal, I found it slower than the CPU version (the CPU approach is also what the soft-DTW of M. Cuturi et al. uses: https://github.com/mblondel/soft-dtw). I guess this is because the double loop of the dynamic-programming recursion runs faster on CPU.

Hi, thanks so much for sharing. Could you send me the GPU implementation as well?

ly1112 avatar Sep 15 '22 17:09 ly1112

> Yes, of course. I have sent you an email with the GPU implementation.

Hi @vincent-leguen, could you please send me the GPU version? [email protected]

dongxinyu1030 avatar Nov 17 '22 14:11 dongxinyu1030

Hello, could you please send me the GPU version as well? [email protected]

cz1999316 avatar Oct 17 '23 06:10 cz1999316