stumpy
Add GPU-MSTUMP: Multi-dimensional STUMP on GPUs
Is there a multi-dimensional time series analysis that runs on a GPU instead of the Dask Distributed MSTUMPED?
Currently, that feature is not available, as there are significant differences between how the code is implemented for CPUs vs GPUs. Our goal was to get one-dimensional GPU-STUMP working first and then to learn from that experience. Next, we'd explore implementing parallel GPU-STUMP (i.e., multiple GPUs on the same server, but still for one-dimensional data). And then, finally, we'd consider implementing GPU-MSTUMP(ED). Unfortunately, moving things over from CPUs to GPUs is non-trivial, and whenever possible we try to preserve the readability/maintainability of the code, even at the cost of reduced performance. We certainly welcome any PRs (with help and guidance)!
Out of curiosity:
- What is your use case?
- What is your data size (i.e., how many dimensions and how many data points are there for each time series)?
- Have you already tried MSTUMP or MSTUMPED?
I will certainly think about this, as it is within the scope of the project.
There is an mSTOMP-GPU implementation that we might be able to learn from.
After comparing the MSTUMP code with GPU_STUMP, I think there is a path forward. Specifically, in _mstump we have:
https://github.com/TDAmeritrade/stumpy/blob/955ab965c62d93c98cef4136f1a684529b747f01/stumpy/mstump.py#L809-L826
So, for GPU_MSTUMP, we'll need to replace all of the functions used in this section with their GPU-based equivalents so that all computations happen on the GPU. However, synchronization will still need to happen via the CPU, and this is handled by the outermost for-loop.
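To make the division of labor concrete, here is a minimal CPU-only sketch of that loop structure. This is not the actual `_mstump` code; `multi_distance_profile` and `mstump_sketch` are hypothetical stand-ins, and every step inside the outer loop marked "GPU kernel candidate" is what a GPU_MSTUMP would replace with device kernels, while the outer loop itself remains the CPU-side synchronization point:

```python
import numpy as np

def multi_distance_profile(T, m, idx):
    # Hypothetical stand-in for a GPU kernel: a naive z-normalized
    # Euclidean distance profile of the subsequence starting at `idx`,
    # computed independently for each dimension.
    d, n = T.shape
    l = n - m + 1
    D = np.empty((d, l))
    for dim in range(d):
        Q = T[dim, idx:idx + m]
        Q = (Q - Q.mean()) / (Q.std() + 1e-12)
        for j in range(l):
            S = T[dim, j:j + m]
            S = (S - S.mean()) / (S.std() + 1e-12)
            D[dim, j] = np.linalg.norm(Q - S)
    return D

def mstump_sketch(T, m):
    # Sketch of the loop structure described above. Each iteration of
    # the outer loop is a CPU synchronization point; the per-iteration
    # work (distance profiles, column-wise sort across dimensions,
    # cumulative averaging) would all move onto the GPU.
    d, n = T.shape
    l = n - m + 1
    P = np.full((d, l), np.inf)
    I = np.full((d, l), -1, dtype=np.int64)
    excl = max(int(np.ceil(m / 4)), 1)
    for idx in range(l):                          # outer loop: CPU sync
        D = multi_distance_profile(T, m, idx)     # GPU kernel candidate
        lo, hi = max(0, idx - excl), min(l, idx + excl + 1)
        D[:, lo:hi] = np.inf                      # exclusion zone
        D_sorted = np.sort(D, axis=0)             # GPU kernel candidate
        D_k = np.cumsum(D_sorted, axis=0) / np.arange(1, d + 1)[:, None]
        nn = np.argmin(D_k, axis=1)
        for k in range(d):                        # update k-dim profile
            if D_k[k, nn[k]] < P[k, idx]:
                P[k, idx] = D_k[k, nn[k]]
                I[k, idx] = nn[k]
    return P, I
```

In a GPU version, the device arrays for `D`, `P`, and `I` could stay resident on the GPU across iterations, so the per-iteration CPU synchronization would only need to launch kernels rather than transfer data.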
We may be able to learn some things from this recent paper, which explores MSTOMP on GPUs:
Exploiting_Reduced_Precision_for_GPU-based_Time_Series_Mining-2.pdf