mathnet-numerics
Cuda Provider Release
Hello, thanks for a great library. Is there a roadmap for when the CUDA provider will be released to NuGet? Or maybe a guide on how to install the alpha version?
I cannot work on the CUDA provider myself, because apparently my own Quadro GPU is too old to support our current implementation. Unfortunately, this means the provider is more or less stuck until someone with a newer GPU steps in, or at least can help test it.
Thanks for the response. I would be more than happy to help you test it on a newer GTX 980 GPU. In the longer run I might also be able to step in and help finish the release. Just let me know how I can help.
I managed to build the current version and make a simple comparison of the CUDA and MKL provider using matrix multiplication.
System specification: CPU: i7-4710HQ GPU: GTX 980M
All timings are averages over 100 iterations:

| Matrix size (rows × cols) | MKL time (ms) | CUDA time (ms) |
|---|---|---|
| 100 × 100 | 0.06 | 0.31 |
| 1000 × 1000 | 10.23 | 7.2 |
| 10000 × 10000 | 8519.1 | 1178.8 |
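For reference, a benchmark along these lines can be sketched with Math.NET as follows. This is my own reconstruction, not the code that produced the numbers above; `Control.UseNativeMKL()` and `Control.UseNativeCUDA()` are the provider switches discussed later in this thread, and both require the corresponding native provider package to be installed.

```csharp
using System;
using System.Diagnostics;
using MathNet.Numerics;
using MathNet.Numerics.LinearAlgebra;

static class MatMulBenchmark
{
    // Times repeated dense matrix products under the currently active provider.
    static double AverageMilliseconds(int n, int iterations)
    {
        var a = Matrix<double>.Build.Random(n, n);
        var b = Matrix<double>.Build.Random(n, n);

        var warmup = a * b;              // warm-up: JIT and provider initialization

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            var c = a * b;               // the operation under test
        }
        sw.Stop();
        return sw.Elapsed.TotalMilliseconds / iterations;
    }

    static void Main()
    {
        Control.UseNativeMKL();          // needs the MKL provider package
        Console.WriteLine($"MKL:  {AverageMilliseconds(1000, 100):F2} ms");

        Control.UseNativeCUDA();         // needs the CUDA provider package
        Console.WriteLine($"CUDA: {AverageMilliseconds(1000, 100):F2} ms");
    }
}
```

Note that a timing loop like this includes any host-to-device copies the provider performs per call, which matters for the discussion below.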
I am a bit surprised that the difference between the two is not bigger. Even with a matrix size of 10000 × 10000, the speedup is only roughly a factor of 7. I know that one bottleneck of GPU computing is that memory has to be copied back and forth to the device. Could this be the reason? I can see that there is a discussion on how this could be optimized in #329.
I remember starting to see gains at a matrix size of 256x256 back when I worked with CUDA a couple of years ago (around CUDA version 2). To get large gains, the copying has to be minimized, or hidden by running one calculation asynchronously while copying the memory for the next. #329 is a really interesting problem, but I wouldn't dare take it on unless I had both a good use case and the ability to work full time on it.
Gains at 256x256 also sound more reasonable to me. On my system, however, I get:

Matrix multiplication (256 × 256), average over 100 iterations: MKL 0.17 ms, CUDA 0.85 ms.
So still a lot slower using the GPU.
Is there a way to set which GPU/device should be used in the CUDA provider? I have two GPUs in my laptop and want to make sure that I am using the correct one.
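For what it's worth, I'm not aware of the provider API itself exposing device selection, but the CUDA runtime honors the standard `CUDA_VISIBLE_DEVICES` environment variable, which restricts which GPUs the process can see. A sketch (the variable must be set before the CUDA runtime initializes in the process):

```csharp
using System;
using MathNet.Numerics;

static class DeviceSelection
{
    static void Main()
    {
        // Restrict the CUDA runtime to a single device before the provider loads.
        // "0" is the device index as reported by nvidia-smi; adjust as needed.
        Environment.SetEnvironmentVariable("CUDA_VISIBLE_DEVICES", "0");

        Control.UseNativeCUDA();  // the provider now only sees the chosen GPU
    }
}
```

Alternatively, the variable can be set in the shell or launch configuration before starting the process, which avoids any ordering concerns.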
Hi, any update on this?
@mdabros Did you find a solution to device selection?
Bumping this again - we now seem to have a few pieces, and I'm not sure how they fit together:
- https://www.nuget.org/packages/MathNet.Numerics.Providers.CUDA/, a NuGet package that purports to be at version 5
- https://numerics.mathdotnet.com/ReleaseNotes-CUDA.html, release notes on the docs site that appear to be extremely out of date
- https://numerics.mathdotnet.com/api/MathNet.Numerics/Control.htm, which has some UseNativeCuda-related methods
and that's literally all the relevant info a `site:numerics.mathdotnet.com cuda` Google search turns up.
Does anyone have more info on the state of this nowadays? It looks like I can call Control.UseNativeCUDA(); if I have the CUDA NuGet package installed, at least, but are there any specifics on how to ensure that the CUDA memory copies are optimized, or how to effectively profile CUDA-utilizing Math.NET code?
Thanks!!
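In case it helps anyone landing here, a sketch of defensively enabling the provider with fallbacks. I'm modeling `Control.TryUseNativeCUDA()` on the documented `Control.TryUseNativeMKL()` pattern, so treat its existence as an assumption to verify against your installed package version:

```csharp
using System;
using MathNet.Numerics;

static class ProviderSetup
{
    static void Main()
    {
        // Prefer CUDA, fall back to MKL, then to the managed provider.
        // The TryUseNative* methods return false instead of throwing when
        // the native binaries (or suitable hardware) are missing.
        if (Control.TryUseNativeCUDA())            // assumed API, see lead-in
            Console.WriteLine("Using CUDA provider");
        else if (Control.TryUseNativeMKL())
            Console.WriteLine("Using MKL provider");
        else
            Console.WriteLine("Using managed provider");

        // Report which linear algebra provider is actually active.
        Console.WriteLine(Control.LinearAlgebraProvider);
    }
}
```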