
Cuda Provider Release

Open mdabros opened this issue 9 years ago • 9 comments

Hello, thanks for a great library. Is there any roadmap for when the CUDA provider will be released to NuGet? Or maybe a guide on how to install the alpha version?

mdabros avatar May 18 '16 14:05 mdabros

I cannot work on the CUDA provider myself, because apparently my own Quadro GPU is too old to support our current implementation. Unfortunately this means the provider is more or less stuck until someone with a newer GPU steps in, or at least can help test it.

cdrnet avatar May 18 '16 19:05 cdrnet

Thanks for the response. I will be more than happy to help you test it on a newer GTX980 GPU. In the longer run I might also be able to step in and help finish the release. Just let me know how I can help.

mdabros avatar May 19 '16 06:05 mdabros

I managed to build the current version and make a simple comparison of the CUDA and MKL provider using matrix multiplication.

System specification: CPU i7-4710HQ, GPU GTX 980M

Matrix multiplication (Rows x Cols), average time over 100 iterations:

100 x 100: MKL 0.06 ms, CUDA 0.31 ms
1000 x 1000: MKL 10.23 ms, CUDA 7.2 ms
10000 x 10000: MKL 8519.1 ms, CUDA 1178.8 ms

I am a bit surprised that the difference between the two is not bigger. Even with a matrix size of 10000x10000 the speedup is only roughly a factor of 7. I know that one bottleneck of GPU computing is that memory has to be copied back and forth between host and device. Could this be the reason? I can see that there is a discussion on how this could be optimized in #329.
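For reference, a minimal sketch of the kind of comparison run here (a hypothetical harness, not the exact code used above; it assumes the MKL and CUDA native binaries are deployed alongside the application):

```csharp
using System;
using System.Diagnostics;
using MathNet.Numerics;
using MathNet.Numerics.LinearAlgebra;

class ProviderBenchmark
{
    // Times n x n dense matrix multiplication with whichever provider is active.
    static double TimeMultiplyMs(int n, int iterations)
    {
        var a = Matrix<double>.Build.Random(n, n);
        var b = Matrix<double>.Build.Random(n, n);
        a.Multiply(b); // warm-up: JIT, provider init, first device copy

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            a.Multiply(b);
        }
        sw.Stop();
        return sw.Elapsed.TotalMilliseconds / iterations;
    }

    static void Main()
    {
        const int n = 1000, iterations = 100;

        Control.UseNativeMKL();
        Console.WriteLine($"MKL Time (ms): {TimeMultiplyMs(n, iterations):F2}");

        Control.UseNativeCUDA();
        Console.WriteLine($"CUDA Time (ms): {TimeMultiplyMs(n, iterations):F2}");
    }
}
```

Note that each `Multiply` call in such a loop includes the host-to-device and device-to-host copies, which is exactly the copy overhead discussed in #329.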

mdabros avatar May 19 '16 12:05 mdabros

I remember starting to see gains at a matrix size of 256x256 back when I worked with CUDA a couple of years ago (around CUDA version 2). To get large gains, the copying has to be minimized, or hidden by running one calculation asynchronously while copying memory for the next. #329 is a really interesting problem, but I wouldn't dare do it unless I had both a good use case and the ability to work on it full time.

eriove avatar May 19 '16 14:05 eriove

Gains at 256x256 also sound more reasonable to me. On my system, however, I get:

Matrix multiplication (Rows x Cols), average time over 100 iterations:

256 x 256: MKL 0.17 ms, CUDA 0.85 ms

So still a lot slower using the GPU.

mdabros avatar May 19 '16 16:05 mdabros

Is there a way to set which GPU/device should be used in the CUDA provider? I have two GPUs in my laptop and want to make sure that I am using the correct one.
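I'm not aware of a device-selection option exposed by the provider itself. The CUDA runtime does honor the `CUDA_VISIBLE_DEVICES` environment variable, though, so assuming the provider simply uses the default device, restricting visibility before the provider initializes should work (sketch, untested with this provider):

```csharp
using System;
using MathNet.Numerics;

class Program
{
    static void Main()
    {
        // Make only physical GPU 1 visible to the CUDA runtime; it then
        // appears as device 0, the default device a cuBLAS-based provider
        // would use. Must be set before any CUDA initialization in-process.
        Environment.SetEnvironmentVariable("CUDA_VISIBLE_DEVICES", "1");

        Control.UseNativeCUDA();
        // ... linear algebra work should now run on the selected GPU ...
    }
}
```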

mdabros avatar May 22 '16 20:05 mdabros

Hi, any update on this?

screig avatar Jan 30 '17 16:01 screig

@mdabros Did you find a solution to device selection?

Jalict avatar Apr 16 '19 12:04 Jalict

Bumping this again. We now seem to have a few pieces, and I'm not sure how they fit together:

  • https://www.nuget.org/packages/MathNet.Numerics.Providers.CUDA/ which is some kind of nuget package that purports to be at version 5
  • https://numerics.mathdotnet.com/ReleaseNotes-CUDA.html which appears to be release notes on the docs, and seems to be extremely out of date
  • https://numerics.mathdotnet.com/api/MathNet.Numerics/Control.htm which has some UseNativeCuda related methods

and that's literally all the relevant information a site:numerics.mathdotnet.com cuda Google search turns up.

Does anyone have more info on the state of this nowadays? It looks like I can call Control.UseNativeCUDA() if I have the CUDA NuGet package installed, at least, but are there any specifics on how to minimize the CUDA memory copies, or on how to effectively profile CUDA-utilizing Math.NET code?
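For anyone landing here later, a minimal sketch of enabling the provider defensively (assuming the MathNet.Numerics.Providers.CUDA package and its native binaries are present; falls back to the managed provider if the native one fails to load):

```csharp
using System;
using MathNet.Numerics;
using MathNet.Numerics.LinearAlgebra;

class Program
{
    static void Main()
    {
        try
        {
            Control.UseNativeCUDA();
        }
        catch (Exception)
        {
            // Native provider missing or GPU unsupported: fall back to
            // the pure managed implementation.
            Control.UseManaged();
        }

        Console.WriteLine($"Active provider: {Control.LinearAlgebraProvider}");

        var m = Matrix<double>.Build.Random(512, 512);
        var product = m.Multiply(m); // runs on whichever provider is active
    }
}
```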

Thanks!!

Cobular avatar Apr 27 '22 21:04 Cobular