Reimplement GPU kernels in CUDA to replace Thrust

Open peterwittek opened this issue 9 years ago • 2 comments

The Thrust kernels are somewhat inefficient for multidimensional data structures. Testing is happening in the cuda_opt branch.

peterwittek, Sep 27 '15 12:09

Is there a working Thrust-free version of the Somoclu Python package available? I am experiencing an issue similar to #89, where GPU training is slower than CPU training. I would expect the GPU to pull ahead as the problem size grows, but if I try to use more than a few hundred points, it throws the following error:

terminate called after throwing an instance of 'thrust::system::system_error'
  what():  device free failed: an illegal memory access was encountered
Aborted

I am currently doing this with a 512x512 map and a feature vector length of 956. I can train with 256 points or so, but training with 512 fails. With 256 points, the GPU kernel is slightly (3 seconds) faster than the dense CPU kernel.
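For a sense of scale, here is a rough back-of-the-envelope estimate of the buffer sizes this configuration implies. It is only a sketch: it assumes 32-bit floats, and the `distances` entry is hypothetical, since it only applies if a kernel materializes a full node-by-point distance matrix. The helper name `estimate_bytes` is made up for illustration and is not part of Somoclu.

```python
# Rough memory estimate for a dense SOM, assuming float32 storage (assumption).
BYTES_PER_FLOAT32 = 4


def estimate_bytes(n_rows, n_columns, n_features, n_points):
    """Return approximate buffer sizes in bytes (hypothetical helper)."""
    n_nodes = n_rows * n_columns
    return {
        # One weight vector per map node.
        "codebook": n_nodes * n_features * BYTES_PER_FLOAT32,
        # The training data itself.
        "data": n_points * n_features * BYTES_PER_FLOAT32,
        # Hypothetical: only relevant if the kernel stores every
        # node-to-point distance at once.
        "distances": n_nodes * n_points * BYTES_PER_FLOAT32,
    }


if __name__ == "__main__":
    for n_points in (256, 512):
        sizes = estimate_bytes(512, 512, 956, n_points)
        summary = ", ".join(
            f"{name}: {size / 2**20:.1f} MiB" for name, size in sizes.items()
        )
        print(f"{n_points} points -> {summary}")
```

Under these assumptions the codebook alone is 956 MiB regardless of the number of points, so the map size, not the point count, dominates the footprint. Whether this is related to the error above is not established here.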

I am currently running Debian Buster with a Quadro M1000M GPU, using the NVIDIA 410.48 drivers and CUDA 10.0, both installed via the cuda_10.0.130_410.48_linux.run file provided by NVIDIA.

Thanks for any help you might be able to give.

zclandry, Apr 09 '19 16:04

Any status on this? It seems like the Thrust implementation has been taken over by CUDA (see https://www.reddit.com/r/cpp/comments/7erub1/anybody_still_using_thrust/ ), so I'm curious whether there's a way to call the updated routines in CUDA without having to fundamentally rewrite the kernels.

My understanding is that the Thrust kernels are why the behavior in #89 happens, so it would be great to use more modern GPU functionality (especially for memory management).

espg, Sep 10 '19 22:09