mgcpp
A C++ Math Library Based on CUDA
mgcpp internally uses a lot of temporaries, but cudaMalloc is very slow. Using low-latency memory allocators should boost performance considerably. References: - [tcmalloc](http://goog-perftools.sourceforge.net/doc/tcmalloc.html), Google - [THC...
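One way to picture the low-latency allocator idea is a caching pool that keeps freed blocks and hands them back on the next request of the same size, so the slow backing allocator is hit only on a cache miss. The sketch below is not mgcpp code: `caching_pool`, `alloc_fn`, and `free_fn` are hypothetical names, and the backing functions are passed in so the example also runs on a machine without a GPU. On a real device they would presumably wrap `cudaMalloc` and `cudaFree`.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <utility>
#include <vector>

// Hypothetical caching pool (illustration only, not part of mgcpp).
// Freed blocks are kept in per-size free lists and reused on the next
// request of the same size instead of going back to the backing allocator.
class caching_pool {
public:
    using alloc_fn = std::function<void*(std::size_t)>;
    using free_fn  = std::function<void(void*)>;

    caching_pool(alloc_fn a, free_fn f)
        : alloc_(std::move(a)), free_(std::move(f)) {}

    caching_pool(const caching_pool&) = delete;
    caching_pool& operator=(const caching_pool&) = delete;

    // Release every cached block back to the backing allocator.
    ~caching_pool() {
        for (auto& entry : cache_)
            for (void* p : entry.second)
                free_(p);
    }

    void* allocate(std::size_t bytes) {
        assert(bytes > 0);
        auto it = cache_.find(bytes);
        if (it != cache_.end() && !it->second.empty()) {
            void* p = it->second.back();   // cache hit: reuse a freed block
            it->second.pop_back();
            ++hits_;
            return p;
        }
        ++misses_;                          // cache miss: slow path
        return alloc_(bytes);
    }

    // Blocks are cached rather than freed; `bytes` must match the size
    // that was passed to allocate().
    void deallocate(void* p, std::size_t bytes) {
        assert(p != nullptr);
        cache_[bytes].push_back(p);
    }

    std::size_t hits() const { return hits_; }
    std::size_t misses() const { return misses_; }

private:
    alloc_fn alloc_;
    free_fn free_;
    std::map<std::size_t, std::vector<void*>> cache_;
    std::size_t hits_ = 0;
    std::size_t misses_ = 0;
};
```

This version is deliberately simple: it matches sizes exactly, never shrinks, and is not thread-safe. A production allocator would likely round sizes into bins and bound the amount of cached memory.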
Type of issue - Doc Description - I think an open-source project should have a code of conduct. 😄 How about adding CODE_OF_CONDUCT.md to mgcpp? If this looks good to you, could...
Implement a Fast Fourier Transform CUDA kernel or add cuFFT to the library. ### Preliminaries - Implement complex-type matrix/vector
Make an efficient CUDA micro-benchmark framework. The current workflow for writing and optimizing CUDA kernels is very difficult because there is no proper, consistent way of measuring the performance of kernels....
Implement adapters for blaze, eigen, uBLAS, plain arrays, and std::vector. The adapters for blaze are partially implemented but untested, and they suffer from a serious problem: memory padding.
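The padding problem can be illustrated with a stride-aware copy. The sketch below assumes that "memory padding" means the source matrix stores each row (or column) with a stride larger than its logical length, so a single flat copy would drag the padding elements along. `pack_padded` is a hypothetical helper, not an mgcpp or blaze function. For host-to-device transfers, `cudaMemcpy2D` handles the same situation through its pitch arguments.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (illustration only): copy `outer` contiguous runs of
// `inner` elements out of a padded buffer whose runs start `stride` elements
// apart, producing a densely packed result without the padding.
template <typename T>
std::vector<T> pack_padded(const T* src,
                           std::size_t outer,
                           std::size_t inner,
                           std::size_t stride) {
    assert(stride >= inner);  // the stride includes the padding
    std::vector<T> dst(outer * inner);
    for (std::size_t i = 0; i < outer; ++i) {
        // Copy only the logical elements of run i and skip its padding.
        std::copy_n(src + i * stride, inner, dst.begin() + i * inner);
    }
    return dst;
}
```

For example, a 2x3 matrix stored with a stride of 4 holds one padding element at the end of each run, and `pack_padded(data, 2, 3, 4)` returns the six logical elements in order.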
Implement a GPU-less dummy test mode. By using stateful allocators and stubs, we might be able to run certain tests on systems without a GPU. In order to implement this, a...
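A rough sketch of how a stateful stub allocator could stand in for the GPU during tests is shown below. All names (`host_stub_allocator`, `device_buffer`, `copy_to_device`, `copy_to_host`) are hypothetical and do not reflect mgcpp's actual interfaces. The idea is only that the container receives its allocator as an object, so tests can inject a host-memory stub where production code would use an allocator backed by `cudaMalloc` and `cudaMemcpy`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <new>
#include <type_traits>
#include <vector>

// Hypothetical stub allocator: serves "device" memory from the host heap and
// tracks how many bytes are live, so tests can also check for leaks.
struct host_stub_allocator {
    std::size_t live_bytes = 0;  // allocator state

    void* allocate(std::size_t bytes) {
        live_bytes += bytes;
        return ::operator new(bytes);
    }
    void deallocate(void* p, std::size_t bytes) {
        live_bytes -= bytes;
        ::operator delete(p);
    }
    // Stand-ins for host-to-device and device-to-host transfers.
    void copy_to_device(void* dst, const void* src, std::size_t bytes) {
        std::memcpy(dst, src, bytes);
    }
    void copy_to_host(void* dst, const void* src, std::size_t bytes) {
        std::memcpy(dst, src, bytes);
    }
};

// Hypothetical container that routes all memory traffic through the
// allocator object it was given.
template <typename T, typename Alloc>
class device_buffer {
    static_assert(std::is_trivially_copyable<T>::value,
                  "raw byte copies require a trivially copyable type");

public:
    device_buffer(Alloc& alloc, const std::vector<T>& host)
        : alloc_(alloc),
          size_(host.size()),
          data_(static_cast<T*>(alloc_.allocate(size_ * sizeof(T)))) {
        assert(!host.empty());
        alloc_.copy_to_device(data_, host.data(), size_ * sizeof(T));
    }

    device_buffer(const device_buffer&) = delete;
    device_buffer& operator=(const device_buffer&) = delete;

    ~device_buffer() { alloc_.deallocate(data_, size_ * sizeof(T)); }

    std::vector<T> to_host() const {
        std::vector<T> out(size_);
        alloc_.copy_to_host(out.data(), data_, size_ * sizeof(T));
        return out;
    }

private:
    Alloc& alloc_;      // stateful allocator, shared with the caller
    std::size_t size_;
    T* data_;
};
```

Because the allocator carries state, a test can assert that `live_bytes` returns to zero once every buffer has gone out of scope. Code that launches kernels would still need separate stubs.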
Add parallel linear equation solvers and eigensolvers. Primarily considering cuSOLVER; alternatively, we need to find a dependable third-party library or implement them ourselves.
Improve or review the current context_manager implementation. context_manager is a global singleton locked with a mutex. I'm afraid the mutex will critically harm scaling in highly parallel contexts. Profiling the...