KataGo
                                
CUDA 12.3 and 12.4 support (#913)
Cast to void** in cudaMalloc calls.
The C++ template version of cudaMalloc is actually defined in cuda_device_runtime_api.h, which is included into C++ source files by cuda_runtime_api.h.
Starting with CUDA 12.3, cuda_runtime_api.h only includes cuda_device_runtime_api.h when certain macros are (or are not) defined (lines 152-154 of cuda_runtime_api.h):

    #if defined(__CUDACC_RDC__) || defined(__CUDACC_EWP__) || !defined(__CUDACC_RTC__)
    #include "cuda_device_runtime_api.h"
    #endif /* defined(__CUDACC_RDC__) || defined(__CUDACC_EWP__) || !defined(__CUDACC_RTC__) */
The "#if defined" and "#endif" lines are newly added, so the template version of cudaMalloc is no longer defined when we compile C++ code.
Casting the pointer argument to void** in the cudaMalloc calls solves the problem and keeps compatibility with earlier CUDA versions.
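A minimal sketch of why the cast is needed, assuming only the C API signature cudaMalloc(void** devPtr, size_t size). So that it compiles without the CUDA toolkit, it uses a hypothetical stand-in, mockCudaMalloc, which has the same void** parameter but allocates host memory with std::malloc instead of device memory. Once the templated C++ overload is not visible, passing a float** directly fails to compile, because C++ does not implicitly convert float** to void**.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-in for the CUDA runtime's C API (NOT the real library).
// The real function is declared as:
//   cudaError_t cudaMalloc(void** devPtr, size_t size);
// This mock allocates host memory so the sketch builds without CUDA.
enum mockError_t { mockSuccess = 0, mockErrorMemoryAllocation = 2 };

mockError_t mockCudaMalloc(void** devPtr, std::size_t size) {
  *devPtr = std::malloc(size);
  return *devPtr != nullptr ? mockSuccess : mockErrorMemoryAllocation;
}

// Allocates a buffer of n floats through the void** C-style API.
// Returns nullptr on failure.
float* allocFloats(std::size_t n) {
  float* buf = nullptr;
  // Without the template overload, this line would not compile:
  //   mockCudaMalloc(&buf, n * sizeof(float));  // float** -> void** is not implicit
  // The explicit cast to void** works with or without the template overload.
  if (mockCudaMalloc(reinterpret_cast<void**>(&buf), n * sizeof(float)) != mockSuccess) {
    return nullptr;
  }
  return buf;
}
```

Against the real runtime, the equivalent change is writing cudaMalloc((void**)&buf, bytes) rather than cudaMalloc(&buf, bytes), where buf is an illustrative float* variable. On older CUDA versions that still expose the template, a void** argument simply selects the non-template overload, which is why the cast stays backward compatible.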
Apologies, I haven't had the bandwidth to install CUDA 12.3 or 12.4 myself and test things out, so I put off responding to this. Do you still think this is a reasonable change? If so, you can just leave it open, and I will probably get to it around the time I assemble the next release, put together all the various pieces, and test them.
If you are closing this because there is a better way, or because it's unneeded/redundant somehow, then no worries.
I'm closing this request only because zakki gave the same solution in his pull request #935. :-)