mixbench-cuda README missing info

Open ppbrown opened this issue 2 years ago • 2 comments

The README.md in the cuda directory claims, "Building should be straightforward by using the respective CMakeList.txt file." But it is not straightforward at all.

When I try "cmake .", it fails with:
CMAKE_CUDA_ARCHITECTURES must be non-empty if set.

There are no instructions about this in the README, and nothing about it in the CMakeLists.txt.

Please update the README with the relevant information. I am using:

Ubuntu 22
Cuda compilation tools, release 12.2, V12.2.140

Update: It appears that if I force the CUDA tools to version 11, it is more forgiving. But if you aren't going to update to support the latest version, then please update the README to say this requires CUDA version 11.

ppbrown avatar Sep 19 '23 04:09 ppbrown

Do you have a workaround for this? I've just encountered the same error

SerialVelocity avatar Dec 03 '23 12:12 SerialVelocity

Thanks for the heads-up. Yes, it looks like this parameter was introduced to CMake after my earlier tests.

A workaround is to set its value on the cmake command line to the CC (compute capability) of your device, e.g. for a GTX-1060 (CC 6.1) that would be: cmake ../mixbench-cuda -DCMAKE_CUDA_ARCHITECTURES=61
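
As a rough illustration of that mapping, the value is simply the compute capability with the dot removed (6.1 becomes 61, 12.0 becomes 120). The helper below is a hypothetical sketch and not part of mixbench or CMake: both function names are made up for this example, and in practice the major and minor numbers would come from the device itself (for instance the major and minor fields of cudaDeviceProp).

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not part of mixbench: converts a compute capability
// (major.minor) into the value passed to -DCMAKE_CUDA_ARCHITECTURES.
std::string cudaArchValue(int major, int minor) {
    // Assumes a single-digit minor version, so plain concatenation is enough.
    assert(major >= 1 && minor >= 0 && minor <= 9);
    return std::to_string(major) + std::to_string(minor);
}

// Builds the configure command quoted in this thread for a given capability.
std::string cmakeConfigureCommand(int major, int minor) {
    return "cmake ../mixbench-cuda -DCMAKE_CUDA_ARCHITECTURES=" +
           cudaArchValue(major, minor);
}
```

For example, cudaArchValue(6, 1) gives "61", which matches the GTX-1060 command above.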

ekondis avatar Jan 05 '24 22:01 ekondis

Ran into this today. Setting -DCMAKE_CUDA_ARCHITECTURES=native did not work, but setting it to a specific value, as @ekondis suggested (120 for the RTX 5090), made it work.

However, the next command, cmake --build ./, gave me the following errors:

In file included from tool/mixbench/mixbench-cuda/main-cuda.cpp:12:
tool/mixbench/mixbench-cuda/lcutil.h: In function ‘void GetDevicePeakInfo(double*, double*, cudaDeviceProp*)’:
tool/mixbench/mixbench-cuda/lcutil.h:77:38: error: ‘struct cudaDeviceProp’ has no member named ‘clockRate’
   77 |         *aGIPS = 1000.0 * deviceProp.clockRate * TotalSPs / (1000.0 * 1000.0 * 1000.0);  // Giga instructions/sec
      |                                      ^~~~~~~~~
tool/mixbench/mixbench-cuda/lcutil.h:78:43: error: ‘struct cudaDeviceProp’ has no member named ‘memoryClockRate’
   78 |         *aGBPS = 2.0 * (double)deviceProp.memoryClockRate * 1000.0 * (double)deviceProp.memoryBusWidth / 8.0;
      |                                           ^~~~~~~~~~~~~~~
tool/mixbench/mixbench-cuda/lcutil.h: In function ‘void StoreDeviceInfo(FILE*)’:
tool/mixbench/mixbench-cuda/lcutil.h:99:67: error: ‘struct cudaDeviceProp’ has no member named ‘clockRate’
   99 |         fprintf(fout, "GPU clock rate:      %d MHz\n", deviceProp.clockRate/1000);
      |                                                                   ^~~~~~~~~
tool/mixbench/mixbench-cuda/lcutil.h:100:67: error: ‘struct cudaDeviceProp’ has no member named ‘memoryClockRate’
  100 |         fprintf(fout, "Memory clock rate:   %d MHz\n", deviceProp.memoryClockRate/1000/2);
      |                                                                   ^~~~~~~~~~~~~~~
gmake[2]: *** [CMakeFiles/mixbench-cuda.dir/build.make:76: CMakeFiles/mixbench-cuda.dir/main-cuda.cpp.o] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:83: CMakeFiles/mixbench-cuda.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2

This is because I was compiling with CUDA 13.0, where cudaDeviceProp no longer has the clockRate and memoryClockRate members:

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2025 NVIDIA Corporation
Built on Wed_Aug_20_01:58:59_PM_PDT_2025
Cuda compilation tools, release 13.0, V13.0.88
Build cuda_13.0.r13.0/compiler.36424714_0

I wrote the following patch to fix the errors:

diff --git a/mixbench-cuda/lcutil.h b/mixbench-cuda/lcutil.h
index 309ee95..08cad6e 100644
--- a/mixbench-cuda/lcutil.h
+++ b/mixbench-cuda/lcutil.h
@@ -67,15 +67,18 @@ static inline bool IsFP16Supported(void){
 static inline void GetDevicePeakInfo(double *aGIPS, double *aGBPS, cudaDeviceProp *aDeviceProp = NULL){
 	cudaDeviceProp deviceProp;
 	int current_device;
+	int clockRate, memoryClockRate;
 	if( aDeviceProp )
 		deviceProp = *aDeviceProp;
 	else{
 		CUDA_SAFE_CALL( cudaGetDevice(&current_device) );
 		CUDA_SAFE_CALL( cudaGetDeviceProperties(&deviceProp, current_device) );
+		CUDA_SAFE_CALL( cudaDeviceGetAttribute(&clockRate, cudaDevAttrClockRate, current_device) );
+		CUDA_SAFE_CALL( cudaDeviceGetAttribute(&memoryClockRate, cudaDevAttrMemoryClockRate, current_device) );
 	}
 	const int TotalSPs = _ConvertSMVer2Cores(deviceProp.major, deviceProp.minor)*deviceProp.multiProcessorCount;
-	*aGIPS = 1000.0 * deviceProp.clockRate * TotalSPs / (1000.0 * 1000.0 * 1000.0);  // Giga instructions/sec
-	*aGBPS = 2.0 * (double)deviceProp.memoryClockRate * 1000.0 * (double)deviceProp.memoryBusWidth / 8.0;
+	*aGIPS = 1000.0 * clockRate * TotalSPs / (1000.0 * 1000.0 * 1000.0);  // Giga instructions/sec
+	*aGBPS = 2.0 * (double)memoryClockRate * 1000.0 * (double)deviceProp.memoryBusWidth / 8.0;
 }
 
 static inline cudaDeviceProp GetDeviceProperties(void){
@@ -90,14 +93,17 @@ static inline cudaDeviceProp GetDeviceProperties(void){
 static void StoreDeviceInfo(FILE *fout){
 	cudaDeviceProp deviceProp;
 	int current_device, driver_version;
+	int clockRate, memoryClockRate;
 	CUDA_SAFE_CALL( cudaGetDevice(&current_device) );
 	CUDA_SAFE_CALL( cudaGetDeviceProperties(&deviceProp, current_device) );
 	CUDA_SAFE_CALL( cudaDriverGetVersion(&driver_version) );
+	CUDA_SAFE_CALL( cudaDeviceGetAttribute(&clockRate, cudaDevAttrClockRate, current_device) );
+	CUDA_SAFE_CALL( cudaDeviceGetAttribute(&memoryClockRate, cudaDevAttrMemoryClockRate, current_device) );
 	fprintf(fout, "------------------------ Device specifications ------------------------\n");
 	fprintf(fout, "Device:              %s\n", deviceProp.name);
 	fprintf(fout, "CUDA driver version: %d.%d\n", driver_version/1000, driver_version%1000);
-	fprintf(fout, "GPU clock rate:      %d MHz\n", deviceProp.clockRate/1000);
-	fprintf(fout, "Memory clock rate:   %d MHz\n", deviceProp.memoryClockRate/1000/2);
+	fprintf(fout, "GPU clock rate:      %d MHz\n", clockRate/1000);
+	fprintf(fout, "Memory clock rate:   %d MHz\n", memoryClockRate/1000/2);
 	fprintf(fout, "Memory bus width:    %d bits\n", deviceProp.memoryBusWidth);
 	fprintf(fout, "WarpSize:            %d\n", deviceProp.warpSize);
 	fprintf(fout, "L2 cache size:       %d KB\n", deviceProp.l2CacheSize/1024);
diff --git a/mixbench-cuda/mix_kernels_cuda.cu b/mixbench-cuda/mix_kernels_cuda.cu
index e2135f6..9a3b8ac 100644
--- a/mixbench-cuda/mix_kernels_cuda.cu
+++ b/mixbench-cuda/mix_kernels_cuda.cu
@@ -97,7 +97,7 @@ void runbench_warmup(double *cd, long size){
 
 	benchmark_func< short, BLOCK_SIZE, ELEMENTS_PER_THREAD, FUSION_DEGREE, 0, true ><<< dimReducedGrid, dimBlock >>>((short)1, (short*)cd);
 	CUDA_SAFE_CALL( cudaGetLastError() );
-	CUDA_SAFE_CALL( cudaThreadSynchronize() );
+	CUDA_SAFE_CALL( cudaDeviceSynchronize() );
 }
 
 int out_config = 1;
@@ -172,7 +172,7 @@ extern "C" void mixbenchGPU(double *c, long size){
 	CUDA_SAFE_CALL( cudaMemset(cd, 0, size*sizeof(double)) );  // initialize to zeros
 
 	// Synchronize in order to wait for memory operations to finish
-	CUDA_SAFE_CALL( cudaThreadSynchronize() );
+	CUDA_SAFE_CALL( cudaDeviceSynchronize() );
 
 	printf("----------------------------------------------------------------------------- CSV data -----------------------------------------------------------------------------\n");
 	printf("Experiment ID, Single Precision ops,,,,              Double precision ops,,,,              Half precision ops,,,,                Integer operations,,, \n");
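
To make the units in the patched formulas easier to follow, here is a small standalone sketch of the same arithmetic. It assumes the clock values are in kilohertz, which is what cudaDevAttrClockRate and cudaDevAttrMemoryClockRate report. The struct, the function names and the sample numbers are hypothetical; they are not taken from mixbench or from any measured device.

```cpp
#include <cassert>

// Hypothetical inputs; in lcutil.h these come from cudaDeviceGetAttribute
// and cudaDeviceProp rather than from the caller.
struct PeakInputs {
    int clockRateKHz;        // cudaDevAttrClockRate, in kHz
    int memoryClockRateKHz;  // cudaDevAttrMemoryClockRate, in kHz
    int memoryBusWidthBits;  // deviceProp.memoryBusWidth, in bits
    int totalSPs;            // cores per SM times multiProcessorCount
};

// Same expression as *aGIPS in the patch: kHz to Hz, times cores, scaled to giga.
double peakGigaInstructionsPerSec(const PeakInputs &in) {
    assert(in.clockRateKHz > 0 && in.totalSPs > 0);
    return 1000.0 * in.clockRateKHz * in.totalSPs / (1000.0 * 1000.0 * 1000.0);
}

// Same expression as *aGBPS in the patch. The factor 2.0 presumably accounts
// for double data rate, and dividing the bus width by 8 converts bits to
// bytes, so the result as written is in bytes per second.
double peakMemoryBytesPerSec(const PeakInputs &in) {
    assert(in.memoryClockRateKHz > 0 && in.memoryBusWidthBits > 0);
    return 2.0 * (double)in.memoryClockRateKHz * 1000.0 *
           (double)in.memoryBusWidthBits / 8.0;
}
```

With made-up values of clockRateKHz = 1000000, totalSPs = 1280, memoryClockRateKHz = 4000000 and memoryBusWidthBits = 192, the first function returns 1280 and the second returns 1.92e11, i.e. 192 GB/s.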

Subsequently, deleting the build directory, re-creating it with mkdir build, and running cmake ../mixbench-cuda/ -DCMAKE_CUDA_ARCHITECTURES=120 followed by the build gave me the following error:

-- Configuring done (0.0s)
-- Generating done (0.0s)
-- Build files have been written to: /.../tool/mixbench/build
[ 33%] Building CXX object CMakeFiles/mixbench-cuda.dir/main-cuda.cpp.o
[ 66%] Building CUDA object CMakeFiles/mixbench-cuda.dir/mix_kernels_cuda.cu.o
nvcc fatal   : Unsupported gpu architecture 'compute_52'
gmake[2]: *** [CMakeFiles/mixbench-cuda.dir/build.make:91: CMakeFiles/mixbench-cuda.dir/mix_kernels_cuda.cu.o] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:83: CMakeFiles/mixbench-cuda.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2

I had to edit mixbench-cuda/CMakeLists.txt manually and replace the hard-coded architectures in the CUDA flags with my own, because nvcc 13.0 rejects compute_52:

diff --git a/mixbench-cuda/CMakeLists.txt b/mixbench-cuda/CMakeLists.txt
index b1a10f2..b3b6c5f 100644
--- a/mixbench-cuda/CMakeLists.txt
+++ b/mixbench-cuda/CMakeLists.txt
@@ -7,7 +7,7 @@ include_directories(${CMAKE_CUDA_TOOLKIT_INCLUDE_DIRECTORIES})
 string(APPEND CMAKE_CUDA_FLAGS " -Xptxas=-v")
 string(APPEND CMAKE_CUDA_FLAGS " -Wno-deprecated-gpu-targets")
 string(APPEND CMAKE_CUDA_FLAGS " --cudart=static")
-string(APPEND CMAKE_CUDA_FLAGS " -gencode arch=compute_52,code=[sm_52,compute_52] -gencode arch=compute_61,code=compute_61")
+string(APPEND CMAKE_CUDA_FLAGS " -gencode arch=compute_120,code=[sm_120,compute_120]")
 
 # Get version info from git tag
 execute_process(COMMAND git describe --tags --always

After applying the patches, deleting the build directory and re-creating it with mkdir build, running cmake ../mixbench-cuda/ -DCMAKE_CUDA_ARCHITECTURES=120 and cmake --build ./, the compilation finally succeeded.

I'd be happy to make a PR with the above patch for CUDA 13.0 support.

gamemaker1 avatar Nov 20 '25 18:11 gamemaker1

Sure, @gamemaker1. Feel free to submit a PR with your proposed patch. Then we can validate it during the review process.

ekondis avatar Nov 26 '25 22:11 ekondis

@ekondis Thanks for the reply, just submitted #58 and #59 :)

gamemaker1 avatar Nov 27 '25 17:11 gamemaker1