error with 7a068df
OS: Windows 11
Compiler: Visual Studio 2022 MSVC
OpenCL SDK: KhronosGroup OpenCL SDK (https://github.com/KhronosGroup/OpenCL-Guide/blob/main/chapters/getting_started_windows.md)
mixbench-ocl
mixbench-ocl ()
Use "-h" argument to see available options
------------------------ Device specifications ------------------------
Platform: NVIDIA CUDA
Device: NVIDIA GeForce RTX 4080/NVIDIA Corporation
Driver version: 526.98
Address bits: 64
GPU clock rate: 2505 MHz
Total global mem: 16375 MB
Max allowed buffer: 4093 MB
OpenCL version: OpenCL 3.0 CUDA
Total CUs: 76
Buffer size: 256MB
Workgroup size: 256
Elements per workitem: 8
Workitem fusion degree: 4
Workitem stride: NDRange
Buffer allocation: Device allocated
Timer: CL event based
Warning: Half precision computations are not supported
Loading kernel source file...
Precompilation of kernels...
OpenCL error in file 'G:\git\mixbench\mixbench-opencl\mix_kernels_ocl.cpp' in line 89 : Code -30.
Thank you for reporting this. The error is raised while compiling the OpenCL kernel code (code -30 is CL_INVALID_VALUE), but it is not clear what causes it.
Do other OpenCL programs run correctly? e.g. https://github.com/krrishnarraj/clpeak
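For anyone reading the numeric code in the log, here is a minimal sketch of a lookup that turns an OpenCL status code into its symbolic name. The helper name `clErrorName` is hypothetical (it is not part of mixbench or of the OpenCL API), and only a subset of the codes defined in the Khronos `cl.h` header is listed.

```cpp
#include <string>

// Hypothetical helper: map a numeric OpenCL status code to its symbolic name.
// The numeric values match the Khronos cl.h header; the list is not exhaustive.
std::string clErrorName(int code) {
    switch (code) {
        case 0:   return "CL_SUCCESS";
        case -1:  return "CL_DEVICE_NOT_FOUND";
        case -2:  return "CL_DEVICE_NOT_AVAILABLE";
        case -3:  return "CL_COMPILER_NOT_AVAILABLE";
        case -5:  return "CL_OUT_OF_RESOURCES";
        case -6:  return "CL_OUT_OF_HOST_MEMORY";
        case -11: return "CL_BUILD_PROGRAM_FAILURE";
        case -30: return "CL_INVALID_VALUE";
        case -33: return "CL_INVALID_DEVICE";
        case -34: return "CL_INVALID_CONTEXT";
        case -42: return "CL_INVALID_BINARY";
        case -43: return "CL_INVALID_BUILD_OPTIONS";
        case -44: return "CL_INVALID_PROGRAM";
        // Anything not listed above is reported with its raw number.
        default:  return "unknown OpenCL error (" + std::to_string(code) + ")";
    }
}
```

With this, `clErrorName(-30)` yields `CL_INVALID_VALUE`. A syntax error in the kernel source itself would normally be reported as `CL_BUILD_PROGRAM_FAILURE` (-11), whereas -30 means that an argument passed to an OpenCL API call was rejected as invalid.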
clpeak builds and runs fine here:
Platform: NVIDIA CUDA
Device: NVIDIA GeForce RTX 4080
Driver version : 526.98 (Win64)
Compute units : 76
Clock frequency : 2505 MHz
Global memory bandwidth (GBPS)
float : 612.28
float2 : 631.80
float4 : 639.96
float8 : 648.81
float16 : 656.37
Single-precision compute (GFLOPS)
float : 52304.35
float2 : 51823.82
float4 : 52095.66
float8 : 51354.73
float16 : 51322.97
No half precision support! Skipped
Double-precision compute (GFLOPS)
double : 853.48
double2 : 852.69
double4 : 850.52
double8 : 846.52
double16 : 838.56
Integer compute (GIOPS)
int : 26660.84
int2 : 26533.69
int4 : 26473.44
int8 : 26544.63
int16 : 26350.34
Integer compute Fast 24bit (GIOPS)
int : 26459.70
int2 : 26463.14
int4 : 26457.42
int8 : 26354.03
int16 : 25947.06
Transfer bandwidth (GBPS)
enqueueWriteBuffer : 15.07
enqueueReadBuffer : 13.99
enqueueWriteBuffer non-blocking : 15.06
enqueueReadBuffer non-blocking : 14.00
enqueueMapBuffer(for read) : 21.76
memcpy from mapped ptr : 22.84
enqueueUnmap(after write) : 26.33
memcpy to mapped ptr : 22.43
Kernel launch latency : 8.61 us
There is no problem with mixbench 0.04 either.