NVIDIA ICD is being skipped.

Open: stolk opened this issue 2 years ago • 1 comment

I have a machine with both an NVIDIA dGPU and an AMD iGPU, as can be seen here:

$ inxi -G
Graphics:
  Device-1: NVIDIA GA106M [GeForce RTX 3060 Mobile / Max-Q] driver: nvidia
    v: 535.54.03
  Device-2: AMD Cezanne [Radeon Vega Series / Radeon Mobile Series]
    driver: amdgpu v: kernel
...

And here:

$ drm_info | grep Driver:
├───Driver: amdgpu (AMD GPU) version 3.52.0 (20150101)
├───Driver: nvidia-drm (NVIDIA DRM driver) version 0.0.0 (20160202)

(vulkaninfo --summary also lists both.)

But when libOpenCL.so queries the platforms, it appears to try all three ICDs (strace shows it opening libnvidia-opencl.so), yet it does not report the NVIDIA ICD:

$ clinfo -l
Platform #0: Clover
 `-- Device #0: AMD Radeon Graphics (renoir, LLVM 15.0.7, DRM 3.52, 6.3.0-7-generic)
Platform #1: rusticl

Yet, the ICD is there, and points to a valid .so file:

$ ls -al /etc/OpenCL/vendors/
total 20
drwxr-xr-x 2 root root 4096 Sep 13 11:15 .
drwxr-xr-x 3 root root 4096 Sep  8 10:46 ..
-rw-r--r-- 1 root root   19 Jun  9 02:53 mesa.icd
-rw-r--r-- 1 root root   22 Jul 14 13:18 nvidia.icd
-rw-r--r-- 1 root root   22 Jun  9 02:53 rusticl.icd
$ cat /etc/OpenCL/vendors/nvidia.icd 
libnvidia-opencl.so.1
$ ldd /usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.1 
	linux-vdso.so.1 (0x00007ffefe0f5000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007ff434fc5000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007ff433400000)
	libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007ff434fc0000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007ff434fbb000)
	librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007ff434fb6000)
	/lib64/ld-linux-x86-64.so.2 (0x00007ff4350c4000)
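
The manual cat above can be repeated for every ICD file in one go. A minimal sketch, assuming each .icd file holds the library name or path on its first line (as nvidia.icd does above); `list_icds` is a made-up helper name, not part of ocl-icd:

```shell
# Print every ICD file in a vendors directory next to the library it names.
# The directory defaults to the one shown in this issue.
list_icds() {
    dir="${1:-/etc/OpenCL/vendors}"
    for icd in "$dir"/*.icd; do
        # Skip the unexpanded glob when the directory has no .icd files.
        [ -e "$icd" ] || continue
        # Take the first line; tolerate a missing trailing newline.
        IFS= read -r lib < "$icd" || [ -n "$lib" ] || continue
        printf '%s -> %s\n' "$(basename "$icd")" "$lib"
    done
}

# Example (on the affected machine):
#   list_icds /etc/OpenCL/vendors
```

This only shows what the loader is told to open. It says nothing about whether the library then registers a platform.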

Why is the NVIDIA OpenCL driver disregarded?

Here is the output from clinfo with OCL_ICD_DEBUG set to 4:

$ clinfo -l
ocl-icd(../ocl_icd_loader.c:201): _find_num_icds: return: 3/0x3
ocl-icd(../ocl_icd_loader.c:274): _open_driver: return: 1/0x1
ocl-icd(../ocl_icd_loader.c:274): _open_driver: return: 2/0x2
ocl-icd(../ocl_icd_loader.c:274): _open_driver: return: 3/0x3
ocl-icd(../ocl_icd_loader.c:287): _open_drivers: return: 3/0x3
ocl-icd(../ocl_icd_loader.c:310): _get_function_addr: return: 140042331709632/0x7f5e256f60c0
ocl-icd(../ocl_icd_loader.c:310): _get_function_addr: return: 140042331637104/0x7f5e256e4570
ocl-icd(../ocl_icd_loader.c:310): _get_function_addr: return: 140042331637392/0x7f5e256e4690
ocl-icd(../ocl_icd_loader.c:310): _get_function_addr: return: 140042299277520/0x7f5e238080d0
ocl-icd(../ocl_icd_loader.c:310): _get_function_addr: return: 140042299286992/0x7f5e2380a5d0
ocl-icd(../ocl_icd_loader.c:310): _get_function_addr: return: 140042299277616/0x7f5e23808130
ocl-icd(../ocl_icd_loader.c:325): _allocate_platforms: return: 1/0x1
ocl-icd(../ocl_icd_loader.c:351): _malloc_clGetPlatformInfo: return: cl_khr_icd cl_khr_il_program
ocl-icd(../ocl_icd_loader.c:351): _malloc_clGetPlatformInfo: return: MESA
ocl-icd(../ocl_icd_loader.c:351): _malloc_clGetPlatformInfo: return: FULL_PROFILE
ocl-icd(../ocl_icd_loader.c:351): _malloc_clGetPlatformInfo: return: OpenCL 3.0 
ocl-icd(../ocl_icd_loader.c:351): _malloc_clGetPlatformInfo: return: rusticl
ocl-icd(../ocl_icd_loader.c:351): _malloc_clGetPlatformInfo: return: Mesa/X.org
ocl-icd(../ocl_icd_loader.c:310): _get_function_addr: return: 140042064915728/0x7f5e15886d10
ocl-icd(../ocl_icd_loader.c:310): _get_function_addr: return: 140042064912448/0x7f5e15886040
ocl-icd(../ocl_icd_loader.c:310): _get_function_addr: return: 140042064915712/0x7f5e15886d00
ocl-icd(../ocl_icd_loader.c:325): _allocate_platforms: return: 1/0x1
ocl-icd(../ocl_icd_loader.c:351): _malloc_clGetPlatformInfo: return: cl_khr_icd
ocl-icd(../ocl_icd_loader.c:351): _malloc_clGetPlatformInfo: return: MESA
ocl-icd(../ocl_icd_loader.c:351): _malloc_clGetPlatformInfo: return: FULL_PROFILE
ocl-icd(../ocl_icd_loader.c:351): _malloc_clGetPlatformInfo: return: OpenCL 1.1 Mesa 23.1.7-1ubuntu1
ocl-icd(../ocl_icd_loader.c:351): _malloc_clGetPlatformInfo: return: Clover
ocl-icd(../ocl_icd_loader.c:351): _malloc_clGetPlatformInfo: return: Mesa
ocl-icd(../ocl_icd_loader.c:1134): clGetPlatformIDs: Entering
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1683): clGetDeviceIDs: Entering
ocl-icd(ocl_icd_loader_gen.c:1691): clGetDeviceIDs: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1683): clGetDeviceIDs: Entering
ocl-icd(ocl_icd_loader_gen.c:1691): clGetDeviceIDs: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1683): clGetDeviceIDs: Entering
ocl-icd(ocl_icd_loader_gen.c:1691): clGetDeviceIDs: return: -1/0xffffffffffffffff
Platform #0: Clover
ocl-icd(ocl_icd_loader_gen.c:1700): clGetDeviceInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1706): clGetDeviceInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1700): clGetDeviceInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1706): clGetDeviceInfo: return: 0/0x0
 `-- Device #0: AMD Radeon Graphics (renoir, LLVM 15.0.7, DRM 3.52, 6.3.0-7-generic)
Platform #1: rusticl
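
The log is long, so here is a rough way to boil it down to the three numbers that matter. This is only a sketch that matches the message strings visible in the log above (`_find_num_icds`, `_open_drivers`, `_allocate_platforms`); `summarise_icd_log` is a hypothetical helper, and other ocl-icd versions may word these messages differently:

```shell
# Summarise an OCL_ICD_DEBUG log read from stdin.
summarise_icd_log() {
    awk '
        # The last field looks like "3/0x3"; keep the decimal part.
        /_find_num_icds: return:/      { split($NF, a, "/"); found = a[1] }
        /_open_drivers: return:/       { split($NF, a, "/"); opened = a[1] }
        # Count how many times the loader allocated platforms for an ICD.
        /_allocate_platforms: return:/ { calls++ }
        END {
            printf "ICD files found: %d\n", found
            printf "drivers opened: %d\n", opened
            printf "_allocate_platforms calls: %d\n", calls
        }'
}

# Example (2>&1 in case the debug output goes to stderr):
#   OCL_ICD_DEBUG=4 clinfo -l 2>&1 | summarise_icd_log
```

For the log above this gives 3 ICD files found and 3 drivers opened, but only 2 `_allocate_platforms` calls, which matches the two platforms that clinfo lists.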

OS: Ubuntu 23.10

GPUs: NVIDIA + Radeon

stolk, Sep 13 '23 20:09

Running with OCL_ICD_DEBUG=7 gives me more info:

Missing global symbol 'clIcdGetPlatformIDsKHR' in ICD, should be skipped

I need to investigate why this symbol is missing. I believe this works under Ubuntu 23.04 but, for some reason, not under Ubuntu 23.10.
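
One way to check the symbol independently of the loader is to look at the library's dynamic symbol table with `nm -D`. A minimal sketch, assuming binutils is installed; `exports_symbol` is a hypothetical helper that reads `nm` output on stdin. Note that, as far as I understand the cl_khr_icd extension, a driver can also hand out this entry point through clGetExtensionFunctionAddress, which this check does not cover:

```shell
# Succeed if the symbol named by $1 appears in `nm -D --defined-only`
# output supplied on stdin.
exports_symbol() {
    awk -v sym="$1" '
        {
            name = $NF              # symbol name is the last field
            sub(/@.*/, "", name)    # drop a version suffix such as @@VERS_1
            if (name == sym) found = 1
        }
        END { exit !found }'
}

# Example (library path as shown by ldd earlier in this issue):
#   nm -D --defined-only /usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.1 \
#       | exports_symbol clIcdGetPlatformIDsKHR \
#       && echo "exported" || echo "not exported"
```

Running the same check against the Mesa libraries would show whether a missing global symbol is specific to the NVIDIA library or common to ICDs that do load.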

UPDATE: From what I can tell so far, this is a bug in NVIDIA's 535.54.03 driver, which I think may have been fixed in the 535.86.05 driver. But for some reason Ubuntu 23.10 lags behind Ubuntu 23.04 here.
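
To tell whether a given machine is still on the older driver, the two version strings can be compared with a version sort. A small sketch, assuming GNU coreutils `sort -V`; `version_ge` is a made-up helper, and 535.86.05 as the fixed release is only my guess from above, not something confirmed:

```shell
# Succeed if version $1 is greater than or equal to version $2.
# Relies on `sort -V` (version sort) from GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

installed="535.54.03"   # driver version reported by inxi -G in this issue
fixed="535.86.05"       # release that may contain the fix (unconfirmed)

if version_ge "$installed" "$fixed"; then
    echo "$installed is at or past $fixed"
else
    echo "$installed is older than $fixed"
fi
```

With the versions from this thread it prints "535.54.03 is older than 535.86.05".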

stolk, Sep 13 '23 20:09