
POCL driver not found when installing in virtualenv

Open jacklovell opened this issue 2 years ago • 7 comments

Describe the bug When pyopencl[pocl] is installed in a virtual environment on a system with no other OpenCL drivers, the POCL ICD is not found. It's necessary to set the environment variable OCL_ICD_VENDORS to <path-to-pyopencl-install>/.libs to get pyopencl to see POCL as a driver. This is not documented in the pyopencl documentation, which suggests that simply installing the pyopencl wheel with the pocl extra is sufficient.

To Reproduce Steps to reproduce the behavior:

  1. Create a new virtual environment on a machine without OpenCL installed: python3 -m venv /tmp/venv && source /tmp/venv/bin/activate
  2. pip install pyopencl[pocl]
  3. Run python -c 'import pyopencl; pyopencl.get_platforms()'
  4. See error pyopencl._cl.LogicError: clGetPlatformIDs failed: PLATFORM_NOT_FOUND_KHR

Expected behavior get_platforms() should return a POCL platform.
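
The check in step 3 can also be written as a small script that reports the platforms instead of raising. This is only a sketch: the helper name list_platform_names is made up for illustration, and it assumes that pyopencl.Error is the base class of the LogicError shown in step 4.

```python
def list_platform_names():
    """Return the names of all visible OpenCL platforms, or [] if none load."""
    try:
        import pyopencl as cl
    except ImportError:
        # pyopencl is not installed in this environment
        return []
    try:
        return [platform.name for platform in cl.get_platforms()]
    except cl.Error:
        # Covers the LogicError raised with PLATFORM_NOT_FOUND_KHR
        return []
```

An empty list from a fresh virtualenv with pyopencl[pocl] installed would correspond to the failure described here.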

Environment (please complete the following information):

  • OS: Linux Mint 20.3 (based on Ubuntu 20.04LTS)
  • ICD Loader and version: libOpenCL-cf4d6695 from pyopencl[pocl] wheel
  • ICD and version: libpocl-3a06e60a from pyopencl[pocl] wheel
  • CPU/GPU: Intel Core i7-7600U CPU
  • Python version: 3.8.10
  • PyOpenCL version: 2022.1

Additional context The same issue is present on a Scientific Linux 7 (RHEL7 clone) system with Python 3.7. On this system the Python executable is provided by Anaconda, but a standard virtual environment created using the venv standard library module is used rather than a conda environment. The workaround of setting OCL_ICD_VENDORS still works on this system.

The use case is creating virtualenvs to test code using pyopencl, where root access to install system-wide OpenCL is not available and the availability of conda is not guaranteed.

The closest I've got to a portable workaround is:

export OCL_ICD_VENDORS=$(python -c 'import os, pyopencl; print(os.path.join(*pyopencl.__path__, ".libs"))')

But this enforces the use of POCL and so isn't a universal solution as it shouldn't be applied to systems which do already have OpenCL installed globally.
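
One way to make the workaround conditional is to set OCL_ICD_VENDORS only when the system directory has no ICD files. The following is a rough sketch, not something pyopencl provides: all function names are hypothetical, /etc/OpenCL/vendors is assumed to be the only system location that matters, and it assumes the loader reads the variable no earlier than when pyopencl is imported.

```python
import importlib.util
import os
from pathlib import Path

# Default ICD directory reported by the loader in this thread (assumed to be the only one)
SYSTEM_VENDORS_DIR = Path("/etc/OpenCL/vendors")


def has_icd_files(directory):
    """True if the directory exists and contains at least one *.icd file."""
    directory = Path(directory)
    return directory.is_dir() and any(directory.glob("*.icd"))


def pick_vendors_dir(system_dir, bundled_dir):
    """Return bundled_dir only if the system has no ICDs and the bundle has one."""
    if has_icd_files(system_dir):
        return None  # leave an existing system OpenCL installation alone
    if has_icd_files(bundled_dir):
        return Path(bundled_dir)
    return None


def bundled_libs_dir():
    """Locate <pyopencl>/.libs without importing pyopencl."""
    spec = importlib.util.find_spec("pyopencl")
    if spec is None or not spec.submodule_search_locations:
        return None
    return Path(list(spec.submodule_search_locations)[0]) / ".libs"


def configure_opencl_env(environ=os.environ):
    """Set OCL_ICD_VENDORS if needed and return its final value (or None)."""
    if "OCL_ICD_VENDORS" in environ:
        return environ["OCL_ICD_VENDORS"]  # respect an explicit user setting
    libs = bundled_libs_dir()
    chosen = pick_vendors_dir(SYSTEM_VENDORS_DIR, libs) if libs else None
    if chosen is not None:
        environ["OCL_ICD_VENDORS"] = str(chosen)
    return environ.get("OCL_ICD_VENDORS")
```

Calling configure_opencl_env() at the top of a test suite's conftest, before the first import of pyopencl, would apply the override only on machines that have no system ICDs.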

jacklovell avatar Feb 15 '22 15:02 jacklovell

Thanks for the report!

The way this is supposed to work is that the loader bundled in the pyopencl wheel has that search path baked in:

https://github.com/inducer/pyopencl/blob/0b3d0ef92497e6838eea300b974f385f94cb5100/scripts/build-wheels.sh#L43-L44

That points to this patch:

https://github.com/isuruf/ocl-icd/commit/3862386b51930f95d9ad1089f7157a98165d5a6b.patch

Do you have any sense why that scheme isn't working as intended? (Maybe investigate with strace?)

inducer avatar Feb 15 '22 16:02 inducer

Attached are two straces. The first is running the following command, without specifying the OCL_ICD_VENDORS variable:

strace python -c 'import pyopencl; pyopencl.get_platforms()'

The second is with setting the environment variable:

OCL_ICD_VENDORS=/tmp/pocl-venv/lib/python3.8/site-packages/pyopencl/.libs strace python -c 'import pyopencl; pyopencl.get_platforms()'

It looks like the significant difference is from line 3295 of the traces: in the first instance it attempts to open /etc/OpenCL/vendors which fails with ENOENT and then attempts a bunch of paths which end in <string>. In the second case it opens /tmp/pocl-venv/lib/python3.8/site-packages/pyopencl/.libs/ and successfully finds the pocl.icd file and in turn the POCL driver.

The use of <string> looks suspiciously like the variable hasn't been defined properly, but I don't know enough about how the system works internally to tell whether this is a problem or not.

cl-strace.log

cl-strace-envset.log
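
For comparing two traces like these without reading them line by line, something along the following lines may help. It is a sketch under assumptions: the regular expression targets the usual strace line shape for open/openat calls (as in the example comment, which is illustrative and not copied from the attached logs), and the function names are made up.

```python
import re

# Targets lines such as (illustrative, typical strace output):
#   openat(AT_FDCWD, "/etc/OpenCL/vendors", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
_OPEN_CALL = re.compile(
    r'^(?:\d+\s+)?(?:open|openat)\(.*?"(?P<path>[^"]*)".*\)'
    r'\s+=\s+(?P<result>-?\d+)(?:\s+(?P<errno>[A-Z]+))?'
)


def opened_paths(strace_text):
    """Yield (path, succeeded, errno_name) for each open/openat call found."""
    for line in strace_text.splitlines():
        match = _OPEN_CALL.match(line)
        if match:
            succeeded = int(match.group("result")) >= 0
            yield match.group("path"), succeeded, match.group("errno")


def only_in_first(trace_a, trace_b):
    """Return sorted paths that trace_a tried to open but trace_b never did."""
    seen_b = {path for path, _, _ in opened_paths(trace_b)}
    seen_a = {path for path, _, _ in opened_paths(trace_a)}
    return sorted(seen_a - seen_b)
```

Running only_in_first on the contents of the two log files, in both directions, would list the paths unique to each run.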

jacklovell avatar Feb 16 '22 10:02 jacklovell

What do you get when you run export OCL_ICD_DEBUG=7 and then start the python interpreter?

isuruf avatar Feb 17 '22 01:02 isuruf

(pocl-venv) jlovell@jlovell-thinkpad:~$ OCL_ICD_DEBUG=7 python -c 'import pyopencl; pyopencl.get_platforms()'
ocl-icd(ocl_icd_loader.c:737): __initClIcd: Reading icd list from '/etc/OpenCL/vendors'
ocl-icd(ocl_icd_loader.c:1029): clGetPlatformIDs: return: -1001/0xfffffffffffffc17
Traceback (most recent call last):
  File "<string>", line 1, in <module>
pyopencl._cl.LogicError: clGetPlatformIDs failed: PLATFORM_NOT_FOUND_KHR

When I manually specify the path to the ICD directory it looks there instead of in /etc/OpenCL/vendors:

(pocl-venv) jlovell@jlovell-thinkpad:~$ OCL_ICD_DEBUG=7 OCL_ICD_VENDORS=/tmp/pocl-venv/lib/python3.8/site-packages/pyopencl/.libs/ python -c 'import pyopencl; pyopencl.get_platforms()'
ocl-icd(ocl_icd_loader.c:737): __initClIcd: Reading icd list from '/tmp/pocl-venv/lib/python3.8/site-packages/pyopencl/.libs/'
ocl-icd(ocl_icd_loader.c:201): _find_num_icds: return: 1/0x1
ocl-icd(ocl_icd_loader.c:232): _open_driver: Considering file '/tmp/pocl-venv/lib/python3.8/site-packages/pyopencl/.libs//pocl.icd'
ocl-icd(ocl_icd_loader.c:206): _load_icd: Loading ICD 'libpocl-3a06e60a.so'
ocl-icd(ocl_icd_loader.c:210): _load_icd: ICD[0] loaded
ocl-icd(ocl_icd_loader.c:264): _open_driver: return: 1/0x1
ocl-icd(ocl_icd_loader.c:276): _open_drivers: return: 1/0x1
ocl-icd(ocl_icd_loader.c:232): _open_driver: Considering file '/tmp/pocl-venv/lib/python3.8/site-packages/pyopencl/.libs/pocl.icd'
ocl-icd(ocl_icd_loader.c:206): _load_icd: Loading ICD 'libpocl-3a06e60a.so'
ocl-icd(ocl_icd_loader.c:210): _load_icd: ICD[1] loaded
ocl-icd(ocl_icd_loader.c:264): _open_driver: return: 2/0x2
ocl-icd(ocl_icd_loader.c:276): _open_drivers: return: 2/0x2
ocl-icd(ocl_icd_loader.c:433): _find_and_check_platforms: Checking ICD 0/2
ocl-icd(ocl_icd_loader.c:281): _get_function_addr: Looking for function clGetExtensionFunctionAddress
ocl-icd(ocl_icd_loader.c:299): _get_function_addr: return: 139776962235520/0x7f205c2e8c80
ocl-icd(ocl_icd_loader.c:281): _get_function_addr: Looking for function clIcdGetPlatformIDsKHR
ocl-icd(ocl_icd_loader.c:284): _get_function_addr: Missing global symbol 'clIcdGetPlatformIDsKHR' in ICD, should be skipped
ocl-icd(ocl_icd_loader.c:299): _get_function_addr: return: 139776962236064/0x7f205c2e8ea0
ocl-icd(ocl_icd_loader.c:281): _get_function_addr: Looking for function clGetPlatformInfo
ocl-icd(ocl_icd_loader.c:284): _get_function_addr: Missing global symbol 'clGetPlatformInfo' in ICD, should be skipped
ocl-icd(ocl_icd_loader.c:299): _get_function_addr: return: 139776962163152/0x7f205c2d71d0
ocl-icd(ocl_icd_loader.c:482): _find_and_check_platforms: Try to load 1 platforms
ocl-icd(ocl_icd_loader.c:304): _allocate_platforms: Requesting allocation for 1 platforms
ocl-icd(ocl_icd_loader.c:314): _allocate_platforms: return: 1/0x1
ocl-icd(ocl_icd_loader.c:489): _find_and_check_platforms: Checking platform 0
ocl-icd(ocl_icd_loader.c:340): _malloc_clGetPlatformInfo: return: cl_khr_icd
ocl-icd(ocl_icd_loader.c:340): _malloc_clGetPlatformInfo: return: POCL
ocl-icd(ocl_icd_loader.c:559): _find_and_check_platforms: Extension suffix: POCL
ocl-icd(ocl_icd_loader.c:340): _malloc_clGetPlatformInfo: return: FULL_PROFILE
ocl-icd(ocl_icd_loader.c:340): _malloc_clGetPlatformInfo: return: OpenCL 1.2 pocl 1.3 Release, LLVM 7.0.1, SLEEF, DISTRO, POCL_DEBUG
ocl-icd(ocl_icd_loader.c:340): _malloc_clGetPlatformInfo: return: Portable Computing Language
ocl-icd(ocl_icd_loader.c:340): _malloc_clGetPlatformInfo: return: The pocl project
ocl-icd(ocl_icd_loader.c:433): _find_and_check_platforms: Checking ICD 1/2
ocl-icd(ocl_icd_loader.c:281): _get_function_addr: Looking for function clGetExtensionFunctionAddress
ocl-icd(ocl_icd_loader.c:299): _get_function_addr: return: 139776962235520/0x7f205c2e8c80
ocl-icd(ocl_icd_loader.c:281): _get_function_addr: Looking for function clIcdGetPlatformIDsKHR
ocl-icd(ocl_icd_loader.c:284): _get_function_addr: Missing global symbol 'clIcdGetPlatformIDsKHR' in ICD, should be skipped
ocl-icd(ocl_icd_loader.c:299): _get_function_addr: return: 139776962236064/0x7f205c2e8ea0
ocl-icd(ocl_icd_loader.c:281): _get_function_addr: Looking for function clGetPlatformInfo
ocl-icd(ocl_icd_loader.c:284): _get_function_addr: Missing global symbol 'clGetPlatformInfo' in ICD, should be skipped
ocl-icd(ocl_icd_loader.c:299): _get_function_addr: return: 139776962163152/0x7f205c2d71d0
ocl-icd(ocl_icd_loader.c:482): _find_and_check_platforms: Try to load 1 platforms
ocl-icd(ocl_icd_loader.c:304): _allocate_platforms: Requesting allocation for 1 platforms
ocl-icd(ocl_icd_loader.c:314): _allocate_platforms: return: 1/0x1
ocl-icd(ocl_icd_loader.c:489): _find_and_check_platforms: Checking platform 0
ocl-icd(ocl_icd_loader.c:340): _malloc_clGetPlatformInfo: return: cl_khr_icd
ocl-icd(ocl_icd_loader.c:340): _malloc_clGetPlatformInfo: return: POCL
ocl-icd(ocl_icd_loader.c:559): _find_and_check_platforms: Extension suffix: POCL
ocl-icd(ocl_icd_loader.c:340): _malloc_clGetPlatformInfo: return: FULL_PROFILE
ocl-icd(ocl_icd_loader.c:340): _malloc_clGetPlatformInfo: return: OpenCL 1.2 pocl 1.3 Release, LLVM 7.0.1, SLEEF, DISTRO, POCL_DEBUG
ocl-icd(ocl_icd_loader.c:340): _malloc_clGetPlatformInfo: return: Portable Computing Language
ocl-icd(ocl_icd_loader.c:340): _malloc_clGetPlatformInfo: return: The pocl project
ocl-icd(ocl_icd_loader.c:387): _sort_platforms: Nb platefroms: 2
ocl-icd(ocl_icd_loader.c:398): _sort_platforms: Platform sorted by GPU, CPU, DEV
ocl-icd(ocl_icd_loader.c:793): __initClIcd: 2 valid vendor(s)!
ocl-icd(ocl_icd_loader.c:1025): clGetPlatformIDs: Entering

Same behaviour in the interactive python interpreter.

Manually setting PYOPENCL_HOME before starting Python fails in the same way as leaving it unset, so it looks like the environment variable isn't being picked up.
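
The two facts that matter in the OCL_ICD_DEBUG output are which directory the loader reads and which ICD libraries it loads. A small parser can pull those out. This is a sketch that assumes the message wording stays exactly as in the output above; the function name is made up.

```python
import re

_READING = re.compile(r"__initClIcd: Reading icd list from '(?P<dir>[^']*)'")
_LOADING = re.compile(r"_load_icd: Loading ICD '(?P<lib>[^']*)'")


def summarize_icd_debug(log_text):
    """Return (vendor_dir, [loaded library names]) from ocl-icd debug output."""
    vendor_dir = None
    libraries = []
    for line in log_text.splitlines():
        reading = _READING.search(line)
        if reading:
            vendor_dir = reading.group("dir")
        loading = _LOADING.search(line)
        if loading:
            libraries.append(loading.group("lib"))
    return vendor_dir, libraries
```

Applied to the second run above, this would list libpocl-3a06e60a.so twice, since the log shows pocl.icd being considered under two spellings of the same path.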

jacklovell avatar Feb 17 '22 10:02 jacklovell

Definitely using the wheel-provided libOpenCL too, so it should have the patch you mentioned. Grepping that SO does indicate it has PYOPENCL_HOME inside the library.

(pocl-venv) jlovell@jlovell-thinkpad:~$ ldd /tmp/pocl-venv/lib/python3.8/site-packages/pyopencl/_cl.cpython-38-x86_64-linux-gnu.so 
	linux-vdso.so.1 (0x00007ffd960ea000)
	libOpenCL-cf4d6695.so.1.0.0 => /tmp/pocl-venv/lib/python3.8/site-packages/pyopencl/.libs/libOpenCL-cf4d6695.so.1.0.0 (0x00007f1b1e233000)
	libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f1b1e024000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f1b1ded5000)
	libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f1b1deba000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f1b1de97000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f1b1dca5000)
	libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f1b1dc9d000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f1b1e37a000)
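
The grep for PYOPENCL_HOME can be repeated from Python, roughly as below. The helper name is hypothetical. A match only suggests that the patched code was compiled into the library; it says nothing about whether the lookup succeeds at run time.

```python
from pathlib import Path


def contains_marker(library_path, marker=b"PYOPENCL_HOME"):
    """True if the raw bytes of the file contain the marker string."""
    return marker in Path(library_path).read_bytes()
```

For example, contains_marker could be pointed at the libOpenCL-cf4d6695.so.1.0.0 path shown in the ldd output.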

jacklovell avatar Feb 17 '22 10:02 jacklovell

That's mysterious. Why does it say "missing global symbol" and then return an address for it? And why does this work on other systems?

inducer avatar Feb 17 '22 18:02 inducer

I've been able to reproduce this using GitHub Actions. Compare https://github.com/cherab/core/runs/5290607772?check_suite_focus=true, where I didn't properly set the OCL_ICD_VENDORS environment variable for the job, with https://github.com/cherab/core/runs/5291207783?check_suite_focus=true, where I managed to do it correctly. So it should be possible for you to reproduce this too for testing.

I'm afraid I don't know enough about the OpenCL loader to speculate on why it's doing this on some systems but not others.

jacklovell avatar Feb 22 '22 17:02 jacklovell

Update: the workaround of manually setting OCL_ICD_VENDORS no longer works with pyopencl 2022.2.3

jacklovell avatar Oct 13 '22 13:10 jacklovell

Could you use some of the same troubleshooting techniques (strace, ldd) to see why that might be happening?

inducer avatar Oct 14 '22 01:10 inducer

Fixed in https://github.com/inducer/pyopencl/pull/635

isuruf avatar Oct 14 '22 04:10 isuruf

Seems to work with the wheels in the #635 build artifacts, thanks!

It took a bit of trial and error: I hadn't realised that the pocl ICD is added to site-packages/pyopencl/.libs by pocl-binary-distribution rather than by pyopencl, so I was confused when those files went missing after I uninstalled the previous version of pyopencl and deleted the pyopencl directory in site-packages entirely. Reinstalling pocl-binary-distribution along with the patched version of pyopencl fixed things.
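
A quick way to confirm that the ICD files are back after a reinstall is to list the *.icd files in the .libs directory together with the library each one names. This is a sketch with a made-up function name. It assumes each ICD file holds a single library name or path, and that a bare filename refers to a library sitting next to the ICD file, which may not match how the loader actually resolves it.

```python
from pathlib import Path


def bundled_icds(libs_dir):
    """Map each *.icd file name to (library name, whether that library exists)."""
    libs_dir = Path(libs_dir)
    result = {}
    for icd_file in sorted(libs_dir.glob("*.icd")):
        library = icd_file.read_text().strip()
        # Assumption: a relative name is looked up next to the .icd file
        candidate = Path(library) if Path(library).is_absolute() else libs_dir / library
        result[icd_file.name] = (library, candidate.exists())
    return result
```

An empty result for site-packages/pyopencl/.libs would point to pocl-binary-distribution needing to be reinstalled.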

I can also confirm that it's no longer necessary with #635 to manually set OCL_ICD_VENDORS for the ICD to be picked up.

jacklovell avatar Oct 14 '22 09:10 jacklovell