loopy
loopy copied to clipboard
PyOpenCL target: Overflow large argument counts into SVM struct
Needs:
- [ ] https://github.com/inducer/pyopencl/pull/452
With
- https://github.com/pocl/pocl/pull/1069
- https://github.com/inducer/pyopencl/pull/452
the following passes for me:
LOOPY_NO_CACHE=1 pycl test_target.py 'test_passing_bajillions_of_svm_args(cl._csc)'
Let me know if you can reproduce that.
I confirmed that this PR works on POCL-pthreads and Nvidia CL when also using https://github.com/inducer/pyopencl/pull/452 (and https://github.com/pocl/pocl/pull/1069 in the POCL-pthreads case).
With POCL-cuda and https://github.com/pocl/pocl/pull/1067 in addition to https://github.com/pocl/pocl/pull/1069 it currently fails when trying to access an array on the host after executing a GPU kernel.
The following code reproduces this:
import numpy as np
import pyopencl as cl
import pyopencl.array as cla
ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)
alloc = cl.tools.SVMAllocator(ctx, cl.svm_mem_flags.READ_WRITE, queue=queue)
ary = cla.zeros(queue, 20, np.float64, allocator=alloc)
ary.fill(17) # runs as GPU kernel
ary.copy() # segfaults here
The full output log is (with OCL_ICD_DEBUG=7
and POCL_DEBUG=all
):
$ python svm2.py ocl-icd(ocl_icd_loader.c:776): __initClIcd: Reading icd list from '/home/mdiener/Work/pocl/install/etc/OpenCL/vendors/' ocl-icd(ocl_icd_loader.c:234): _find_num_icds: return: 1/0x1 ocl-icd(ocl_icd_loader.c:265): _open_driver: Considering file '/home/mdiener/Work/pocl/install/etc/OpenCL/vendors//pocl.icd' ocl-icd(ocl_icd_loader.c:239): _load_icd: Loading ICD '/home/mdiener/Work/pocl/install/lib/libpocl.so.2.9.0' ocl-icd(ocl_icd_loader.c:243): _load_icd: ICD[0] loaded ocl-icd(ocl_icd_loader.c:297): _open_driver: return: 1/0x1 ocl-icd(ocl_icd_loader.c:320): _open_drivers: return: 1/0x1 ocl-icd(ocl_icd_loader.c:477): _find_and_check_platforms: Checking ICD 0/1 ocl-icd(ocl_icd_loader.c:325): _get_function_addr: Looking for function clGetExtensionFunctionAddress ocl-icd(ocl_icd_loader.c:343): _get_function_addr: return: 140254412730417/0x7f8f8672f031 ocl-icd(ocl_icd_loader.c:325): _get_function_addr: Looking for function clIcdGetPlatformIDsKHR ocl-icd(ocl_icd_loader.c:328): _get_function_addr: Missing global symbol 'clIcdGetPlatformIDsKHR' in ICD, should be skipped ocl-icd(ocl_icd_loader.c:343): _get_function_addr: return: 140254412732259/0x7f8f8672f763 ocl-icd(ocl_icd_loader.c:325): _get_function_addr: Looking for function clGetPlatformInfo ocl-icd(ocl_icd_loader.c:328): _get_function_addr: Missing global symbol 'clGetPlatformInfo' in ICD, should be skipped ocl-icd(ocl_icd_loader.c:343): _get_function_addr: return: 140254412570414/0x7f8f86707f2e ocl-icd(ocl_icd_loader.c:526): _find_and_check_platforms: Try to load 1 platforms ocl-icd(ocl_icd_loader.c:348): _allocate_platforms: Requesting allocation for 1 platforms ocl-icd(ocl_icd_loader.c:358): _allocate_platforms: return: 1/0x1 ocl-icd(ocl_icd_loader.c:533): _find_and_check_platforms: Checking platform 0 ocl-icd(ocl_icd_loader.c:384): _malloc_clGetPlatformInfo: return: cl_khr_icd cl_pocl_content_size ocl-icd(ocl_icd_loader.c:384): _malloc_clGetPlatformInfo: return: POCL ocl-icd(ocl_icd_loader.c:603): _find_and_check_platforms: Extension suffix: POCL ocl-icd(ocl_icd_loader.c:384): _malloc_clGetPlatformInfo: return: FULL_PROFILE ocl-icd(ocl_icd_loader.c:384): _malloc_clGetPlatformInfo: return: OpenCL 3.0 PoCL 3.1-pre cuda-svm-0-g8e6dd829 Linux, Debug+Asserts, RELOC, SPIR, LLVM 14.0.6, SLEEF, CUDA, POCL_DEBUG ocl-icd(ocl_icd_loader.c:384): _malloc_clGetPlatformInfo: return: Portable Computing Language ocl-icd(ocl_icd_loader.c:384): _malloc_clGetPlatformInfo: return: The pocl project ocl-icd(ocl_icd_loader.c:431): _sort_platforms: Nb platefroms: 1 ocl-icd(ocl_icd_loader.c:824): __initClIcd: 1 valid vendor(s)! ocl-icd(ocl_icd_loader.c:1060): clGetPlatformIDs: Entering ocl-icd(ocl_icd_loader_gen.c:1683): clGetDeviceIDs: Entering ** Final POCL_DEBUG flags: FFFFFFFFFFFFFFFF [2022-07-06 04:18:58.406713745]POCL: in fn pocl_install_sigfpe_handler at line 229: | GENERAL | Installing SIGFPE handler... [2022-07-06 04:18:58.522237845]POCL: in fn pocl_cuda_init at line 397: | GENERAL | [CUDA] GPU architecture = sm_35 [2022-07-06 04:18:58.522300352]POCL: in fn findLibDevice at line 569: | CUDA | looking for libdevice at '/usr/nvvm/libdevice/libdevice.10.bc' [2022-07-06 04:18:58.522331286]POCL: in fn findLibDevice at line 581: | CUDA | looking for libdevice at '/usr/nvvm/libdevice/libdevice.compute_35.10.bc' [2022-07-06 04:18:58.522343606]POCL: in fn findLibDevice at line 569: | CUDA | looking for libdevice at '/usr/nvidia-cuda-toolkit/libdevice/libdevice.10.bc' [2022-07-06 04:18:58.522360890]POCL: in fn findLibDevice at line 581: | CUDA | looking for libdevice at '/usr/nvidia-cuda-toolkit/libdevice/libdevice.compute_35.10.bc' [2022-07-06 04:18:58.522373642]POCL: in fn findLibDevice at line 569: | CUDA | looking for libdevice at '/usr/libdevice/libdevice.10.bc' [2022-07-06 04:18:58.522387107]POCL: in fn findLibDevice at line 581: | CUDA | looking for libdevice at '/usr/libdevice/libdevice.compute_35.10.bc' [2022-07-06 04:18:58.522400285]POCL: in fn findLibDevice at line 569: | CUDA | looking for libdevice at '/usr/local/lib/cuda/nvvm/libdevice/libdevice.10.bc' [2022-07-06 04:18:58.522415089]POCL: in fn findLibDevice at line 581: | CUDA | looking for libdevice at '/usr/local/lib/cuda/nvvm/libdevice/libdevice.compute_35.10.bc' [2022-07-06 04:18:58.522427965]POCL: in fn findLibDevice at line 569: | CUDA | looking for libdevice at '/usr/local/lib/cuda/nvidia-cuda-toolkit/libdevice/libdevice.10.bc' [2022-07-06 04:18:58.522440770]POCL: in fn findLibDevice at line 581: | CUDA | looking for libdevice at '/usr/local/lib/cuda/nvidia-cuda-toolkit/libdevice/libdevice.compute_35.10.bc' [2022-07-06 04:18:58.522453608]POCL: in fn findLibDevice at line 569: | CUDA | looking for libdevice at '/usr/local/lib/cuda/libdevice/libdevice.10.bc' [2022-07-06 04:18:58.522466226]POCL: in fn findLibDevice at line 581: | CUDA | looking for libdevice at '/usr/local/lib/cuda/libdevice/libdevice.compute_35.10.bc' [2022-07-06 04:18:58.522479006]POCL: in fn findLibDevice at line 569: | CUDA | looking for libdevice at '/usr/local/lib/nvvm/libdevice/libdevice.10.bc' [2022-07-06 04:18:58.522491669]POCL: in fn findLibDevice at line 581: | CUDA | looking for libdevice at '/usr/local/lib/nvvm/libdevice/libdevice.compute_35.10.bc' [2022-07-06 04:18:58.522503989]POCL: in fn findLibDevice at line 569: | CUDA | looking for libdevice at '/usr/local/lib/nvidia-cuda-toolkit/libdevice/libdevice.10.bc' [2022-07-06 04:18:58.522517051]POCL: in fn findLibDevice at line 581: | CUDA | looking for libdevice at '/usr/local/lib/nvidia-cuda-toolkit/libdevice/libdevice.compute_35.10.bc' [2022-07-06 04:18:58.522529598]POCL: in fn findLibDevice at line 569: | CUDA | looking for libdevice at '/usr/local/lib/libdevice/libdevice.10.bc' [2022-07-06 04:18:58.522542592]POCL: in fn findLibDevice at line 581: | CUDA | looking for libdevice at '/usr/local/lib/libdevice/libdevice.compute_35.10.bc' [2022-07-06 04:18:58.522555310]POCL: in fn findLibDevice at line 569: | CUDA | looking for libdevice at '/usr/lib/nvvm/libdevice/libdevice.10.bc' [2022-07-06 04:18:58.522568743]POCL: in fn findLibDevice at line 581: | CUDA | looking for libdevice at '/usr/lib/nvvm/libdevice/libdevice.compute_35.10.bc' [2022-07-06 04:18:58.522580592]POCL: in fn findLibDevice at line 569: | CUDA | looking for libdevice at '/usr/lib/nvidia-cuda-toolkit/libdevice/libdevice.10.bc' [2022-07-06 04:18:58.522596110]POCL: in fn findLibDevice at line 572: | CUDA | found libdevice at '/usr/lib/nvidia-cuda-toolkit/libdevice/libdevice.10.bc' ocl-icd(ocl_icd_loader_gen.c:1691): clGetDeviceIDs: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1683): clGetDeviceIDs: Entering ocl-icd(ocl_icd_loader_gen.c:1691): clGetDeviceIDs: return: 0/0x0 ocl-icd(ocl_icd_loader.c:1140): clCreateContext: Entering [2022-07-06 04:18:58.630074880]POCL: in fn POclCreateCommandQueue at line 47: | GENERAL | Create Command queue on device 1 [2022-07-06 04:18:58.630114638]POCL: in fn void pocl_llvm_create_context(cl_context) at line 379: | LLVM | creating LLVM context ocl-icd(ocl_icd_loader.c:1149): clCreateContext: return: 94104096093728/0x559651383620 ocl-icd(ocl_icd_loader_gen.c:1737): clGetContextInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1743): clGetContextInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1737): clGetContextInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1743): clGetContextInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1737): clGetContextInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1743): clGetContextInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1737): clGetContextInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1743): clGetContextInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1700): clGetDeviceInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1706): clGetDeviceInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:3259): clCreateCommandQueueWithProperties: Entering [2022-07-06 04:18:58.630255155]POCL: in fn POclCreateCommandQueue at line 47: | GENERAL | Create Command queue on device 1 [2022-07-06 04:18:58.630266146]POCL: in fn POclRetainContext at line 32: | REFCOUNTS | Retain Context 0x559651383620 : 2 ocl-icd(ocl_icd_loader_gen.c:3268): clCreateCommandQueueWithProperties: return: 94104096102768/0x559651385970 ocl-icd(ocl_icd_loader_gen.c:1791): clGetCommandQueueInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1797): clGetCommandQueueInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1711): clRetainContext: Entering [2022-07-06 04:18:58.630350961]POCL: in fn POclRetainContext at line 32: | REFCOUNTS | Retain Context 0x559651383620 : 3 ocl-icd(ocl_icd_loader_gen.c:1717): clRetainContext: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1765): clRetainCommandQueue: Entering [2022-07-06 04:18:58.630446057]POCL: in fn POclRetainCommandQueue at line 33: | REFCOUNTS | Retain Command Queue 0x559651385970 : 2 ocl-icd(ocl_icd_loader_gen.c:1771): clRetainCommandQueue: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1791): clGetCommandQueueInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1797): clGetCommandQueueInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:3310): clSVMAlloc: Entering SVM cuMemAllocManaged 160 before write 0 0x4204060000 after write 42.000000 [2022-07-06 04:18:58.630824524]POCL: in fn POclRetainContext at line 32: | REFCOUNTS | Retain Context 0x559651383620 : 4 [2022-07-06 04:18:58.630836034]POCL: in fn POclSVMAlloc at line 114: | MEMORY | Allocated SVM: PTR 0x4204060000, SIZE 160, FLAGS 1 ocl-icd(ocl_icd_loader_gen.c:3316): clSVMAlloc: return: 283535343616/0x4204060000 ocl-icd(ocl_icd_loader_gen.c:1791): clGetCommandQueueInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1797): clGetCommandQueueInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1700): clGetDeviceInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1706): clGetDeviceInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1700): clGetDeviceInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1706): clGetDeviceInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1700): clGetDeviceInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1706): clGetDeviceInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1700): clGetDeviceInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1706): clGetDeviceInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:3379): clEnqueueSVMMemFill: Entering [2022-07-06 04:18:58.631313572]POCL: in fn POclRetainCommandQueue at line 33: | REFCOUNTS | Retain Command Queue 0x559651385970 : 3 [2022-07-06 04:18:58.631323842]POCL: in fn pocl_create_event at line 514: | EVENTS | Created event 0x55965138ddb0 / ID 1 / Command svm_memfill [2022-07-06 04:18:58.631333776]POCL: in fn pocl_create_command_struct at line 630: | EVENTS | event pointer provided [2022-07-06 04:18:58.631342637]POCL: in fn pocl_create_command_struct at line 650: | EVENTS | Created command struct: CMD 0x55965138dcd0 (event 1 / 0x55965138ddb0, type: svm_memfill) [2022-07-06 04:18:58.631353096]POCL: in fn pocl_command_enqueue at line 1061: | EVENTS | In-order Q; adding event syncs [2022-07-06 04:18:58.631362451]POCL: in fn pocl_command_enqueue at line 1105: | EVENTS | Pushed Event 1 to CQ 5. [2022-07-06 04:18:58.631372020]POCL: in fn pocl_update_event_queued at line 1922: | EVENTS | Event queued: 1 [2022-07-06 04:18:58.631394026]POCL: in fn pocl_update_event_submitted at line 1942: | EVENTS | Event submitted: 1 BEFORE MEMFILL 160 0x4204060000 AFTER MEMFILL ocl-icd(ocl_icd_loader_gen.c:3385): clEnqueueSVMMemFill: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1737): clGetContextInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1743): clGetContextInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1737): clGetContextInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1743): clGetContextInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1737): clGetContextInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1743): clGetContextInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1737): clGetContextInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1743): clGetContextInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1700): clGetDeviceInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1706): clGetDeviceInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:2005): clCreateProgramWithSource: Entering [2022-07-06 04:18:58.632158085]POCL: in fn POclRetainContext at line 32: | REFCOUNTS | Retain Context 0x559651383620 : 5 ocl-icd(ocl_icd_loader_gen.c:2014): clCreateProgramWithSource: return: 94104096146192/0x559651390310 ocl-icd(ocl_icd_loader_gen.c:2066): clBuildProgram: Entering [2022-07-06 04:18:58.632208081]POCL: in fn compile_and_link_program at line 691: | LLVM | building program with options -I /shared/home/mdiener/Work/pyopencl/pyopencl/cl [2022-07-06 04:18:58.632218813]POCL: in fn compile_and_link_program at line 713: | LLVM | building program for 0 devs with options -I /shared/home/mdiener/Work/pyopencl/pyopencl/cl [2022-07-06 04:18:58.632228799]POCL: in fn compile_and_link_program at line 717: | LLVM | BUILDING for device: Tesla K40c [2022-07-06 04:18:58.632238807]POCL: in fn pocl_driver_build_source at line 511: | LLVM | building from sources for device 0 [2022-07-06 04:18:58.632438048]POCL: in fn int pocl_llvm_build_program(cl_program, unsigned int, cl_uint, _cl_program* const*, const char**, int) at line 382: | LLVM | all build options: -mllvm --nvptx-short-ptr -Dcl_khr_int64 -DPOCL_DEVICE_ADDRESS_BITS=64 -D__USE_CLANG_OPENCL_C_H -xcl -Dinline= -I. -cl-kernel-arg-info -I /shared/home/mdiener/Work/pyopencl/pyopencl/cl -D__ENDIAN_LITTLE__=1 -DCL_DEVICE_MAX_GLOBAL_VARIABLE_SIZE=0 -D__OPENCL_VERSION__=120 -cl-std=CL1.2 -D__OPENCL_C_VERSION__=120 -Dcl_khr_byte_addressable_store=1 -Dcl_khr_global_int32_base_atomics=1 -Dcl_khr_global_int32_extended_atomics=1 -Dcl_khr_local_int32_base_atomics=1 -Dcl_khr_local_int32_extended_atomics=1 -Dcl_khr_fp64=1 -Dcl_khr_int64_base_atomics=1 -Dcl_khr_int64_extended_atomics=1 -Dcl_nv_device_attribute_query=1 -Dcl_khr_spir=1 -cl-ext=-all,+cl_khr_byte_addressable_store,+cl_khr_global_int32_base_atomics,+cl_khr_global_int32_extended_atomics,+cl_khr_local_int32_base_atomics,+cl_khr_local_int32_extended_atomics,+cl_khr_fp64,+cl_khr_int64_base_atomics,+cl_khr_int64_extended_atomics,+cl_nv_device_attribute_query,+cl_khr_spir -fno-builtin -triple=nvptx64 -target-cpu sm_35 ocl-icd(ocl_icd_loader_gen.c:2072): clBuildProgram: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:2081): clGetProgramInfo: Entering ocl-icd(ocl_icd_loader_gen.c:2087): clGetProgramInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:2081): clGetProgramInfo: Entering ocl-icd(ocl_icd_loader_gen.c:2087): clGetProgramInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:2097): clGetProgramBuildInfo: Entering ocl-icd(ocl_icd_loader_gen.c:2103): clGetProgramBuildInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:2097): clGetProgramBuildInfo: Entering ocl-icd(ocl_icd_loader_gen.c:2103): clGetProgramBuildInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:2110): clCreateKernel: Entering ocl-icd(ocl_icd_loader_gen.c:2119): clCreateKernel: return: 94104096155376/0x5596513926f0 ocl-icd(ocl_icd_loader_gen.c:2178): clGetKernelInfo: Entering ocl-icd(ocl_icd_loader_gen.c:2184): clGetKernelInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:2178): clGetKernelInfo: Entering ocl-icd(ocl_icd_loader_gen.c:2184): clGetKernelInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:2178): clGetKernelInfo: Entering ocl-icd(ocl_icd_loader_gen.c:2184): clGetKernelInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:2178): clGetKernelInfo: Entering ocl-icd(ocl_icd_loader_gen.c:2184): clGetKernelInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:2178): clGetKernelInfo: Entering ocl-icd(ocl_icd_loader_gen.c:2184): clGetKernelInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1711): clRetainContext: Entering [2022-07-06 04:18:58.697513230]POCL: in fn POclRetainContext at line 32: | REFCOUNTS | Retain Context 0x559651383620 : 6 ocl-icd(ocl_icd_loader_gen.c:1717): clRetainContext: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1737): clGetContextInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1743): clGetContextInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1737): clGetContextInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1743): clGetContextInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1722): clReleaseContext: Entering [2022-07-06 04:18:58.697573667]POCL: in fn POclReleaseContext at line 47: | REFCOUNTS | Release Context ocl-icd(ocl_icd_loader_gen.c:1728): clReleaseContext: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1700): clGetDeviceInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1706): clGetDeviceInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1700): clGetDeviceInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1706): clGetDeviceInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1700): clGetDeviceInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1706): clGetDeviceInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:2178): clGetKernelInfo: Entering ocl-icd(ocl_icd_loader_gen.c:2184): clGetKernelInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:2178): clGetKernelInfo: Entering ocl-icd(ocl_icd_loader_gen.c:2184): clGetKernelInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1711): clRetainContext: Entering [2022-07-06 04:18:58.698408613]POCL: in fn POclRetainContext at line 32: | REFCOUNTS | Retain Context 0x559651383620 : 6 ocl-icd(ocl_icd_loader_gen.c:1717): clRetainContext: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1737): clGetContextInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1743): clGetContextInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1737): clGetContextInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1743): clGetContextInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1722): clReleaseContext: Entering [2022-07-06 04:18:58.698444839]POCL: in fn POclReleaseContext at line 47: | REFCOUNTS | Release Context ocl-icd(ocl_icd_loader_gen.c:1728): clReleaseContext: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:2178): clGetKernelInfo: Entering ocl-icd(ocl_icd_loader_gen.c:2184): clGetKernelInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1711): clRetainContext: Entering [2022-07-06 04:18:58.698467371]POCL: in fn POclRetainContext at line 32: | REFCOUNTS | Retain Context 0x559651383620 : 6 ocl-icd(ocl_icd_loader_gen.c:1717): clRetainContext: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1700): clGetDeviceInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1706): clGetDeviceInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1722): clReleaseContext: Entering [2022-07-06 04:18:58.698561830]POCL: in fn POclReleaseContext at line 47: | REFCOUNTS | Release Context ocl-icd(ocl_icd_loader_gen.c:1728): clReleaseContext: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:2178): clGetKernelInfo: Entering ocl-icd(ocl_icd_loader_gen.c:2184): clGetKernelInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:2178): clGetKernelInfo: Entering ocl-icd(ocl_icd_loader_gen.c:2184): clGetKernelInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:2178): clGetKernelInfo: Entering ocl-icd(ocl_icd_loader_gen.c:2184): clGetKernelInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:2178): clGetKernelInfo: Entering ocl-icd(ocl_icd_loader_gen.c:2184): clGetKernelInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1711): clRetainContext: Entering [2022-07-06 04:18:58.698612903]POCL: in fn POclRetainContext at line 32: | REFCOUNTS | Retain Context 0x559651383620 : 6 ocl-icd(ocl_icd_loader_gen.c:1717): clRetainContext: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1737): clGetContextInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1743): clGetContextInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1737): clGetContextInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1743): clGetContextInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1722): clReleaseContext: Entering [2022-07-06 04:18:58.698642850]POCL: in fn POclReleaseContext at line 47: | REFCOUNTS | Release Context ocl-icd(ocl_icd_loader_gen.c:1728): clReleaseContext: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1700): clGetDeviceInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1706): clGetDeviceInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1700): clGetDeviceInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1706): clGetDeviceInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1700): clGetDeviceInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1706): clGetDeviceInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:2050): clReleaseProgram: Entering [2022-07-06 04:18:58.699081032]POCL: in fn POclReleaseProgram at line 50: | REFCOUNTS | Release program 0x559651390310, new refcount: 1, kernel #: 1 ocl-icd(ocl_icd_loader_gen.c:2056): clReleaseProgram: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:2194): clGetKernelWorkGroupInfo: Entering ocl-icd(ocl_icd_loader_gen.c:2200): clGetKernelWorkGroupInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1700): clGetDeviceInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1706): clGetDeviceInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1700): clGetDeviceInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1706): clGetDeviceInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:3441): clSetKernelArgSVMPointer: Entering [2022-07-06 04:18:58.699270966]POCL: in fn POclSetKernelArgSVMPointer at line 43: | GENERAL | Setting kernel ARG 0 to SVM 0x4204060000 ocl-icd(ocl_icd_loader_gen.c:3447): clSetKernelArgSVMPointer: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:2163): clSetKernelArg: Entering [2022-07-06 04:18:58.699293930]POCL: in fn POclSetKernelArg at line 107: | GENERAL | Kernel fill || SetArg idx 1 || long || Local 0 || Size 8 || Value 0x7ffc31e941a8 || Pointer (nil) || *(uint32*)Value: 0 || *(uint64*)Value: 0 || Hex Value: 00000000 00000000 ocl-icd(ocl_icd_loader_gen.c:2169): clSetKernelArg: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:2163): clSetKernelArg: Entering [2022-07-06 04:18:58.699312532]POCL: in fn POclSetKernelArg at line 107: | GENERAL | Kernel fill || SetArg idx 2 || double || Local 0 || Size 8 || Value 0x7ffc31e941a8 || Pointer (nil) || *(uint32*)Value: 0 || *(uint64*)Value: 4625478292286210048 || Hex Value: 00000000 00003140 ocl-icd(ocl_icd_loader_gen.c:2169): clSetKernelArg: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:2163): clSetKernelArg: Entering [2022-07-06 04:18:58.699330226]POCL: in fn POclSetKernelArg at line 107: | GENERAL | Kernel fill || SetArg idx 3 || long || Local 0 || Size 8 || Value 0x7ffc31e941a8 || Pointer (nil) || *(uint32*)Value: 0 || *(uint64*)Value: 20 || Hex Value: 14000000 00000000 ocl-icd(ocl_icd_loader_gen.c:2169): clSetKernelArg: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:2507): clEnqueueNDRangeKernel: Entering [2022-07-06 04:18:58.699356022]POCL: in fn POclEnqueueNDRangeKernel at line 221: | GENERAL | Queueing kernel fill with local size 32 x 1 x 1 group sizes 1 x 1 x 1... [2022-07-06 04:18:58.699367414]POCL: in fn POclRetainCommandQueue at line 33: | REFCOUNTS | Retain Command Queue 0x559651385970 : 4 [2022-07-06 04:18:58.699376473]POCL: in fn pocl_create_event at line 514: | EVENTS | Created event 0x55965144f190 / ID 2 / Command ndrange_kernel [2022-07-06 04:18:58.699384689]POCL: in fn pocl_create_command_struct at line 630: | EVENTS | event pointer provided [2022-07-06 04:18:58.699394485]POCL: in fn pocl_create_event_sync at line 530: | EVENTS | create event sync: waiting 2 , notifier 1 [2022-07-06 04:18:58.699403512]POCL: in fn pocl_create_command_struct at line 650: | EVENTS | Created command struct: CMD 0x559650ddf460 (event 2 / 0x55965144f190, type: ndrange_kernel) [2022-07-06 04:18:58.699413923]POCL: in fn POclRetainKernel at line 33: | REFCOUNTS | Retain Kernel 0x5596513926f0 : 2 [2022-07-06 04:18:58.699422964]POCL: in fn pocl_command_enqueue at line 1061: | EVENTS | In-order Q; adding event syncs [2022-07-06 04:18:58.699429314]POCL: in fn pocl_create_event_sync at line 530: | EVENTS | create event sync: waiting 2 , notifier 1 [2022-07-06 04:18:58.699438005]POCL: in fn pocl_create_event_sync at line 543: | EVENTS | Skipping event sync creation [2022-07-06 04:18:58.699446350]POCL: in fn pocl_create_event_sync at line 530: | EVENTS | create event sync: waiting 2 , notifier 1 [2022-07-06 04:18:58.699454328]POCL: in fn pocl_create_event_sync at line 543: | EVENTS | Skipping event sync creation [2022-07-06 04:18:58.699462708]POCL: in fn pocl_command_enqueue at line 1105: | EVENTS | Pushed Event 2 to CQ 5. [2022-07-06 04:18:58.699470964]POCL: in fn pocl_update_event_queued at line 1922: | EVENTS | Event queued: 2 [2022-07-06 04:18:58.699489571]POCL: in fn pocl_update_event_submitted at line 1942: | EVENTS | Event submitted: 2 ocl-icd(ocl_icd_loader_gen.c:2513): clEnqueueNDRangeKernel: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1765): clRetainCommandQueue: Entering [2022-07-06 04:18:58.700081115]POCL: in fn POclRetainCommandQueue at line 33: | REFCOUNTS | Retain Command Queue 0x559651385970 : 5 ocl-icd(ocl_icd_loader_gen.c:1771): clRetainCommandQueue: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1791): clGetCommandQueueInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1797): clGetCommandQueueInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:3310): clSVMAlloc: Entering SVM cuMemAllocManaged 160 before write 0 0x4204061000 Bus error (core dumped)
i.e., it seems to successfully allocate memory with clSVMAlloc/cuMemAllocManaged
, but the resulting memory appears to be inaccessible from the host.
I obviously can't guarantee that that's what at issue here, but I suspect you'll need https://github.com/pocl/pocl/pull/1069 (or another fix for the same issue) in order to allow this to work. I'm actually sort of surprised pocl-pthreads worked.
If that doesn't help, a backtrace and potentially the first meaningful thing flagged by valgrind would be of use.
I obviously can't guarantee that that's what at issue here, but I suspect you'll need pocl/pocl#1069 (or another fix for the same issue) in order to allow this to work. I'm actually sort of surprised pocl-pthreads worked.
I'm sorry, I should have been clearer. I used https://github.com/inducer/pyopencl/pull/452 and https://github.com/pocl/pocl/pull/1069 (for POCL-pthreads and POCL-cuda) for all tests. Note that my failing example above doesn't even use loopy (the loopy test in this PR just exposed it), so this PR might be the wrong location to track this issue.
If that doesn't help, a backtrace and potentially the first meaningful thing flagged by valgrind would be of use.
Here is a part of the backtrace at the point of the crash. It doesn't appear to be too useful though.
SVM cuMemAllocManaged 160
before write 0 0x4204061000
--Type <RET> for more, q to quit, c to continue without paging--q
Thread 1 "python" received signal SIGBUS, Bus error.
0x00007fff8510e39c in pocl_cuda_svm_alloc (dev=0x555555f27020, flags=1, size=160)
at /home/mdiener/Work/pocl/lib/CL/devices/cuda/pocl-cuda.c:1895
1895 ((double*)dptr)[0] = 42.0;
(gdb) bt
#0 0x00007fff8510e39c in pocl_cuda_svm_alloc (dev=0x555555f27020, flags=1, size=160)
at /home/mdiener/Work/pocl/lib/CL/devices/cuda/pocl-cuda.c:1895
#1 0x00007fff91c7cfb6 in POclSVMAlloc (context=0x5555564b86d0, flags=1, size=160, alignment=128)
at /home/mdiener/Work/pocl/lib/CL/clSVMAlloc.c:98
#2 0x00007fff9220638f in clSVMAlloc () from /shared/home/mdiener/Work/emirge/miniforge3/envs/poclbuild/lib/libOpenCL.so.1
#3 0x00007fff922bb337 in pybind11::cpp_function::initialize<pybind11::detail::initimpl::constructor<std::shared_ptr<pyopencl::context>, unsigned long, unsigned int, unsigned long, pyopencl::command_queue const*>::execute<pybind11::class_<pyopencl::svm_allocation>, pybind11::arg, pybind11::arg, pybind11::arg, pybind11::arg, pybind11::arg_v, 0>(pybind11::class_<pyopencl::svm_allocation>&, pybind11::arg const&, pybind11::arg const&, pybind11::arg const&, pybind11::arg const&, pybind11::arg_v const&)::{lambda(pybind11::detail::value_and_holder&, std::shared_ptr<pyopencl::context>, unsigned long, unsigned int, unsigned long, pyopencl::command_queue const*)#1}, void, pybind11::detail::value_and_holder&, std::shared_ptr<pyopencl::context>, unsigned long, unsigned int, unsigned long, pyopencl::command_queue const*, pybind11::name, pybind11::is_method, pybind11::sibling, pybind11::detail::is_new_style_constructor, pybind11::arg, pybind11::arg, pybind11::arg, pybind11::arg, pybind11::arg_v>(pybind11::class_<pyopencl::svm_allocation>&&, void (*)(pybind11::detail::value_and_holder&, std::shared_ptr<pyopencl::context>, unsigned long, unsigned int, unsigned long, pyopencl::command_queue const*), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&, pybind11::detail::is_new_style_constructor const&, pybind11::arg const&, pybind11::arg const&, pybind11::arg const&, pybind11::arg const&, pybind11::arg_v const&)::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&) ()
from /shared/home/mdiener/Work/pyopencl/pyopencl/_cl.cpython-310-x86_64-linux-gnu.so
#4 0x00007fff922494a6 in pybind11::cpp_function::dispatcher(_object*, _object*, _object*) ()
from /shared/home/mdiener/Work/pyopencl/pyopencl/_cl.cpython-310-x86_64-linux-gnu.so
#5 0x000055555569850c in cfunction_call (func=0x7fff9240e1b0, args=<optimized out>, kwargs=<optimized out>)
at /usr/local/src/conda/python-3.10.5/Objects/methodobject.c:543
#6 0x00005555556a6db9 in _PyObject_Call (kwargs=<optimized out>, args=0x7fff91e2cdc0, callable=0x7fff9240e1b0, tstate=0x55555591be10)
at /usr/local/src/conda/python-3.10.5/Objects/call.c:305
[...]
I modified https://github.com/pocl/pocl/pull/1067 such that it tries to write to the just allocated buffer after a successful cuMemAllocManaged
(this just triggers the issue a bit earlier). What happens is that cuMemAllocManaged
appears to succeed, but the returned memory can not be read or written on the host. This only appears after a kernel has run on the GPU. As far as I can see, no SVM allocated memory is freed at all for the example code I used.
This is on koelsch with a Tesla K40c.
With CU_MEM_ATTACH_GLOBAL
, I don't think you have a guarantee that the memory should be accessible from the host. Also, since you seem to attribute the crash in the sample code from https://github.com/inducer/loopy/pull/642#issuecomment-1175764168 to host-side access, could you explain where you think that host-side access is happening? (I don't see it. A backtrace would help.)
Btw, I agree that this discussion does not have much to do with Loopy. Maybe let's continue the discussion here: https://github.com/inducer/pyopencl/pull/452.
(Continuing the discussion here for a bit since I got the loopy test in this PR running with the change outlined below)
With
CU_MEM_ATTACH_GLOBAL
, I don't think you have a guarantee that the memory should be accessible from the host.
Hmm, that is interesting. Based on your comment, I tried CU_MEM_ATTACH_HOST
, and with that change, my test case above and the test in this PR run successfully on POCL-cuda. I'm not sure I understand why though, based on the documentation:
flags
specifies the default stream association for this allocation.flags
must be one ofCU_MEM_ATTACH_GLOBAL
orCU_MEM_ATTACH_HOST
. IfCU_MEM_ATTACH_GLOBAL
is specified, then this memory is accessible from any stream on any device. IfCU_MEM_ATTACH_HOST
is specified, then the allocation should not be accessed from devices that have a zero value for the device attributeCU_DEVICE_ATTRIBUTE_CONCURRENT_MANAGED_ACCESS
; an explicit call tocuStreamAttachMemAsync
will be required to enable access on such devices.
The device I'm running on (K40c) returns 0
for CU_DEVICE_ATTRIBUTE_CONCURRENT_MANAGED_ACCESS
(but 1
for CU_DEVICE_ATTRIBUTE_MANAGED_MEMORY
).
I found another fix (workaround?) in https://github.com/pocl/pocl/pull/1067/commits/03ffc7146f425bee6e6345dfe4208d095ddd7e7b which just uses CUDA functions for the memfill operation. With that fix, my simple test and the test in this PR also work.
Also, since you seem to attribute the crash in the sample code from #642 (comment) to host-side access, could you explain where you think that host-side access is happening? (I don't see it. A backtrace would help.)
Here is a backtrace with my debug cruft removed:
--Type <RET> for more, q to quit, c to continue without paging--
Thread 1 "python" received signal SIGBUS, Bus error.
0x00007fff91d71bab in pocl_fill_aligned_buf_with_pattern (ptr=0x4204061000, offset=0, size=160, pattern=0x55555650d600, pattern_size=1)
at /home/mdiener/Work/pocl/lib/CL/pocl_util.c:2347
2347 p[i] = *(uint8_t *)pattern;
(gdb) bt
#0 0x00007fff91d71bab in pocl_fill_aligned_buf_with_pattern (ptr=0x4204061000, offset=0, size=160, pattern=0x55555650d600, pattern_size=1)
at /home/mdiener/Work/pocl/lib/CL/pocl_util.c:2347
#1 0x00007fff91d91088 in pocl_driver_memfill (data=0x555555ef2c80, dst_mem_id=0x7fffffffd390, dst_buf=0x0, size=160, offset=0,
pattern=0x55555650d600, pattern_size=1) at /home/mdiener/Work/pocl/lib/CL/devices/common_driver.c:307
#2 0x00007fff91d91690 in pocl_driver_svm_fill (dev=0x555555f08f00, svm_ptr=0x4204061000, size=160, pattern=0x55555650d600, pattern_size=1)
at /home/mdiener/Work/pocl/lib/CL/devices/common_driver.c:434
#3 0x00007fff8520c96c in pocl_cuda_submit_node (node=0x5555564c2a00, cq=0x5555564ad6e0, locked=1)
at /home/mdiener/Work/pocl/lib/CL/devices/cuda/pocl-cuda.c:1469
#4 0x00007fff8520d02c in pocl_cuda_submit (node=0x5555564c2a00, cq=0x5555564ad6e0)
at /home/mdiener/Work/pocl/lib/CL/devices/cuda/pocl-cuda.c:1557
#5 0x00007fff91d6cf47 in pocl_command_enqueue (command_queue=0x5555564ad6e0, node=0x5555564c2a00)
at /home/mdiener/Work/pocl/lib/CL/pocl_util.c:1114
#6 0x00007fff91d80469 in POclEnqueueSVMMemFill (command_queue=0x5555564ad6e0, svm_ptr=0x4204061000, pattern=0x7fff923f7b20,
pattern_size=1, size=160, num_events_in_wait_list=0, event_wait_list=0x0, event=0x7fffffffd660)
at /home/mdiener/Work/pocl/lib/CL/clEnqueueSVMMemFill.c:89
#7 0x00007fff922068cc in clEnqueueSVMMemFill () from /shared/home/mdiener/Work/emirge/miniforge3/envs/poclbuild/lib/libOpenCL.so.1
#8 0x00007fff922aa6d2 in pyopencl::enqueue_svm_memfill(pyopencl::command_queue&, pyopencl::svm_arg_wrapper&, pybind11::object, pybind11::object, pybind11::object) () from /shared/home/mdiener/Work/pyopencl/pyopencl/_cl.cpython-310-x86_64-linux-gnu.so
#9 0x00007fff922a4824 in pybind11::cpp_function::initialize<pyopencl::event* (*&)(pyopencl::command_queue&, pyopencl::svm_arg_wrapper&, pybind11::object, pybind11::object, pybind11::object), pyopencl::event*, pyopencl::command_queue&, pyopencl::svm_arg_wrapper&, pybind11::object, pybind11::object, pybind11::object, pybind11::name, pybind11::scope, pybind11::sibling, pybind11::arg, pybind11::arg, pybind11::arg, pybind11::arg_v, pybind11::arg_v>(pyopencl::event* (*&)(pyopencl::command_queue&, pyopencl::svm_arg_wrapper&, pybind11::object, pybind11::object, pybind11::object), pyopencl::event* (*)(pyopencl::command_queue&, pyopencl::svm_arg_wrapper&, pybind11::object, pybind11::object, pybind11::object), pybind11::name const&, pybind11::scope const&, pybind11::sibling const&, pybind11::arg const&, pybind11::arg const&, pybind11::arg const&, pybind11::arg_v const&, pybind11::arg_v const&)::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&) ()
from /shared/home/mdiener/Work/pyopencl/pyopencl/_cl.cpython-310-x86_64-linux-gnu.so
[...]
It crashes in the memfill operation when accessing ptr
(p
). Note that the arguments are an SVM array (ptr
) and a normal allocation (pattern
). Again, https://github.com/pocl/pocl/pull/1067/commits/03ffc7146f425bee6e6345dfe4208d095ddd7e7b fixes this.
I found another fix (workaround?) in pocl/pocl@03ffc71 which just uses CUDA functions for the memfill operation. With that fix, my simple test and the test in this PR also work.
OK, nice. As far as I can tell, using CUDA to do the SVM fill is the correct thing to do. Otherwise it's neither queue-synchronized nor performed from the device.
I don't believe CU_MEM_ATTACH_HOST
is the correct flag to use; we should not need the memory to be host-visibile. And upon rereading the phrasing of the description of CU_MEM_ATTACH_GLOBAL
, I do not think it means to imply that the memory is host-visible. ("any stream" sort of implies GPU execution, and I suspect they mean "device" in the CUDA sense as well)
@matthiasdiener Please don't force-push to branches on which more than one person is working. Not only is there a risk of clobbering one another's work, it's also very hard to review what's being changed.
With the current version of mirgecom, and setting limit_arg_size_nbytes
to 20, this crashes:
$ python examples/wave.py --lazy
Choose platform:
[0] <pyopencl.Platform 'Portable Computing Language' at 0x7f00190cb008>
Choice [0]:
Choose device(s):
[0] <pyopencl.Device 'pthread-Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz' on 'Portable Computing Language' at 0x55b419f74ff0>
[1] <pyopencl.Device 'Tesla K40c' on 'Portable Computing Language' at 0x55b419f753f0>
Choice, comma-separated [0]:Set the environment variable PYOPENCL_CTX=':' to avoid being asked again.
Traceback (most recent call last):
File "/shared/home/mdiener/Work/svmfuse/mirgecom/examples/wave.py", line 190, in <module>
main(use_profiling=args.profile, use_logmgr=args.logging, lazy=args.lazy)
File "/shared/home/mdiener/Work/svmfuse/mirgecom/examples/wave.py", line 112, in main
nodes = actx.thaw(discr.nodes())
File "/shared/home/mdiener/Work/svmfuse/grudge/grudge/discretization.py", line 729, in nodes
return self.discr_from_dd(dd).nodes()
File "/shared/home/mdiener/Work/svmfuse/meshmode/meshmode/discretization/__init__.py", line 679, in nodes
result = make_obj_array([
File "/shared/home/mdiener/Work/svmfuse/meshmode/meshmode/discretization/__init__.py", line 680, in <listcomp>
_DOFArray(None, tuple([
File "/shared/home/mdiener/Work/svmfuse/meshmode/meshmode/discretization/__init__.py", line 681, in <listcomp>
actx.freeze(resample_mesh_nodes(grp, iaxis)) for grp in self.groups
File "/shared/home/mdiener/Work/svmfuse/arraycontext/arraycontext/impl/pytato/__init__.py", line 426, in freeze
evt, out_dict = pt_prg(self.queue, **bound_arguments)
File "/shared/home/mdiener/Work/svmfuse/pytato/pytato/target/loopy/__init__.py", line 212, in __call__
return self.program(queue,
File "/shared/home/mdiener/Work/svmfuse/loopy/loopy/translation_unit.py", line 347, in __call__
return pex(*args, **kwargs)
File "/shared/home/mdiener/Work/svmfuse/loopy/loopy/target/pyopencl_execution.py", line 387, in __call__
return translation_unit_info.invoker(
File "/shared/home/mdiener/Work/svmfuse/miniforge3/envs/ceesd/lib/python3.9/site-packages/pytools/py_codegen.py", line 150, in __call__
return self.func(*args, **kwargs)
File "<generated code for 'invoke_frozen_nodes0_2d_loopy_kernel'>", line 146, in invoke_frozen_nodes0_2d_loopy_kernel
File "<generated code for 'invoke_frozen_nodes0_2d_loopy_kernel'>", line 27, in _lpy_host_frozen_nodes0_2d
AttributeError: 'pyopencl._cl.Buffer' object has no attribute 'svm_ptr'
Edit: This was resolved by rebasing Kaushik's branch of arraycontext.