loopy icon indicating copy to clipboard operation
loopy copied to clipboard

PyOpenCL target: Overflow large argument counts into SVM struct

Open matthiasdiener opened this issue 2 years ago • 9 comments

Needs:

  • [ ] https://github.com/inducer/pyopencl/pull/452

matthiasdiener avatar Jul 01 '22 00:07 matthiasdiener

With

  • https://github.com/pocl/pocl/pull/1069
  • https://github.com/inducer/pyopencl/pull/452

the following passes for me:

LOOPY_NO_CACHE=1 pycl test_target.py 'test_passing_bajillions_of_svm_args(cl._csc)'    

Let me know if you can reproduce that.

inducer avatar Jul 02 '22 05:07 inducer

I confirmed that this PR works on POCL-pthreads and Nvidia CL when also using https://github.com/inducer/pyopencl/pull/452 (and https://github.com/pocl/pocl/pull/1069 in the POCL-pthreads case).

With POCL-cuda and https://github.com/pocl/pocl/pull/1067 in addition to https://github.com/pocl/pocl/pull/1069 it currently fails when trying to access an array on the host after executing a GPU kernel.

The following code reproduces this:

import numpy as np
import pyopencl as cl
import pyopencl.array as cla

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)

alloc = cl.tools.SVMAllocator(ctx, cl.svm_mem_flags.READ_WRITE, queue=queue)
ary = cla.zeros(queue, 20, np.float64, allocator=alloc)
ary.fill(17)  # runs as GPU kernel
ary.copy() # segfaults here

The full output log is (with OCL_ICD_DEBUG=7 and POCL_DEBUG=all):

$ python svm2.py
ocl-icd(ocl_icd_loader.c:776): __initClIcd: Reading icd list from '/home/mdiener/Work/pocl/install/etc/OpenCL/vendors/'
ocl-icd(ocl_icd_loader.c:234): _find_num_icds: return: 1/0x1
ocl-icd(ocl_icd_loader.c:265): _open_driver: Considering file '/home/mdiener/Work/pocl/install/etc/OpenCL/vendors//pocl.icd'
ocl-icd(ocl_icd_loader.c:239): _load_icd: Loading ICD '/home/mdiener/Work/pocl/install/lib/libpocl.so.2.9.0'
ocl-icd(ocl_icd_loader.c:243): _load_icd: ICD[0] loaded
ocl-icd(ocl_icd_loader.c:297): _open_driver: return: 1/0x1
ocl-icd(ocl_icd_loader.c:320): _open_drivers: return: 1/0x1
ocl-icd(ocl_icd_loader.c:477): _find_and_check_platforms: Checking ICD 0/1
ocl-icd(ocl_icd_loader.c:325): _get_function_addr: Looking for function clGetExtensionFunctionAddress
ocl-icd(ocl_icd_loader.c:343): _get_function_addr: return: 140254412730417/0x7f8f8672f031
ocl-icd(ocl_icd_loader.c:325): _get_function_addr: Looking for function clIcdGetPlatformIDsKHR
ocl-icd(ocl_icd_loader.c:328): _get_function_addr: Missing global symbol 'clIcdGetPlatformIDsKHR' in ICD, should be skipped
ocl-icd(ocl_icd_loader.c:343): _get_function_addr: return: 140254412732259/0x7f8f8672f763
ocl-icd(ocl_icd_loader.c:325): _get_function_addr: Looking for function clGetPlatformInfo
ocl-icd(ocl_icd_loader.c:328): _get_function_addr: Missing global symbol 'clGetPlatformInfo' in ICD, should be skipped
ocl-icd(ocl_icd_loader.c:343): _get_function_addr: return: 140254412570414/0x7f8f86707f2e
ocl-icd(ocl_icd_loader.c:526): _find_and_check_platforms: Try to load 1 platforms
ocl-icd(ocl_icd_loader.c:348): _allocate_platforms: Requesting allocation for 1 platforms
ocl-icd(ocl_icd_loader.c:358): _allocate_platforms: return: 1/0x1
ocl-icd(ocl_icd_loader.c:533): _find_and_check_platforms: Checking platform 0
ocl-icd(ocl_icd_loader.c:384): _malloc_clGetPlatformInfo: return: cl_khr_icd cl_pocl_content_size
ocl-icd(ocl_icd_loader.c:384): _malloc_clGetPlatformInfo: return: POCL
ocl-icd(ocl_icd_loader.c:603): _find_and_check_platforms: Extension suffix: POCL
ocl-icd(ocl_icd_loader.c:384): _malloc_clGetPlatformInfo: return: FULL_PROFILE
ocl-icd(ocl_icd_loader.c:384): _malloc_clGetPlatformInfo: return: OpenCL 3.0 PoCL 3.1-pre cuda-svm-0-g8e6dd829  Linux, Debug+Asserts, RELOC, SPIR, LLVM 14.0.6, SLEEF, CUDA, POCL_DEBUG
ocl-icd(ocl_icd_loader.c:384): _malloc_clGetPlatformInfo: return: Portable Computing Language
ocl-icd(ocl_icd_loader.c:384): _malloc_clGetPlatformInfo: return: The pocl project
ocl-icd(ocl_icd_loader.c:431): _sort_platforms: Nb platefroms: 1
ocl-icd(ocl_icd_loader.c:824): __initClIcd: 1 valid vendor(s)!
ocl-icd(ocl_icd_loader.c:1060): clGetPlatformIDs: Entering
ocl-icd(ocl_icd_loader_gen.c:1683): clGetDeviceIDs: Entering
** Final POCL_DEBUG flags: FFFFFFFFFFFFFFFF
[2022-07-06 04:18:58.406713745]POCL: in fn pocl_install_sigfpe_handler at line 229:
  |   GENERAL |  Installing SIGFPE handler...
[2022-07-06 04:18:58.522237845]POCL: in fn pocl_cuda_init at line 397:
  |   GENERAL |  [CUDA] GPU architecture = sm_35
[2022-07-06 04:18:58.522300352]POCL: in fn findLibDevice at line 569:
  |      CUDA | looking for libdevice at '/usr/nvvm/libdevice/libdevice.10.bc'
[2022-07-06 04:18:58.522331286]POCL: in fn findLibDevice at line 581:
  |      CUDA | looking for libdevice at '/usr/nvvm/libdevice/libdevice.compute_35.10.bc'
[2022-07-06 04:18:58.522343606]POCL: in fn findLibDevice at line 569:
  |      CUDA | looking for libdevice at '/usr/nvidia-cuda-toolkit/libdevice/libdevice.10.bc'
[2022-07-06 04:18:58.522360890]POCL: in fn findLibDevice at line 581:
  |      CUDA | looking for libdevice at '/usr/nvidia-cuda-toolkit/libdevice/libdevice.compute_35.10.bc'
[2022-07-06 04:18:58.522373642]POCL: in fn findLibDevice at line 569:
  |      CUDA | looking for libdevice at '/usr/libdevice/libdevice.10.bc'
[2022-07-06 04:18:58.522387107]POCL: in fn findLibDevice at line 581:
  |      CUDA | looking for libdevice at '/usr/libdevice/libdevice.compute_35.10.bc'
[2022-07-06 04:18:58.522400285]POCL: in fn findLibDevice at line 569:
  |      CUDA | looking for libdevice at '/usr/local/lib/cuda/nvvm/libdevice/libdevice.10.bc'
[2022-07-06 04:18:58.522415089]POCL: in fn findLibDevice at line 581:
  |      CUDA | looking for libdevice at '/usr/local/lib/cuda/nvvm/libdevice/libdevice.compute_35.10.bc'
[2022-07-06 04:18:58.522427965]POCL: in fn findLibDevice at line 569:
  |      CUDA | looking for libdevice at '/usr/local/lib/cuda/nvidia-cuda-toolkit/libdevice/libdevice.10.bc'
[2022-07-06 04:18:58.522440770]POCL: in fn findLibDevice at line 581:
  |      CUDA | looking for libdevice at '/usr/local/lib/cuda/nvidia-cuda-toolkit/libdevice/libdevice.compute_35.10.bc'
[2022-07-06 04:18:58.522453608]POCL: in fn findLibDevice at line 569:
  |      CUDA | looking for libdevice at '/usr/local/lib/cuda/libdevice/libdevice.10.bc'
[2022-07-06 04:18:58.522466226]POCL: in fn findLibDevice at line 581:
  |      CUDA | looking for libdevice at '/usr/local/lib/cuda/libdevice/libdevice.compute_35.10.bc'
[2022-07-06 04:18:58.522479006]POCL: in fn findLibDevice at line 569:
  |      CUDA | looking for libdevice at '/usr/local/lib/nvvm/libdevice/libdevice.10.bc'
[2022-07-06 04:18:58.522491669]POCL: in fn findLibDevice at line 581:
  |      CUDA | looking for libdevice at '/usr/local/lib/nvvm/libdevice/libdevice.compute_35.10.bc'
[2022-07-06 04:18:58.522503989]POCL: in fn findLibDevice at line 569:
  |      CUDA | looking for libdevice at '/usr/local/lib/nvidia-cuda-toolkit/libdevice/libdevice.10.bc'
[2022-07-06 04:18:58.522517051]POCL: in fn findLibDevice at line 581:
  |      CUDA | looking for libdevice at '/usr/local/lib/nvidia-cuda-toolkit/libdevice/libdevice.compute_35.10.bc'
[2022-07-06 04:18:58.522529598]POCL: in fn findLibDevice at line 569:
  |      CUDA | looking for libdevice at '/usr/local/lib/libdevice/libdevice.10.bc'
[2022-07-06 04:18:58.522542592]POCL: in fn findLibDevice at line 581:
  |      CUDA | looking for libdevice at '/usr/local/lib/libdevice/libdevice.compute_35.10.bc'
[2022-07-06 04:18:58.522555310]POCL: in fn findLibDevice at line 569:
  |      CUDA | looking for libdevice at '/usr/lib/nvvm/libdevice/libdevice.10.bc'
[2022-07-06 04:18:58.522568743]POCL: in fn findLibDevice at line 581:
  |      CUDA | looking for libdevice at '/usr/lib/nvvm/libdevice/libdevice.compute_35.10.bc'
[2022-07-06 04:18:58.522580592]POCL: in fn findLibDevice at line 569:
  |      CUDA | looking for libdevice at '/usr/lib/nvidia-cuda-toolkit/libdevice/libdevice.10.bc'
[2022-07-06 04:18:58.522596110]POCL: in fn findLibDevice at line 572:
  |      CUDA | found libdevice at '/usr/lib/nvidia-cuda-toolkit/libdevice/libdevice.10.bc'
ocl-icd(ocl_icd_loader_gen.c:1691): clGetDeviceIDs: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1683): clGetDeviceIDs: Entering
ocl-icd(ocl_icd_loader_gen.c:1691): clGetDeviceIDs: return: 0/0x0
ocl-icd(ocl_icd_loader.c:1140): clCreateContext: Entering
[2022-07-06 04:18:58.630074880]POCL: in fn POclCreateCommandQueue at line 47:
  |   GENERAL |  Create Command queue on device 1
[2022-07-06 04:18:58.630114638]POCL: in fn void pocl_llvm_create_context(cl_context) at line 379:
  |      LLVM |  creating LLVM context
ocl-icd(ocl_icd_loader.c:1149): clCreateContext: return: 94104096093728/0x559651383620
ocl-icd(ocl_icd_loader_gen.c:1737): clGetContextInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1743): clGetContextInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1737): clGetContextInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1743): clGetContextInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1737): clGetContextInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1743): clGetContextInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1737): clGetContextInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1743): clGetContextInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1700): clGetDeviceInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1706): clGetDeviceInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:3259): clCreateCommandQueueWithProperties: Entering
[2022-07-06 04:18:58.630255155]POCL: in fn POclCreateCommandQueue at line 47:
  |   GENERAL |  Create Command queue on device 1
[2022-07-06 04:18:58.630266146]POCL: in fn POclRetainContext at line 32:
  | REFCOUNTS |  Retain Context 0x559651383620  : 2
ocl-icd(ocl_icd_loader_gen.c:3268): clCreateCommandQueueWithProperties: return: 94104096102768/0x559651385970
ocl-icd(ocl_icd_loader_gen.c:1791): clGetCommandQueueInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1797): clGetCommandQueueInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1711): clRetainContext: Entering
[2022-07-06 04:18:58.630350961]POCL: in fn POclRetainContext at line 32:
  | REFCOUNTS |  Retain Context 0x559651383620  : 3
ocl-icd(ocl_icd_loader_gen.c:1717): clRetainContext: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1765): clRetainCommandQueue: Entering
[2022-07-06 04:18:58.630446057]POCL: in fn POclRetainCommandQueue at line 33:
  | REFCOUNTS |  Retain Command Queue 0x559651385970  : 2
ocl-icd(ocl_icd_loader_gen.c:1771): clRetainCommandQueue: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1791): clGetCommandQueueInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1797): clGetCommandQueueInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:3310): clSVMAlloc: Entering
SVM cuMemAllocManaged 160
before write 0 0x4204060000
after write 42.000000
[2022-07-06 04:18:58.630824524]POCL: in fn POclRetainContext at line 32:
  | REFCOUNTS |  Retain Context 0x559651383620  : 4
[2022-07-06 04:18:58.630836034]POCL: in fn POclSVMAlloc at line 114:
  |    MEMORY |  Allocated SVM: PTR 0x4204060000, SIZE 160, FLAGS 1
ocl-icd(ocl_icd_loader_gen.c:3316): clSVMAlloc: return: 283535343616/0x4204060000
ocl-icd(ocl_icd_loader_gen.c:1791): clGetCommandQueueInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1797): clGetCommandQueueInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1700): clGetDeviceInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1706): clGetDeviceInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1700): clGetDeviceInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1706): clGetDeviceInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1700): clGetDeviceInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1706): clGetDeviceInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1700): clGetDeviceInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1706): clGetDeviceInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:3379): clEnqueueSVMMemFill: Entering
[2022-07-06 04:18:58.631313572]POCL: in fn POclRetainCommandQueue at line 33:
  | REFCOUNTS |  Retain Command Queue 0x559651385970  : 3
[2022-07-06 04:18:58.631323842]POCL: in fn pocl_create_event at line 514:
  |    EVENTS |  Created event 0x55965138ddb0 / ID 1 / Command svm_memfill
[2022-07-06 04:18:58.631333776]POCL: in fn pocl_create_command_struct at line 630:
  |    EVENTS |  event pointer provided
[2022-07-06 04:18:58.631342637]POCL: in fn pocl_create_command_struct at line 650:
  |    EVENTS |  Created command struct: CMD 0x55965138dcd0 (event 1 / 0x55965138ddb0, type: svm_memfill)
[2022-07-06 04:18:58.631353096]POCL: in fn pocl_command_enqueue at line 1061:
  |    EVENTS |  In-order Q; adding event syncs
[2022-07-06 04:18:58.631362451]POCL: in fn pocl_command_enqueue at line 1105:
  |    EVENTS |  Pushed Event 1 to CQ 5.
[2022-07-06 04:18:58.631372020]POCL: in fn pocl_update_event_queued at line 1922:
  |    EVENTS |  Event queued: 1
[2022-07-06 04:18:58.631394026]POCL: in fn pocl_update_event_submitted at line 1942:
  |    EVENTS |  Event submitted: 1
BEFORE MEMFILL 160 0x4204060000
AFTER MEMFILL
ocl-icd(ocl_icd_loader_gen.c:3385): clEnqueueSVMMemFill: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1737): clGetContextInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1743): clGetContextInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1737): clGetContextInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1743): clGetContextInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1737): clGetContextInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1743): clGetContextInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1737): clGetContextInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1743): clGetContextInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1700): clGetDeviceInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1706): clGetDeviceInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:2005): clCreateProgramWithSource: Entering
[2022-07-06 04:18:58.632158085]POCL: in fn POclRetainContext at line 32:
  | REFCOUNTS |  Retain Context 0x559651383620  : 5
ocl-icd(ocl_icd_loader_gen.c:2014): clCreateProgramWithSource: return: 94104096146192/0x559651390310
ocl-icd(ocl_icd_loader_gen.c:2066): clBuildProgram: Entering
[2022-07-06 04:18:58.632208081]POCL: in fn compile_and_link_program at line 691:
  |      LLVM |  building program with options -I /shared/home/mdiener/Work/pyopencl/pyopencl/cl
[2022-07-06 04:18:58.632218813]POCL: in fn compile_and_link_program at line 713:
  |      LLVM |  building program for 0 devs with options -I /shared/home/mdiener/Work/pyopencl/pyopencl/cl
[2022-07-06 04:18:58.632228799]POCL: in fn compile_and_link_program at line 717:
  |      LLVM |     BUILDING for device: Tesla K40c
[2022-07-06 04:18:58.632238807]POCL: in fn pocl_driver_build_source at line 511:
  |      LLVM |  building from sources for device 0
[2022-07-06 04:18:58.632438048]POCL: in fn int pocl_llvm_build_program(cl_program, unsigned int, cl_uint, _cl_program* const*, const char**, int) at line 382:
  |      LLVM |  all build options: -mllvm --nvptx-short-ptr -Dcl_khr_int64 -DPOCL_DEVICE_ADDRESS_BITS=64 -D__USE_CLANG_OPENCL_C_H -xcl -Dinline= -I. -cl-kernel-arg-info -I /shared/home/mdiener/Work/pyopencl/pyopencl/cl -D__ENDIAN_LITTLE__=1 -DCL_DEVICE_MAX_GLOBAL_VARIABLE_SIZE=0 -D__OPENCL_VERSION__=120 -cl-std=CL1.2 -D__OPENCL_C_VERSION__=120 -Dcl_khr_byte_addressable_store=1 -Dcl_khr_global_int32_base_atomics=1 -Dcl_khr_global_int32_extended_atomics=1 -Dcl_khr_local_int32_base_atomics=1 -Dcl_khr_local_int32_extended_atomics=1 -Dcl_khr_fp64=1 -Dcl_khr_int64_base_atomics=1 -Dcl_khr_int64_extended_atomics=1 -Dcl_nv_device_attribute_query=1 -Dcl_khr_spir=1 -cl-ext=-all,+cl_khr_byte_addressable_store,+cl_khr_global_int32_base_atomics,+cl_khr_global_int32_extended_atomics,+cl_khr_local_int32_base_atomics,+cl_khr_local_int32_extended_atomics,+cl_khr_fp64,+cl_khr_int64_base_atomics,+cl_khr_int64_extended_atomics,+cl_nv_device_attribute_query,+cl_khr_spir -fno-builtin -triple=nvptx64 -target-cpu sm_35
ocl-icd(ocl_icd_loader_gen.c:2072): clBuildProgram: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:2081): clGetProgramInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:2087): clGetProgramInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:2081): clGetProgramInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:2087): clGetProgramInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:2097): clGetProgramBuildInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:2103): clGetProgramBuildInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:2097): clGetProgramBuildInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:2103): clGetProgramBuildInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:2110): clCreateKernel: Entering
ocl-icd(ocl_icd_loader_gen.c:2119): clCreateKernel: return: 94104096155376/0x5596513926f0
ocl-icd(ocl_icd_loader_gen.c:2178): clGetKernelInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:2184): clGetKernelInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:2178): clGetKernelInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:2184): clGetKernelInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:2178): clGetKernelInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:2184): clGetKernelInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:2178): clGetKernelInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:2184): clGetKernelInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:2178): clGetKernelInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:2184): clGetKernelInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1711): clRetainContext: Entering
[2022-07-06 04:18:58.697513230]POCL: in fn POclRetainContext at line 32:
  | REFCOUNTS |  Retain Context 0x559651383620  : 6
ocl-icd(ocl_icd_loader_gen.c:1717): clRetainContext: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1737): clGetContextInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1743): clGetContextInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1737): clGetContextInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1743): clGetContextInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1722): clReleaseContext: Entering
[2022-07-06 04:18:58.697573667]POCL: in fn POclReleaseContext at line 47:
  | REFCOUNTS |  Release Context
ocl-icd(ocl_icd_loader_gen.c:1728): clReleaseContext: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1700): clGetDeviceInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1706): clGetDeviceInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1700): clGetDeviceInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1706): clGetDeviceInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1700): clGetDeviceInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1706): clGetDeviceInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:2178): clGetKernelInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:2184): clGetKernelInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:2178): clGetKernelInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:2184): clGetKernelInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1711): clRetainContext: Entering
[2022-07-06 04:18:58.698408613]POCL: in fn POclRetainContext at line 32:
  | REFCOUNTS |  Retain Context 0x559651383620  : 6
ocl-icd(ocl_icd_loader_gen.c:1717): clRetainContext: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1737): clGetContextInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1743): clGetContextInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1737): clGetContextInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1743): clGetContextInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1722): clReleaseContext: Entering
[2022-07-06 04:18:58.698444839]POCL: in fn POclReleaseContext at line 47:
  | REFCOUNTS |  Release Context
ocl-icd(ocl_icd_loader_gen.c:1728): clReleaseContext: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:2178): clGetKernelInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:2184): clGetKernelInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1711): clRetainContext: Entering
[2022-07-06 04:18:58.698467371]POCL: in fn POclRetainContext at line 32:
  | REFCOUNTS |  Retain Context 0x559651383620  : 6
ocl-icd(ocl_icd_loader_gen.c:1717): clRetainContext: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1700): clGetDeviceInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1706): clGetDeviceInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1722): clReleaseContext: Entering
[2022-07-06 04:18:58.698561830]POCL: in fn POclReleaseContext at line 47:
  | REFCOUNTS |  Release Context
ocl-icd(ocl_icd_loader_gen.c:1728): clReleaseContext: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:2178): clGetKernelInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:2184): clGetKernelInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:2178): clGetKernelInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:2184): clGetKernelInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:2178): clGetKernelInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:2184): clGetKernelInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:2178): clGetKernelInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:2184): clGetKernelInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1711): clRetainContext: Entering
[2022-07-06 04:18:58.698612903]POCL: in fn POclRetainContext at line 32:
  | REFCOUNTS |  Retain Context 0x559651383620  : 6
ocl-icd(ocl_icd_loader_gen.c:1717): clRetainContext: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1737): clGetContextInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1743): clGetContextInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1737): clGetContextInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1743): clGetContextInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1722): clReleaseContext: Entering
[2022-07-06 04:18:58.698642850]POCL: in fn POclReleaseContext at line 47:
  | REFCOUNTS |  Release Context
ocl-icd(ocl_icd_loader_gen.c:1728): clReleaseContext: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1700): clGetDeviceInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1706): clGetDeviceInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1700): clGetDeviceInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1706): clGetDeviceInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1700): clGetDeviceInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1706): clGetDeviceInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:2050): clReleaseProgram: Entering
[2022-07-06 04:18:58.699081032]POCL: in fn POclReleaseProgram at line 50:
  | REFCOUNTS |  Release program 0x559651390310, new refcount: 1, kernel #: 1
ocl-icd(ocl_icd_loader_gen.c:2056): clReleaseProgram: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:2194): clGetKernelWorkGroupInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:2200): clGetKernelWorkGroupInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1700): clGetDeviceInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1706): clGetDeviceInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1700): clGetDeviceInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1706): clGetDeviceInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:3441): clSetKernelArgSVMPointer: Entering
[2022-07-06 04:18:58.699270966]POCL: in fn POclSetKernelArgSVMPointer at line 43:
  |   GENERAL |  Setting kernel ARG 0 to SVM 0x4204060000
ocl-icd(ocl_icd_loader_gen.c:3447): clSetKernelArgSVMPointer: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:2163): clSetKernelArg: Entering
[2022-07-06 04:18:58.699293930]POCL: in fn POclSetKernelArg at line 107:
  |   GENERAL |  Kernel            fill || SetArg idx   1 ||     long || Local 0 || Size      8 || Value 0x7ffc31e941a8 || Pointer (nil) || *(uint32*)Value:        0 || *(uint64*)Value:        0 ||
Hex Value:  00000000 00000000
ocl-icd(ocl_icd_loader_gen.c:2169): clSetKernelArg: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:2163): clSetKernelArg: Entering
[2022-07-06 04:18:58.699312532]POCL: in fn POclSetKernelArg at line 107:
  |   GENERAL |  Kernel            fill || SetArg idx   2 ||   double || Local 0 || Size      8 || Value 0x7ffc31e941a8 || Pointer (nil) || *(uint32*)Value:        0 || *(uint64*)Value: 4625478292286210048 ||
Hex Value:  00000000 00003140
ocl-icd(ocl_icd_loader_gen.c:2169): clSetKernelArg: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:2163): clSetKernelArg: Entering
[2022-07-06 04:18:58.699330226]POCL: in fn POclSetKernelArg at line 107:
  |   GENERAL |  Kernel            fill || SetArg idx   3 ||     long || Local 0 || Size      8 || Value 0x7ffc31e941a8 || Pointer (nil) || *(uint32*)Value:        0 || *(uint64*)Value:       20 ||
Hex Value:  14000000 00000000
ocl-icd(ocl_icd_loader_gen.c:2169): clSetKernelArg: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:2507): clEnqueueNDRangeKernel: Entering
[2022-07-06 04:18:58.699356022]POCL: in fn POclEnqueueNDRangeKernel at line 221:
  |   GENERAL |  Queueing kernel fill with local size 32 x 1 x 1 group sizes 1 x 1 x 1...
[2022-07-06 04:18:58.699367414]POCL: in fn POclRetainCommandQueue at line 33:
  | REFCOUNTS |  Retain Command Queue 0x559651385970  : 4
[2022-07-06 04:18:58.699376473]POCL: in fn pocl_create_event at line 514:
  |    EVENTS |  Created event 0x55965144f190 / ID 2 / Command ndrange_kernel
[2022-07-06 04:18:58.699384689]POCL: in fn pocl_create_command_struct at line 630:
  |    EVENTS |  event pointer provided
[2022-07-06 04:18:58.699394485]POCL: in fn pocl_create_event_sync at line 530:
  |    EVENTS |  create event sync: waiting 2 , notifier 1
[2022-07-06 04:18:58.699403512]POCL: in fn pocl_create_command_struct at line 650:
  |    EVENTS |  Created command struct: CMD 0x559650ddf460 (event 2 / 0x55965144f190, type: ndrange_kernel)
[2022-07-06 04:18:58.699413923]POCL: in fn POclRetainKernel at line 33:
  | REFCOUNTS |  Retain Kernel 0x5596513926f0  : 2
[2022-07-06 04:18:58.699422964]POCL: in fn pocl_command_enqueue at line 1061:
  |    EVENTS |  In-order Q; adding event syncs
[2022-07-06 04:18:58.699429314]POCL: in fn pocl_create_event_sync at line 530:
  |    EVENTS |  create event sync: waiting 2 , notifier 1
[2022-07-06 04:18:58.699438005]POCL: in fn pocl_create_event_sync at line 543:
  |    EVENTS |  Skipping event sync creation
[2022-07-06 04:18:58.699446350]POCL: in fn pocl_create_event_sync at line 530:
  |    EVENTS |  create event sync: waiting 2 , notifier 1
[2022-07-06 04:18:58.699454328]POCL: in fn pocl_create_event_sync at line 543:
  |    EVENTS |  Skipping event sync creation
[2022-07-06 04:18:58.699462708]POCL: in fn pocl_command_enqueue at line 1105:
  |    EVENTS |  Pushed Event 2 to CQ 5.
[2022-07-06 04:18:58.699470964]POCL: in fn pocl_update_event_queued at line 1922:
  |    EVENTS |  Event queued: 2
[2022-07-06 04:18:58.699489571]POCL: in fn pocl_update_event_submitted at line 1942:
  |    EVENTS |  Event submitted: 2
ocl-icd(ocl_icd_loader_gen.c:2513): clEnqueueNDRangeKernel: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1765): clRetainCommandQueue: Entering
[2022-07-06 04:18:58.700081115]POCL: in fn POclRetainCommandQueue at line 33:
  | REFCOUNTS |  Retain Command Queue 0x559651385970  : 5
ocl-icd(ocl_icd_loader_gen.c:1771): clRetainCommandQueue: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:1791): clGetCommandQueueInfo: Entering
ocl-icd(ocl_icd_loader_gen.c:1797): clGetCommandQueueInfo: return: 0/0x0
ocl-icd(ocl_icd_loader_gen.c:3310): clSVMAlloc: Entering
SVM cuMemAllocManaged 160
before write 0 0x4204061000
Bus error (core dumped)

i.e., it seems to successfully allocate memory with clSVMAlloc/cuMemAllocManaged, but the resulting memory appears to be inaccessible from the host.

matthiasdiener avatar Jul 06 '22 04:07 matthiasdiener

I obviously can't guarantee that that's what at issue here, but I suspect you'll need https://github.com/pocl/pocl/pull/1069 (or another fix for the same issue) in order to allow this to work. I'm actually sort of surprised pocl-pthreads worked.

If that doesn't help, a backtrace and potentially the first meaningful thing flagged by valgrind would be of use.

inducer avatar Jul 06 '22 07:07 inducer

I obviously can't guarantee that that's what at issue here, but I suspect you'll need pocl/pocl#1069 (or another fix for the same issue) in order to allow this to work. I'm actually sort of surprised pocl-pthreads worked.

I'm sorry, I should have been clearer. I used https://github.com/inducer/pyopencl/pull/452 and https://github.com/pocl/pocl/pull/1069 (for POCL-pthreads and POCL-cuda) for all tests. Note that my failing example above doesn't even use loopy (the loopy test in this PR just exposed it), so this PR might be the wrong location to track this issue.

If that doesn't help, a backtrace and potentially the first meaningful thing flagged by valgrind would be of use.

Here is a part of the backtrace at the point of the crash. It doesn't appear to be too useful though.

SVM cuMemAllocManaged 160
before write 0 0x4204061000
--Type <RET> for more, q to quit, c to continue without paging--q

Thread 1 "python" received signal SIGBUS, Bus error.
0x00007fff8510e39c in pocl_cuda_svm_alloc (dev=0x555555f27020, flags=1, size=160)
    at /home/mdiener/Work/pocl/lib/CL/devices/cuda/pocl-cuda.c:1895
1895	  ((double*)dptr)[0] = 42.0;
(gdb) bt
#0  0x00007fff8510e39c in pocl_cuda_svm_alloc (dev=0x555555f27020, flags=1, size=160)
    at /home/mdiener/Work/pocl/lib/CL/devices/cuda/pocl-cuda.c:1895
#1  0x00007fff91c7cfb6 in POclSVMAlloc (context=0x5555564b86d0, flags=1, size=160, alignment=128)
    at /home/mdiener/Work/pocl/lib/CL/clSVMAlloc.c:98
#2  0x00007fff9220638f in clSVMAlloc () from /shared/home/mdiener/Work/emirge/miniforge3/envs/poclbuild/lib/libOpenCL.so.1
#3  0x00007fff922bb337 in pybind11::cpp_function::initialize<pybind11::detail::initimpl::constructor<std::shared_ptr<pyopencl::context>, unsigned long, unsigned int, unsigned long, pyopencl::command_queue const*>::execute<pybind11::class_<pyopencl::svm_allocation>, pybind11::arg, pybind11::arg, pybind11::arg, pybind11::arg, pybind11::arg_v, 0>(pybind11::class_<pyopencl::svm_allocation>&, pybind11::arg const&, pybind11::arg const&, pybind11::arg const&, pybind11::arg const&, pybind11::arg_v const&)::{lambda(pybind11::detail::value_and_holder&, std::shared_ptr<pyopencl::context>, unsigned long, unsigned int, unsigned long, pyopencl::command_queue const*)#1}, void, pybind11::detail::value_and_holder&, std::shared_ptr<pyopencl::context>, unsigned long, unsigned int, unsigned long, pyopencl::command_queue const*, pybind11::name, pybind11::is_method, pybind11::sibling, pybind11::detail::is_new_style_constructor, pybind11::arg, pybind11::arg, pybind11::arg, pybind11::arg, pybind11::arg_v>(pybind11::class_<pyopencl::svm_allocation>&&, void (*)(pybind11::detail::value_and_holder&, std::shared_ptr<pyopencl::context>, unsigned long, unsigned int, unsigned long, pyopencl::command_queue const*), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&, pybind11::detail::is_new_style_constructor const&, pybind11::arg const&, pybind11::arg const&, pybind11::arg const&, pybind11::arg const&, pybind11::arg_v const&)::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&) ()
   from /shared/home/mdiener/Work/pyopencl/pyopencl/_cl.cpython-310-x86_64-linux-gnu.so
#4  0x00007fff922494a6 in pybind11::cpp_function::dispatcher(_object*, _object*, _object*) ()
   from /shared/home/mdiener/Work/pyopencl/pyopencl/_cl.cpython-310-x86_64-linux-gnu.so
#5  0x000055555569850c in cfunction_call (func=0x7fff9240e1b0, args=<optimized out>, kwargs=<optimized out>)
    at /usr/local/src/conda/python-3.10.5/Objects/methodobject.c:543
#6  0x00005555556a6db9 in _PyObject_Call (kwargs=<optimized out>, args=0x7fff91e2cdc0, callable=0x7fff9240e1b0, tstate=0x55555591be10)
    at /usr/local/src/conda/python-3.10.5/Objects/call.c:305
[...]

I modified https://github.com/pocl/pocl/pull/1067 such that it tries to write to the just allocated buffer after a successful cuMemAllocManaged (this just triggers the issue a bit earlier). What happens is that cuMemAllocManaged appears to succeed, but the returned memory can not be read or written on the host. This only appears after a kernel has run on the GPU. As far as I can see, no SVM allocated memory is freed at all for the example code I used.

This is on koelsch with a Tesla K40c.

matthiasdiener avatar Jul 06 '22 14:07 matthiasdiener

With CU_MEM_ATTACH_GLOBAL, I don't think you have a guarantee that the memory should be accessible from the host. Also, since you seem to attribute the crash in the sample code from https://github.com/inducer/loopy/pull/642#issuecomment-1175764168 to host-side access, could you explain where you think that host-side access is happening? (I don't see it. A backtrace would help.)

inducer avatar Jul 07 '22 11:07 inducer

Btw, I agree that this discussion does not have much to do with Loopy. Maybe let's continue the discussion here: https://github.com/inducer/pyopencl/pull/452.

inducer avatar Jul 07 '22 11:07 inducer

(Continuing the discussion here for a bit since I got the loopy test in this PR running with the change outlined below)

With CU_MEM_ATTACH_GLOBAL, I don't think you have a guarantee that the memory should be accessible from the host.

Hmm, that is interesting. Based on your comment, I tried CU_MEM_ATTACH_HOST, and with that change, my test case above and the test in this PR run successfully on POCL-cuda. I'm not sure I understand why though, based on the documentation:

flags specifies the default stream association for this allocation. flags must be one of CU_MEM_ATTACH_GLOBAL or CU_MEM_ATTACH_HOST. If CU_MEM_ATTACH_GLOBAL is specified, then this memory is accessible from any stream on any device. If CU_MEM_ATTACH_HOST is specified, then the allocation should not be accessed from devices that have a zero value for the device attribute CU_DEVICE_ATTRIBUTE_CONCURRENT_MANAGED_ACCESS; an explicit call to cuStreamAttachMemAsync will be required to enable access on such devices.

The device I'm running on (K40c) returns 0 for CU_DEVICE_ATTRIBUTE_CONCURRENT_MANAGED_ACCESS (but 1 for CU_DEVICE_ATTRIBUTE_MANAGED_MEMORY).

I found another fix (workaround?) in https://github.com/pocl/pocl/pull/1067/commits/03ffc7146f425bee6e6345dfe4208d095ddd7e7b which just uses CUDA functions for the memfill operation. With that fix, my simple test and the test in this PR also work.

Also, since you seem to attribute the crash in the sample code from #642 (comment) to host-side access, could you explain where you think that host-side access is happening? (I don't see it. A backtrace would help.)

Here is a backtrace with my debug cruft removed:

--Type <RET> for more, q to quit, c to continue without paging--

Thread 1 "python" received signal SIGBUS, Bus error.
0x00007fff91d71bab in pocl_fill_aligned_buf_with_pattern (ptr=0x4204061000, offset=0, size=160, pattern=0x55555650d600, pattern_size=1)
    at /home/mdiener/Work/pocl/lib/CL/pocl_util.c:2347
2347	          p[i] = *(uint8_t *)pattern;
(gdb) bt
#0  0x00007fff91d71bab in pocl_fill_aligned_buf_with_pattern (ptr=0x4204061000, offset=0, size=160, pattern=0x55555650d600, pattern_size=1)
    at /home/mdiener/Work/pocl/lib/CL/pocl_util.c:2347
#1  0x00007fff91d91088 in pocl_driver_memfill (data=0x555555ef2c80, dst_mem_id=0x7fffffffd390, dst_buf=0x0, size=160, offset=0,
    pattern=0x55555650d600, pattern_size=1) at /home/mdiener/Work/pocl/lib/CL/devices/common_driver.c:307
#2  0x00007fff91d91690 in pocl_driver_svm_fill (dev=0x555555f08f00, svm_ptr=0x4204061000, size=160, pattern=0x55555650d600, pattern_size=1)
    at /home/mdiener/Work/pocl/lib/CL/devices/common_driver.c:434
#3  0x00007fff8520c96c in pocl_cuda_submit_node (node=0x5555564c2a00, cq=0x5555564ad6e0, locked=1)
    at /home/mdiener/Work/pocl/lib/CL/devices/cuda/pocl-cuda.c:1469
#4  0x00007fff8520d02c in pocl_cuda_submit (node=0x5555564c2a00, cq=0x5555564ad6e0)
    at /home/mdiener/Work/pocl/lib/CL/devices/cuda/pocl-cuda.c:1557
#5  0x00007fff91d6cf47 in pocl_command_enqueue (command_queue=0x5555564ad6e0, node=0x5555564c2a00)
    at /home/mdiener/Work/pocl/lib/CL/pocl_util.c:1114
#6  0x00007fff91d80469 in POclEnqueueSVMMemFill (command_queue=0x5555564ad6e0, svm_ptr=0x4204061000, pattern=0x7fff923f7b20,
    pattern_size=1, size=160, num_events_in_wait_list=0, event_wait_list=0x0, event=0x7fffffffd660)
    at /home/mdiener/Work/pocl/lib/CL/clEnqueueSVMMemFill.c:89
#7  0x00007fff922068cc in clEnqueueSVMMemFill () from /shared/home/mdiener/Work/emirge/miniforge3/envs/poclbuild/lib/libOpenCL.so.1
#8  0x00007fff922aa6d2 in pyopencl::enqueue_svm_memfill(pyopencl::command_queue&, pyopencl::svm_arg_wrapper&, pybind11::object, pybind11::object, pybind11::object) () from /shared/home/mdiener/Work/pyopencl/pyopencl/_cl.cpython-310-x86_64-linux-gnu.so
#9  0x00007fff922a4824 in pybind11::cpp_function::initialize<pyopencl::event* (*&)(pyopencl::command_queue&, pyopencl::svm_arg_wrapper&, pybind11::object, pybind11::object, pybind11::object), pyopencl::event*, pyopencl::command_queue&, pyopencl::svm_arg_wrapper&, pybind11::object, pybind11::object, pybind11::object, pybind11::name, pybind11::scope, pybind11::sibling, pybind11::arg, pybind11::arg, pybind11::arg, pybind11::arg_v, pybind11::arg_v>(pyopencl::event* (*&)(pyopencl::command_queue&, pyopencl::svm_arg_wrapper&, pybind11::object, pybind11::object, pybind11::object), pyopencl::event* (*)(pyopencl::command_queue&, pyopencl::svm_arg_wrapper&, pybind11::object, pybind11::object, pybind11::object), pybind11::name const&, pybind11::scope const&, pybind11::sibling const&, pybind11::arg const&, pybind11::arg const&, pybind11::arg const&, pybind11::arg_v const&, pybind11::arg_v const&)::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&) ()
   from /shared/home/mdiener/Work/pyopencl/pyopencl/_cl.cpython-310-x86_64-linux-gnu.so
[...]

It crashes in the memfill operation when accessing ptr (p). Note that the arguments are an SVM array (ptr) and a normal allocation (pattern). Again, https://github.com/pocl/pocl/pull/1067/commits/03ffc7146f425bee6e6345dfe4208d095ddd7e7b fixes this.

matthiasdiener avatar Jul 07 '22 18:07 matthiasdiener

I found another fix (workaround?) in pocl/pocl@03ffc71 which just uses CUDA functions for the memfill operation. With that fix, my simple test and the test in this PR also work.

OK, nice. As far as I can tell, using CUDA to do the SVM fill is the correct thing to do. Otherwise it's neither queue-synchronized nor performed from the device.

I don't believe CU_MEM_ATTACH_HOST is the correct flag to use; we should not need the memory to be host-visibile. And upon rereading the phrasing of the description of CU_MEM_ATTACH_GLOBAL, I do not think it means to imply that the memory is host-visible. ("any stream" sort of implies GPU execution, and I suspect they mean "device" in the CUDA sense as well)

inducer avatar Jul 10 '22 21:07 inducer

@matthiasdiener Please don't force-push to branches on which more than one person is working. Not only is there a risk of clobbering one another's work, it's also very hard to review what's being changed.

inducer avatar Jul 13 '22 22:07 inducer

With the current version of mirgecom, and setting limit_arg_size_nbytes to 20, this crashes:

$ python examples/wave.py --lazy
Choose platform:
[0] <pyopencl.Platform 'Portable Computing Language' at 0x7f00190cb008>
Choice [0]:

Choose device(s):
[0] <pyopencl.Device 'pthread-Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz' on 'Portable Computing Language' at 0x55b419f74ff0>
[1] <pyopencl.Device 'Tesla K40c' on 'Portable Computing Language' at 0x55b419f753f0>
Choice, comma-separated [0]:Set the environment variable PYOPENCL_CTX=':' to avoid being asked again.
Traceback (most recent call last):
  File "/shared/home/mdiener/Work/svmfuse/mirgecom/examples/wave.py", line 190, in <module>
    main(use_profiling=args.profile, use_logmgr=args.logging, lazy=args.lazy)
  File "/shared/home/mdiener/Work/svmfuse/mirgecom/examples/wave.py", line 112, in main
    nodes = actx.thaw(discr.nodes())
  File "/shared/home/mdiener/Work/svmfuse/grudge/grudge/discretization.py", line 729, in nodes
    return self.discr_from_dd(dd).nodes()
  File "/shared/home/mdiener/Work/svmfuse/meshmode/meshmode/discretization/__init__.py", line 679, in nodes
    result = make_obj_array([
  File "/shared/home/mdiener/Work/svmfuse/meshmode/meshmode/discretization/__init__.py", line 680, in <listcomp>
    _DOFArray(None, tuple([
  File "/shared/home/mdiener/Work/svmfuse/meshmode/meshmode/discretization/__init__.py", line 681, in <listcomp>
    actx.freeze(resample_mesh_nodes(grp, iaxis)) for grp in self.groups
  File "/shared/home/mdiener/Work/svmfuse/arraycontext/arraycontext/impl/pytato/__init__.py", line 426, in freeze
    evt, out_dict = pt_prg(self.queue, **bound_arguments)
  File "/shared/home/mdiener/Work/svmfuse/pytato/pytato/target/loopy/__init__.py", line 212, in __call__
    return self.program(queue,
  File "/shared/home/mdiener/Work/svmfuse/loopy/loopy/translation_unit.py", line 347, in __call__
    return pex(*args, **kwargs)
  File "/shared/home/mdiener/Work/svmfuse/loopy/loopy/target/pyopencl_execution.py", line 387, in __call__
    return translation_unit_info.invoker(
  File "/shared/home/mdiener/Work/svmfuse/miniforge3/envs/ceesd/lib/python3.9/site-packages/pytools/py_codegen.py", line 150, in __call__
    return self.func(*args, **kwargs)
  File "<generated code for 'invoke_frozen_nodes0_2d_loopy_kernel'>", line 146, in invoke_frozen_nodes0_2d_loopy_kernel
  File "<generated code for 'invoke_frozen_nodes0_2d_loopy_kernel'>", line 27, in _lpy_host_frozen_nodes0_2d
AttributeError: 'pyopencl._cl.Buffer' object has no attribute 'svm_ptr'

Edit: This was resolved by rebasing Kaushik's branch of arraycontext.

matthiasdiener avatar Aug 16 '22 05:08 matthiasdiener