futhark-pycffi

segfault using opencl

Open rowanG077 opened this issue 9 months ago • 5 comments

OS: NixOS unstable
Machine: M2 Max
OpenCL driver: asahi graphics driver + rusticl

Consider this Futhark module, saved as segfault.fut:

-- ==
-- random input { [6400][3200]u8 }
entry main [n] [m]
  (luminance: [n][m]u8): [m][n]u8
  = map (\i -> map (\j -> luminance[n-1-j, i]) (iota n)) (iota m)
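
For readers less familiar with Futhark, the entry point above is a 90-degree clockwise rotation: the output has shape [m][n] and element (i, j) is read from luminance[n-1-j, i]. A minimal pure-Python sketch of the same indexing follows. The name `rotate_reference` is hypothetical and not part of the issue; it is only meant as a reference to compare results against.

```python
def rotate_reference(luminance):
    """Pure-Python reference for the Futhark entry point `main`.

    Takes an n x m list of rows and returns an m x n list of rows where
    out[i][j] == luminance[n - 1 - j][i].
    """
    n = len(luminance)
    m = len(luminance[0]) if n else 0
    return [[luminance[n - 1 - j][i] for j in range(n)] for i in range(m)]


# A 2 x 3 input becomes a 3 x 2 output, rotated clockwise.
small = [
    [1, 2, 3],
    [4, 5, 6],
]
rotated = rotate_reference(small)
print(rotated)  # [[4, 1], [5, 2], [6, 3]]
```

On a NumPy array the same result should be obtainable with `np.rot90(luminance, k=-1)`, which gives a quick way to check the output of a backend that does not crash.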

This runs without issues when benchmarked with futhark bench --backend=opencl segfault.fut. However, when I create a Python interface from it and run it with random inputs, it often segfaults. The multicore backend, at least, has no problems.

futhark opencl --library -o segfault segfault.fut
build_futhark_ffi segfault

and segfault.py:

from futhark_ffi import Futhark
import numpy as np

import _segfault
_ffi = Futhark(_segfault)

if __name__ == "__main__":
    luminance = np.random.randint(0, 256, (6400, 3200))
    pc = _ffi.from_futhark(_ffi.main(luminance))
    print(pc.shape)
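
One variable that may be worth ruling out is the element type. `np.random.randint` returns the platform default integer type unless a `dtype` is given (typically int64 on 64-bit Linux), while the entry point takes `u8`. Whether futhark_ffi converts, rejects, or passes such an array through is not established in this report, so the sketch below only makes the size difference concrete. The helper `buffer_bytes` and the two constants are hypothetical names used for illustration.

```python
# Element sizes in bytes for the two types in question.
U8_ITEMSIZE = 1      # Futhark u8 / NumPy uint8
INT64_ITEMSIZE = 8   # NumPy int64, the usual randint default on 64-bit Linux


def buffer_bytes(shape, itemsize):
    """Size in bytes of a dense array with the given shape and element size."""
    total = itemsize
    for dim in shape:
        total *= dim
    return total


shape = (6400, 3200)
expected = buffer_bytes(shape, U8_ITEMSIZE)    # what a [6400][3200]u8 occupies
actual = buffer_bytes(shape, INT64_ITEMSIZE)   # what the default randint array occupies
print(expected, actual)  # 20480000 163840000
```

To test the script with a matching element type, the input can be generated with `np.random.randint(0, 256, (6400, 3200), dtype=np.uint8)`. If the crash persists with that input, the dtype can be excluded as a factor.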

Running it results in a segfault:

python segfault.py 
Segmentation fault (core dumped)

I suspect this is a futhark-pycffi issue because the segfault does not occur with the bench utility. Valgrind indicates the invalid read happens in the rusticl OpenCL code, in a memcpy reached from enqueue_write_buffer on the rusticl queue thread:

valgrind python segfault.py 
==118477== Memcheck, a memory error detector
==118477== Copyright (C) 2002-2024, and GNU GPL'd, by Julian Seward et al.
==118477== Using Valgrind-3.24.0 and LibVEX; rerun with -h for copyright info
==118477== Command: python segfault.py
==118477== 
==118477== Thread 3 rusticl queue t:
==118477== Invalid read of size 8
==118477==    at 0x48925C8: __GI_memcpy (in /nix/store/jp9c2qj2dmii4c1sqrpmr2qp592nvsli-valgrind-3.24.0/libexec/valgrind/vgpreload_memcheck-arm64-linux.so)
==118477==    by 0x219BEA5B: u_default_buffer_subdata (in /nix/store/j67nja68wk0932kpcc2gir95zgvn4gix-mesa-25.0.0-asahi-opencl/lib/libRusticlOpenCL.so.1.0.0)
==118477==    by 0x21750E7F: mesa_rust::pipe::context::PipeContext::buffer_subdata (in /nix/store/j67nja68wk0932kpcc2gir95zgvn4gix-mesa-25.0.0-asahi-opencl/lib/libRusticlOpenCL.so.1.0.0)
==118477==    by 0x216CECFB: rusticl::core::memory::Buffer::write (in /nix/store/j67nja68wk0932kpcc2gir95zgvn4gix-mesa-25.0.0-asahi-opencl/lib/libRusticlOpenCL.so.1.0.0)
==118477==    by 0x216B48F7: rusticl::api::memory::enqueue_write_buffer::{{closure}} (in /nix/store/j67nja68wk0932kpcc2gir95zgvn4gix-mesa-25.0.0-asahi-opencl/lib/libRusticlOpenCL.so.1.0.0)
==118477==    by 0x217328CF: core::ops::function::FnOnce::call_once{{vtable.shim}} (in /nix/store/j67nja68wk0932kpcc2gir95zgvn4gix-mesa-25.0.0-asahi-opencl/lib/libRusticlOpenCL.so.1.0.0)
==118477==    by 0x216F3DB7: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once (in /nix/store/j67nja68wk0932kpcc2gir95zgvn4gix-mesa-25.0.0-asahi-opencl/lib/libRusticlOpenCL.so.1.0.0)
==118477==    by 0x216FDBB7: rusticl::core::event::Event::call::{{closure}} (in /nix/store/j67nja68wk0932kpcc2gir95zgvn4gix-mesa-25.0.0-asahi-opencl/lib/libRusticlOpenCL.so.1.0.0)
==118477==    by 0x216ECB43: core::option::Option<T>::map_or (in /nix/store/j67nja68wk0932kpcc2gir95zgvn4gix-mesa-25.0.0-asahi-opencl/lib/libRusticlOpenCL.so.1.0.0)
==118477==    by 0x216FD91F: rusticl::core::event::Event::call (in /nix/store/j67nja68wk0932kpcc2gir95zgvn4gix-mesa-25.0.0-asahi-opencl/lib/libRusticlOpenCL.so.1.0.0)
==118477==    by 0x21704B23: rusticl::core::queue::Queue::new::{{closure}} (in /nix/store/j67nja68wk0932kpcc2gir95zgvn4gix-mesa-25.0.0-asahi-opencl/lib/libRusticlOpenCL.so.1.0.0)
==118477==    by 0x216A15AF: std::sys::backtrace::__rust_begin_short_backtrace (in /nix/store/j67nja68wk0932kpcc2gir95zgvn4gix-mesa-25.0.0-asahi-opencl/lib/libRusticlOpenCL.so.1.0.0)
==118477==  Address 0x400b8040 is not stack'd, malloc'd or (recently) free'd
==118477== 
==118477== 
==118477== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==118477==  Access not within mapped region at address 0x400B8040
==118477==    at 0x48925C8: __GI_memcpy (in /nix/store/jp9c2qj2dmii4c1sqrpmr2qp592nvsli-valgrind-3.24.0/libexec/valgrind/vgpreload_memcheck-arm64-linux.so)
==118477==    by 0x219BEA5B: u_default_buffer_subdata (in /nix/store/j67nja68wk0932kpcc2gir95zgvn4gix-mesa-25.0.0-asahi-opencl/lib/libRusticlOpenCL.so.1.0.0)
==118477==    by 0x21750E7F: mesa_rust::pipe::context::PipeContext::buffer_subdata (in /nix/store/j67nja68wk0932kpcc2gir95zgvn4gix-mesa-25.0.0-asahi-opencl/lib/libRusticlOpenCL.so.1.0.0)
==118477==    by 0x216CECFB: rusticl::core::memory::Buffer::write (in /nix/store/j67nja68wk0932kpcc2gir95zgvn4gix-mesa-25.0.0-asahi-opencl/lib/libRusticlOpenCL.so.1.0.0)
==118477==    by 0x216B48F7: rusticl::api::memory::enqueue_write_buffer::{{closure}} (in /nix/store/j67nja68wk0932kpcc2gir95zgvn4gix-mesa-25.0.0-asahi-opencl/lib/libRusticlOpenCL.so.1.0.0)
==118477==    by 0x217328CF: core::ops::function::FnOnce::call_once{{vtable.shim}} (in /nix/store/j67nja68wk0932kpcc2gir95zgvn4gix-mesa-25.0.0-asahi-opencl/lib/libRusticlOpenCL.so.1.0.0)
==118477==    by 0x216F3DB7: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once (in /nix/store/j67nja68wk0932kpcc2gir95zgvn4gix-mesa-25.0.0-asahi-opencl/lib/libRusticlOpenCL.so.1.0.0)
==118477==    by 0x216FDBB7: rusticl::core::event::Event::call::{{closure}} (in /nix/store/j67nja68wk0932kpcc2gir95zgvn4gix-mesa-25.0.0-asahi-opencl/lib/libRusticlOpenCL.so.1.0.0)
==118477==    by 0x216ECB43: core::option::Option<T>::map_or (in /nix/store/j67nja68wk0932kpcc2gir95zgvn4gix-mesa-25.0.0-asahi-opencl/lib/libRusticlOpenCL.so.1.0.0)
==118477==    by 0x216FD91F: rusticl::core::event::Event::call (in /nix/store/j67nja68wk0932kpcc2gir95zgvn4gix-mesa-25.0.0-asahi-opencl/lib/libRusticlOpenCL.so.1.0.0)
==118477==    by 0x21704B23: rusticl::core::queue::Queue::new::{{closure}} (in /nix/store/j67nja68wk0932kpcc2gir95zgvn4gix-mesa-25.0.0-asahi-opencl/lib/libRusticlOpenCL.so.1.0.0)
==118477==    by 0x216A15AF: std::sys::backtrace::__rust_begin_short_backtrace (in /nix/store/j67nja68wk0932kpcc2gir95zgvn4gix-mesa-25.0.0-asahi-opencl/lib/libRusticlOpenCL.so.1.0.0)
==118477==  If you believe this happened as a result of a stack
==118477==  overflow in your program's main thread (unlikely but
==118477==  possible), you can try to increase the size of the
==118477==  main thread stack using the --main-stacksize= flag.
==118477==  The main thread stack size used in this run was 8388608.
==118477== 
==118477== HEAP SUMMARY:
==118477==     in use at exit: 200,911,965 bytes in 142,087 blocks
==118477==   total heap usage: 425,014 allocs, 282,927 frees, 322,863,975 bytes allocated
==118477== 
==118477== LEAK SUMMARY:
==118477==    definitely lost: 170,726 bytes in 43 blocks
==118477==    indirectly lost: 0 bytes in 0 blocks
==118477==      possibly lost: 30,644,137 bytes in 113,615 blocks
==118477==    still reachable: 170,097,102 bytes in 28,429 blocks
==118477==         suppressed: 0 bytes in 0 blocks
==118477== Rerun with --leak-check=full to see details of leaked memory
==118477== 
==118477== For lists of detected and suppressed errors, rerun with: -s
==118477== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
Segmentation fault (core dumped)
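
The Valgrind trace only shows the native frames on the rusticl queue thread, not which Python call was in flight on the main thread. As a possible next step, the standard-library `faulthandler` module can dump the Python tracebacks of all threads when the process receives SIGSEGV. The wrapper name `enable_crash_tracebacks` below is hypothetical; only the `faulthandler` calls are real API.

```python
import faulthandler
import sys


def enable_crash_tracebacks(stream=None):
    """Ask CPython to dump every thread's Python traceback on a fatal signal.

    `stream` must be a file object with a working fileno() and has to stay
    open for as long as the handler is enabled. Defaults to sys.stderr.
    """
    target = stream if stream is not None else sys.stderr
    faulthandler.enable(file=target, all_threads=True)
    return faulthandler.is_enabled()
```

Calling `enable_crash_tracebacks()` as the first statement of segfault.py, or running `python -X faulthandler segfault.py` without changing the script, should show whether the main thread was inside `_ffi.main` or `_ffi.from_futhark` at the time of the fault.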

rowanG077, Feb 07 '25 14:02