
segfault using opencl

Open rowanG077 opened this issue 10 months ago • 5 comments

OS: NixOS unstable
Machine: M2 Max
OpenCL driver: Asahi graphics driver + rusticl

Consider this futhark module saved as segfault.fut:

-- ==
-- random input { [6400][3200]u8 }
entry main [n] [m]
  (luminance: [n][m]u8): [m][n]u8
  = map (\i -> map (\j -> luminance[n-1-j, i]) (iota n)) (iota m)

This works without issues when benchmarked with futhark bench --backend=opencl segfault.fut. However, when I create a Python interface from it and run it with random inputs, it often segfaults. The multicore backend, at least, has no such problems. The Python interface is built with:

futhark opencl --library -o segfault segfault.fut
build_futhark_ffi segfault

and segfault.py:

from futhark_ffi import Futhark
import numpy as np

import _segfault
_ffi = Futhark(_segfault)

if __name__ == "__main__":
    luminance = np.random.randint(0, 256, (6400, 3200))
    pc = _ffi.from_futhark(_ffi.main(luminance))
    print(pc.shape)

And running it results in a segfault.

python segfault.py 
Segmentation fault (core dumped)

I expect this is a futhark-pycffi issue, because the segfault does not present itself when using the bench utility. Valgrind indicates it happens somewhere in the rusticl OpenCL code.

valgrind python segfault.py 
==118477== Memcheck, a memory error detector
==118477== Copyright (C) 2002-2024, and GNU GPL'd, by Julian Seward et al.
==118477== Using Valgrind-3.24.0 and LibVEX; rerun with -h for copyright info
==118477== Command: python segfault.py
==118477== 
==118477== Thread 3 rusticl queue t:
==118477== Invalid read of size 8
==118477==    at 0x48925C8: __GI_memcpy (in /nix/store/jp9c2qj2dmii4c1sqrpmr2qp592nvsli-valgrind-3.24.0/libexec/valgrind/vgpreload_memcheck-arm64-linux.so)
==118477==    by 0x219BEA5B: u_default_buffer_subdata (in /nix/store/j67nja68wk0932kpcc2gir95zgvn4gix-mesa-25.0.0-asahi-opencl/lib/libRusticlOpenCL.so.1.0.0)
==118477==    by 0x21750E7F: mesa_rust::pipe::context::PipeContext::buffer_subdata (in /nix/store/j67nja68wk0932kpcc2gir95zgvn4gix-mesa-25.0.0-asahi-opencl/lib/libRusticlOpenCL.so.1.0.0)
==118477==    by 0x216CECFB: rusticl::core::memory::Buffer::write (in /nix/store/j67nja68wk0932kpcc2gir95zgvn4gix-mesa-25.0.0-asahi-opencl/lib/libRusticlOpenCL.so.1.0.0)
==118477==    by 0x216B48F7: rusticl::api::memory::enqueue_write_buffer::{{closure}} (in /nix/store/j67nja68wk0932kpcc2gir95zgvn4gix-mesa-25.0.0-asahi-opencl/lib/libRusticlOpenCL.so.1.0.0)
==118477==    by 0x217328CF: core::ops::function::FnOnce::call_once{{vtable.shim}} (in /nix/store/j67nja68wk0932kpcc2gir95zgvn4gix-mesa-25.0.0-asahi-opencl/lib/libRusticlOpenCL.so.1.0.0)
==118477==    by 0x216F3DB7: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once (in /nix/store/j67nja68wk0932kpcc2gir95zgvn4gix-mesa-25.0.0-asahi-opencl/lib/libRusticlOpenCL.so.1.0.0)
==118477==    by 0x216FDBB7: rusticl::core::event::Event::call::{{closure}} (in /nix/store/j67nja68wk0932kpcc2gir95zgvn4gix-mesa-25.0.0-asahi-opencl/lib/libRusticlOpenCL.so.1.0.0)
==118477==    by 0x216ECB43: core::option::Option<T>::map_or (in /nix/store/j67nja68wk0932kpcc2gir95zgvn4gix-mesa-25.0.0-asahi-opencl/lib/libRusticlOpenCL.so.1.0.0)
==118477==    by 0x216FD91F: rusticl::core::event::Event::call (in /nix/store/j67nja68wk0932kpcc2gir95zgvn4gix-mesa-25.0.0-asahi-opencl/lib/libRusticlOpenCL.so.1.0.0)
==118477==    by 0x21704B23: rusticl::core::queue::Queue::new::{{closure}} (in /nix/store/j67nja68wk0932kpcc2gir95zgvn4gix-mesa-25.0.0-asahi-opencl/lib/libRusticlOpenCL.so.1.0.0)
==118477==    by 0x216A15AF: std::sys::backtrace::__rust_begin_short_backtrace (in /nix/store/j67nja68wk0932kpcc2gir95zgvn4gix-mesa-25.0.0-asahi-opencl/lib/libRusticlOpenCL.so.1.0.0)
==118477==  Address 0x400b8040 is not stack'd, malloc'd or (recently) free'd
==118477== 
==118477== 
==118477== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==118477==  Access not within mapped region at address 0x400B8040
==118477==    at 0x48925C8: __GI_memcpy (in /nix/store/jp9c2qj2dmii4c1sqrpmr2qp592nvsli-valgrind-3.24.0/libexec/valgrind/vgpreload_memcheck-arm64-linux.so)
==118477==    by 0x219BEA5B: u_default_buffer_subdata (in /nix/store/j67nja68wk0932kpcc2gir95zgvn4gix-mesa-25.0.0-asahi-opencl/lib/libRusticlOpenCL.so.1.0.0)
==118477==    by 0x21750E7F: mesa_rust::pipe::context::PipeContext::buffer_subdata (in /nix/store/j67nja68wk0932kpcc2gir95zgvn4gix-mesa-25.0.0-asahi-opencl/lib/libRusticlOpenCL.so.1.0.0)
==118477==    by 0x216CECFB: rusticl::core::memory::Buffer::write (in /nix/store/j67nja68wk0932kpcc2gir95zgvn4gix-mesa-25.0.0-asahi-opencl/lib/libRusticlOpenCL.so.1.0.0)
==118477==    by 0x216B48F7: rusticl::api::memory::enqueue_write_buffer::{{closure}} (in /nix/store/j67nja68wk0932kpcc2gir95zgvn4gix-mesa-25.0.0-asahi-opencl/lib/libRusticlOpenCL.so.1.0.0)
==118477==    by 0x217328CF: core::ops::function::FnOnce::call_once{{vtable.shim}} (in /nix/store/j67nja68wk0932kpcc2gir95zgvn4gix-mesa-25.0.0-asahi-opencl/lib/libRusticlOpenCL.so.1.0.0)
==118477==    by 0x216F3DB7: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once (in /nix/store/j67nja68wk0932kpcc2gir95zgvn4gix-mesa-25.0.0-asahi-opencl/lib/libRusticlOpenCL.so.1.0.0)
==118477==    by 0x216FDBB7: rusticl::core::event::Event::call::{{closure}} (in /nix/store/j67nja68wk0932kpcc2gir95zgvn4gix-mesa-25.0.0-asahi-opencl/lib/libRusticlOpenCL.so.1.0.0)
==118477==    by 0x216ECB43: core::option::Option<T>::map_or (in /nix/store/j67nja68wk0932kpcc2gir95zgvn4gix-mesa-25.0.0-asahi-opencl/lib/libRusticlOpenCL.so.1.0.0)
==118477==    by 0x216FD91F: rusticl::core::event::Event::call (in /nix/store/j67nja68wk0932kpcc2gir95zgvn4gix-mesa-25.0.0-asahi-opencl/lib/libRusticlOpenCL.so.1.0.0)
==118477==    by 0x21704B23: rusticl::core::queue::Queue::new::{{closure}} (in /nix/store/j67nja68wk0932kpcc2gir95zgvn4gix-mesa-25.0.0-asahi-opencl/lib/libRusticlOpenCL.so.1.0.0)
==118477==    by 0x216A15AF: std::sys::backtrace::__rust_begin_short_backtrace (in /nix/store/j67nja68wk0932kpcc2gir95zgvn4gix-mesa-25.0.0-asahi-opencl/lib/libRusticlOpenCL.so.1.0.0)
==118477==  If you believe this happened as a result of a stack
==118477==  overflow in your program's main thread (unlikely but
==118477==  possible), you can try to increase the size of the
==118477==  main thread stack using the --main-stacksize= flag.
==118477==  The main thread stack size used in this run was 8388608.
==118477== 
==118477== HEAP SUMMARY:
==118477==     in use at exit: 200,911,965 bytes in 142,087 blocks
==118477==   total heap usage: 425,014 allocs, 282,927 frees, 322,863,975 bytes allocated
==118477== 
==118477== LEAK SUMMARY:
==118477==    definitely lost: 170,726 bytes in 43 blocks
==118477==    indirectly lost: 0 bytes in 0 blocks
==118477==      possibly lost: 30,644,137 bytes in 113,615 blocks
==118477==    still reachable: 170,097,102 bytes in 28,429 blocks
==118477==         suppressed: 0 bytes in 0 blocks
==118477== Rerun with --leak-check=full to see details of leaked memory
==118477== 
==118477== For lists of detected and suppressed errors, rerun with: -s
==118477== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
Segmentation fault (core dumped)

rowanG077 avatar Feb 07 '25 14:02 rowanG077

Pinging @athas. It's not entirely clear to me whether you or @pepijndevos is maintaining this.

rowanG077 avatar Feb 07 '25 14:02 rowanG077

I am probably maintaining it, sort of, but I don't really know how it works or how to debug it. Python's FFI is rather mysterious to me.

Half-hearted guess: something deallocates memory at an unexpected point. futhark bench always synchronises after an entry point, but I think futhark-pycffi expects the user to synchronise manually.
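For illustration, an explicit synchronisation on the caller side could look roughly like the sketch below. This assumes the wrapper object exposes the raw Futhark context as .ctx and the generated library handle as .lib (attribute names assumed here, not confirmed against futhark-pycffi); futhark_context_sync is the standard call from the generated C API that blocks until all queued device work has finished.

res = _ffi.main(luminance)
# Block until all queued OpenCL work for this context has completed
# before converting the result back to a NumPy array.
err = _ffi.lib.futhark_context_sync(_ffi.ctx)
assert err == 0
pc = _ffi.from_futhark(res)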

athas avatar Feb 07 '25 14:02 athas

Hmm, actually, simply setting dtype=np.uint8 on the NumPy array seems to at least solve the problem here. This was a long road to minimize; let's see whether I actually removed the real problem along the way.

Investigating further: in my real code, where the segfault occurs, I can fix it by forcing the NumPy array into C-contiguous order.
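For reference, the caller-side workaround described above, applied to the repro script, is roughly the following sketch (np.ascontiguousarray copies only when the array is not already C-contiguous):

luminance = np.random.randint(0, 256, (6400, 3200), dtype=np.uint8)
# Force a C-contiguous buffer of the dtype the entry point expects
# before handing it to the generated code.
luminance = np.ascontiguousarray(luminance)
pc = _ffi.from_futhark(_ffi.main(luminance))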

I guess this should be done in this library.

Edit: With that I get no segfault, but the results do not match between the opencl and multicore backends...

rowanG077 avatar Feb 07 '25 14:02 rowanG077

I basically wrote this, but I'm not actively using Futhark at the moment. Usually, by the time I open my email, @athas is already on top of it. I'd happily review a PR that enforces memory order, though. IIRC, NumPy has a function for getting a C-compatible array that we should use.
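The NumPy helper in question is presumably np.ascontiguousarray (or np.require with the 'C' requirement). A library-side coercion could look roughly like this sketch, where as_futhark_input is a hypothetical helper and not an existing futhark-pycffi function:

import numpy as np

def as_futhark_input(arr, dtype):
    # Coerce to the dtype the entry point expects and to C-contiguous
    # layout; ascontiguousarray only copies when necessary.
    return np.ascontiguousarray(arr, dtype=dtype)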

pepijndevos avatar Feb 07 '25 15:02 pepijndevos

I have a hunch that this may occur if the NumPy array has an offset.
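For context, an 'offset' here would mean a view that does not start at the beginning of its base allocation; the illustrative example below (not taken from the issue) is C-contiguous, yet its data pointer differs from that of its base array:

import numpy as np

base = np.zeros((10, 4), dtype=np.uint8)
view = base[2:]  # starts 2 rows (8 bytes) into base's allocation
print(view.flags['C_CONTIGUOUS'])           # True
print(view.ctypes.data - base.ctypes.data)  # 8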

athas avatar Feb 09 '25 15:02 athas