OpenCL.jl

create_some_context causes Segmentation fault from REPL

Status: Open. lwabeke opened this issue 6 years ago (5 comments).

Hi

When I call `create_some_context` from the REPL, it causes a segmentation fault that closes Julia. However, I can successfully run the example code from the bash command line (`./run_examples.sh`).

Julia Version 1.1.1
Commit 55e36cc (2019-05-16 04:10 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin15.6.0)
  CPU: Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, haswell)

I traced it (by manually executing the code line by line), and it got at least as far as line 140 of `src/context.jl`: `ctx_id = api.clCreateContext(`

I suspect it has something to do with environment variables/paths, and possibly with module initialisation code that executes differently in the REPL than when running a Julia script from the bash command line.

`Pkg.test("OpenCL")` mostly works, but reports 3 failures:

```
Test Summary:                       | Pass  Fail  Total
OpenCL.Program                      |   63     3     66
  OpenCL.Program source constructor |    3            3
  OpenCL.Program info               |   24           24
  OpenCL.Program build              |   12           12
  OpenCL.Program source code        |    3            3
  OpenCL.Program binaries           |   21     3     24
```

I'm not sure how to find out which environment variables/paths are used during the `ccall`, or how to trace it further.
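One way to answer that question is to record what each mode actually sees and diff the two. The sketch below is a hypothetical helper (`dump_runtime_state` is not part of OpenCL.jl) that writes `isinteractive()`, the environment variables, and the shared libraries reported by `Libdl.dllist()` to a text file.

```julia
using Libdl  # standard library; provides Libdl.dllist()

# Hypothetical helper (not part of OpenCL.jl): write the interactive flag,
# the environment variables and the shared libraries currently loaded into
# this Julia process to `path`, so a REPL session and a script run can be diffed.
function dump_runtime_state(path::AbstractString)
    open(path, "w") do io
        println(io, "# isinteractive() = ", isinteractive())
        println(io, "# ENV")
        for k in sort!(collect(keys(ENV)))
            println(io, k, "=", ENV[k])
        end
        println(io, "# Loaded libraries")
        for lib in sort!(collect(Libdl.dllist()))
            println(io, lib)
        end
    end
    return path
end
```

Run `using OpenCL; dump_runtime_state("repl.txt")` in the REPL and the same lines with `"script.txt"` from a script, then compare the files with `diff repl.txt script.txt`. Whether Apple's OpenCL framework appears in the library list depends on whether OpenCL.jl has already called into it at that point, so a missing entry is not by itself evidence of a path problem.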

Posted by lwabeke on Jun 10, 2019

Can you post the stack trace you get from the segfault, as well as the OpenCL driver you're using (I assume it's Apple's OpenCL implementation)? It could be related to the GC running earlier in the REPL and exposing an existing bug.

Posted by jpsamaroo on Jun 26, 2019

Here is what I get on OS X with similar specs:

julia> versioninfo()
Julia Version 1.1.0
Commit 80516ca202 (2019-01-21 21:24 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin14.5.0)
  CPU: Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, ivybridge)

The test:

julia> Pkg.test("OpenCL") 
   Testing OpenCL
 Resolving package versions...
    Status `/var/folders/mg/sxhvxdv96mqgprwhzxv8gwbw0000gn/T/tmpZQxe4Y/Manifest.toml`
  [08131aa3] OpenCL v0.8.0
  [2a0f44e3] Base64  [`@stdlib/Base64`]
  [8ba89e20] Distributed  [`@stdlib/Distributed`]
  [b77e0a4c] InteractiveUtils  [`@stdlib/InteractiveUtils`]
  [8f399da3] Libdl  [`@stdlib/Libdl`]
  [37e2e46d] LinearAlgebra  [`@stdlib/LinearAlgebra`]
  [56ddb016] Logging  [`@stdlib/Logging`]
  [d6f4376e] Markdown  [`@stdlib/Markdown`]
  [de0858da] Printf  [`@stdlib/Printf`]
  [9a3f8284] Random  [`@stdlib/Random`]
  [9e88b42a] Serialization  [`@stdlib/Serialization`]
  [6462fe0b] Sockets  [`@stdlib/Sockets`]
  [8dfed614] Test  [`@stdlib/Test`]
  [4ec0a83e] Unicode  [`@stdlib/Unicode`]
Test Summary: | Pass  Total
layout        |    2      2
Test Summary:   | Pass  Total
OpenCL.Platform |   13     13
Couldn't compile kernel: 
    1   : 
    2   :     __kernel void test() {
    3   :         int c = 1 + 1;
    4   :     };
With following build error:
No kernels or only kernel prototypes found when build executable.
Couldn't compile kernel: 
    1   : 
    2   :     __kernel void test() {
    3   :         int c = 1 + 1;
    4   :     };
With following build error:
<program source>:5:13: warning: unused variable 'c'
        int c = 1 + 1;
            ^
No kernels or only kernel prototypes found.

Couldn't compile kernel: 
    1   : 
    2   :     __kernel void test() {
    3   :         int c = 1 + 1;
    4   :     };
With following build error:
<program source>:5:13: warning: unused variable 'c'
        int c = 1 + 1;
            ^
No kernels or only kernel prototypes found.

Test Summary:  | Pass  Total
OpenCL.Context |   50     50
Callback works
Callback works
Callback works
Callback works
Callback works
Test Summary: | Pass  Total
OpenCL.Device |  122    122
Callback works
Callback works
Callback works
Callback works

┌ Warning: Platform Apple does not seem to suport out of order queues: 
│ CLError(code=-30, CL_INVALID_VALUE)
└ @ Main.TestOpenCL ~/.julia/packages/OpenCL/vsBez/test/test_cmdqueue.jl:16
OpenCL Error: | [CL_INVALID_VALUE] : OpenCL Error : clCreateCommandQueue failed: Device failed to create queue (cld returned: -35). |
OpenCL Error: | [CL_INVALID_VALUE] : OpenCL Error : clCreateCommandQueue failed: Device failed to create queue (cld returned: -35). |
OpenCL Error: | [CL_INVALID_VALUE] : OpenCL Error : clCreateCommandQueue failed: Device failed to create queue: -30 |
OpenCL Error: | [CL_INVALID_VALUE] : OpenCL Error : clCreateCommandQueue failed: Device failed to create queue (cld returned: -35). |
OpenCL Error: | [CL_INVALID_VALUE] : OpenCL Error : clCreateCommandQueue failed: Device failed to create queue: -30 |
Test Summary:   | Pass  Total
OpenCL.CmdQueue |   61     61
OpenCL Error: | [CL_INVALID_VALUE] : OpenCL Error : clCreateCommandQueue failed: Device failed to create queue: -30 |
Test Summary: | Pass  Total
OpenCL.Minver |   20     20
Test Summary: | Pass  Total
OpenCL.Event  |   48     48
OpenCL.Program binaries: Test Failed at /Users/macpro/.julia/packages/OpenCL/vsBez/test/test_program.jl:86
  Expression: prg2[:binaries] == binaries
   Evaluated: Dict{OpenCL.cl.Device,Array{UInt8,N} where N}() == Dict{OpenCL.cl.Device,Array{UInt8,N} where N}(OpenCL.Device(Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz on Apple @0x00000000ffffffff)=>[0x62, 0x70, 0x6c, 0x69, 0x73, 0x74, 0x30, 0x30, 0xd4, 0x01  …  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x10, 0xee])
Stacktrace:
 [1] top-level scope at /Users/macpro/.julia/packages/OpenCL/vsBez/test/test_program.jl:86
 [2] top-level scope at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.1/Test/src/Test.jl:1083
 [3] top-level scope at /Users/macpro/.julia/packages/OpenCL/vsBez/test/test_program.jl:76
 [4] top-level scope at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.1/Test/src/Test.jl:1083
 [5] top-level scope at /Users/macpro/.julia/packages/OpenCL/vsBez/test/test_program.jl:3
OpenCL.Program binaries: Test Failed at /Users/macpro/.julia/packages/OpenCL/vsBez/test/test_program.jl:86
  Expression: prg2[:binaries] == binaries
   Evaluated: Dict{OpenCL.cl.Device,Array{UInt8,N} where N}() == Dict{OpenCL.cl.Device,Array{UInt8,N} where N}(OpenCL.Device(AMD Radeon HD - FirePro D300 Compute Engine on Apple @0x0000000001021c00)=>[0x62, 0x70, 0x6c, 0x69, 0x73, 0x74, 0x30, 0x30, 0xd4, 0x01  …  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x35, 0xd2])
Stacktrace:
 [1] top-level scope at /Users/macpro/.julia/packages/OpenCL/vsBez/test/test_program.jl:86
 [2] top-level scope at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.1/Test/src/Test.jl:1083
 [3] top-level scope at /Users/macpro/.julia/packages/OpenCL/vsBez/test/test_program.jl:76
 [4] top-level scope at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.1/Test/src/Test.jl:1083
 [5] top-level scope at /Users/macpro/.julia/packages/OpenCL/vsBez/test/test_program.jl:3
OpenCL.Program binaries: Test Failed at /Users/macpro/.julia/packages/OpenCL/vsBez/test/test_program.jl:86
  Expression: prg2[:binaries] == binaries
   Evaluated: Dict{OpenCL.cl.Device,Array{UInt8,N} where N}() == Dict{OpenCL.cl.Device,Array{UInt8,N} where N}(OpenCL.Device(AMD Radeon HD - FirePro D300 Compute Engine on Apple @0x0000000002021c00)=>[0x62, 0x70, 0x6c, 0x69, 0x73, 0x74, 0x30, 0x30, 0xd4, 0x01  …  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x35, 0xd2])
Stacktrace:
 [1] top-level scope at /Users/macpro/.julia/packages/OpenCL/vsBez/test/test_program.jl:86
 [2] top-level scope at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.1/Test/src/Test.jl:1083
 [3] top-level scope at /Users/macpro/.julia/packages/OpenCL/vsBez/test/test_program.jl:76
 [4] top-level scope at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.1/Test/src/Test.jl:1083
 [5] top-level scope at /Users/macpro/.julia/packages/OpenCL/vsBez/test/test_program.jl:3
Test Summary:                       | Pass  Fail  Total
OpenCL.Program                      |   63     3     66
  OpenCL.Program source constructor |    3            3
  OpenCL.Program info               |   24           24
  OpenCL.Program build              |   12           12
  OpenCL.Program source code        |    3            3
  OpenCL.Program binaries           |   21     3     24
ERROR: LoadError: LoadError: Some tests did not pass: 63 passed, 3 failed, 0 errored, 0 broken.
in expression starting at /Users/macpro/.julia/packages/OpenCL/vsBez/test/test_program.jl:1
in expression starting at /Users/macpro/.julia/packages/OpenCL/vsBez/test/runtests.jl:30
ERROR: Package OpenCL errored during testing
Stacktrace:
 [1] pkgerror(::String, ::Vararg{String,N} where N) at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.1/Pkg/src/Types.jl:120
 [2] #test#66(::Bool, ::Function, ::Pkg.Types.Context, ::Array{Pkg.Types.PackageSpec,1}) at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.1/Pkg/src/Operations.jl:1328
 [3] #test at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.1/Pkg/src/API.jl:0 [inlined]
 [4] #test#44(::Bool, ::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::Function, ::Pkg.Types.Context, ::Array{Pkg.Types.PackageSpec,1}) at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.1/Pkg/src/API.jl:193
 [5] test at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.1/Pkg/src/API.jl:178 [inlined]
 [6] #test#43 at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.1/Pkg/src/API.jl:175 [inlined]
 [7] test at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.1/Pkg/src/API.jl:175 [inlined]
 [8] #test#42 at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.1/Pkg/src/API.jl:174 [inlined]
 [9] test at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.1/Pkg/src/API.jl:174 [inlined]
 [10] #test#41(::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::Function, ::String) at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.1/Pkg/src/API.jl:173
 [11] test(::String) at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.1/Pkg/src/API.jl:173
 [12] top-level scope at none:0

Posted by davidbp on Jun 29, 2019

Hi

I just ran `]update` and I'm still getting the same result.

If I open Julia and run `]test OpenCL`, I get essentially the same output as @davidbp: some tests pass and others fail, but those failures don't kill the Julia process.

Running `create_some_context` as the first command after `using OpenCL` in a new Julia session causes the crash; see below:

Leons-MacBook-Pro:~ lwabeke$ /Applications/Julia-1.1.app/Contents/Resources/julia/bin/julia 
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.1.1 (2019-05-16)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> using OpenCL

julia> device, ctx, queue = cl.create_compute_context()
Segmentation fault: 11
Leons-MacBook-Pro:~ lwabeke$ 


I tried to get a stack backtrace; this is the best I can do at the moment. I guess that to get more details I would have to build Julia myself with debugging enabled?

Leons-MacBook-Pro:~ lwabeke$ lldb /Applications/Julia-1.1.app/Contents/Resources/julia/bin/julia 
(lldb) target create "/Applications/Julia-1.1.app/Contents/Resources/julia/bin/julia"
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/usr/local/Cellar/python@2/2.7.16/Frameworks/Python.framework/Versions/2.7/lib/python2.7/copy.py", line 52, in <module>
    import weakref
  File "/usr/local/Cellar/python@2/2.7.16/Frameworks/Python.framework/Versions/2.7/lib/python2.7/weakref.py", line 14, in <module>
    from _weakref import (
ImportError: cannot import name _remove_dead_weakref
Current executable set to '/Applications/Julia-1.1.app/Contents/Resources/julia/bin/julia' (x86_64).
(lldb) run
Process 19629 launched: '/Applications/Julia-1.1.app/Contents/Resources/julia/bin/julia' (x86_64)
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.1.1 (2019-05-16)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> using OpenCL

julia> device, ctx, queue = cl.create_compute_context()
Process 19629 stopped
* thread #7, queue = 'com.apple.root.utility-qos', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
    frame #0: 0x00007fff62131232 libsystem_c.dylib`strlen + 18
libsystem_c.dylib`strlen:
->  0x7fff62131232 <+18>: pcmpeqb (%rdi), %xmm0
    0x7fff62131236 <+22>: pmovmskb %xmm0, %esi
    0x7fff6213123a <+26>: andq   $0xf, %rcx
    0x7fff6213123e <+30>: orq    $-0x1, %rax
Target 0: (julia) stopped.
(lldb) bt
* thread #7, queue = 'com.apple.root.utility-qos', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
  * frame #0: 0x00007fff62131232 libsystem_c.dylib`strlen + 18
    frame #1: 0x00007fff4292c882 OpenCL`___lldb_unnamed_symbol2$$OpenCL + 916
    frame #2: 0x00007fff620af5fa libdispatch.dylib`_dispatch_call_block_and_release + 12
    frame #3: 0x00007fff620a7db8 libdispatch.dylib`_dispatch_client_callout + 8
    frame #4: 0x00007fff620a9b2c libdispatch.dylib`_dispatch_root_queue_drain + 902
    frame #5: 0x00007fff620a9755 libdispatch.dylib`_dispatch_worker_thread3 + 101
    frame #6: 0x00007fff623f9169 libsystem_pthread.dylib`_pthread_wqthread + 1387
    frame #7: 0x00007fff623f8be9 libsystem_pthread.dylib`start_wqthread + 13

Posted by lwabeke on Jul 3, 2019

Given that we don't get a readable backtrace after the segfault, could you step through the call to `cl.create_some_context` or `cl.create_compute_context` with Debugger.jl and see which line it crashes on? It is probably a `ccall`, since it seems to be crashing in C. I suspect we're passing arguments incorrectly or something similar.
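If stepping through with Debugger.jl is inconvenient, a lower-tech way to find the crashing step is to print a marker and flush it before each call, so the last marker on screen after the segfault names the culprit. `traced` below is a hypothetical helper, not part of OpenCL.jl or Debugger.jl; the OpenCL calls in its usage comment are the ones used elsewhere in this thread.

```julia
# Hypothetical helper (not part of OpenCL.jl or Debugger.jl).
# Announce a step on stderr and flush *before* running it, so that if the
# process segfaults, the last "about to run" line names the step that crashed.
function traced(f, label::AbstractString)
    println(stderr, "about to run: ", label)
    flush(stderr)                      # push the marker out before calling f
    result = f()
    println(stderr, "finished:     ", label)
    flush(stderr)
    return result
end

# Example use in a fresh REPL (needs OpenCL.jl; calls taken from this thread):
#   using OpenCL
#   device = traced(() -> last(cl.devices(:gpu)), "cl.devices(:gpu)")
#   ctx    = traced(() -> cl.Context(device), "cl.Context(device)")
```

Because it runs in the REPL itself, this keeps whatever REPL-specific state triggers the crash, which a separate script run would not.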

Posted by jpsamaroo on Jul 3, 2019

I have the same issue on OSX with OpenCL.jl v0.8.0.

julia> versioninfo()
Julia Version 1.1.1
Commit 55e36cc (2019-05-16 04:10 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin15.6.0)
  CPU: Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, skylake)

The initial call here was `device = last(cl.devices(:gpu)); @enter cl.Context(device)`. Here are the status and backtrace right before the segmentation fault:

In clCreateContext(arg1, arg2, arg3, arg4, arg5, arg6) at /Users/kose/.julia/packages/OpenCL/vsBez/src/api.jl:18
 17          function $func($(args_in...))
>18              ccall(($(string(func)), libopencl),
 19                     $ret_type,
 20                     $arg_types,
 21                     $(args_in...))
 22          end

About to run: (<suppressed 140 bytes of output>)(Ptr{Nothing} @0x0000000000000000, 1, <suppressed 46 bytes of output>, Ptr{Nothing} @0x000000012abbef10, <suppressed 147 bytes of output>, Base.RefValue{Int32}(0))
1|debug> bt
[1] clCreateContext(arg1, arg2, arg3, arg4, arg5, arg6) at /Users/kose/.julia/packages/OpenCL/vsBez/src/api.jl:18
  | arg1::Ptr{Nothing} = Ptr{Nothing} @0x0000000000000000
  | arg2::Int64 = 1
  | arg3::Array{Ptr{Nothing},1} = Ptr{Nothing}[Ptr{Nothing} @0x0000000001021c00]
  | arg4::Ptr{Nothing} = Ptr{Nothing} @0x000000012abbef10
  | arg5::Base.CFunction = Base.CFunction(Ptr{Nothing} @0x000000012abbec40, OpenCL.cl.raise_context_error, Ptr{Nothing} @0x0000000000000000, Ptr{Nothing} @0x0000000000000000)
  | arg6::Base.RefValue{Int32} = Base.RefValue{Int32}(0)
[2] #Context#44(properties, callback, , devs) at /Users/kose/.julia/packages/OpenCL/vsBez/src/context.jl:140
  | properties::Nothing = nothing
  | callback::Nothing = nothing
  | ::DataType = OpenCL.cl.Context
  | devs::Array{OpenCL.cl.Device,1} = OpenCL.cl.Device[OpenCL.Device(AMD Radeon Pro 560 Compute Engine on Apple @0x0000000001021c00)]
  | ctx_properties::Ptr{Nothing} = Ptr{Nothing} @0x0000000000000000
  | n_devices::Int64 = 1
  | device_ids::Array{Ptr{Nothing},1} = Ptr{Nothing}[Ptr{Nothing} @0x0000000001021c00]
  | err_code::Base.RefValue{Int32} = Base.RefValue{Int32}(0)
  | payload::typeof(OpenCL.cl.raise_context_error) = OpenCL.cl.raise_context_error
  | f_ptr::Base.CFunction = Base.CFunction(Ptr{Nothing} @0x000000012abbec40, OpenCL.cl.raise_context_error, Ptr{Nothing} @0x0000000000000000, Ptr{Nothing} @0x0000000000000000)
  | i::Int64 = 1
  | d::OpenCL.cl.Device = OpenCL.Device(AMD Radeon Pro 560 Compute Engine on Apple @0x0000000001021c00)
[3] Type(#temp#, , devs) at none:0
  | ::DataType = OpenCL.cl.Context
  | devs::Array{OpenCL.cl.Device,1} = OpenCL.cl.Device[OpenCL.Device(AMD Radeon Pro 560 Compute Engine on Apple @0x0000000001021c00)]
  | properties::Nothing = nothing
  | callback::Nothing = nothing
[4] #Context#45(properties, callback, , d) at /Users/kose/.julia/packages/OpenCL/vsBez/src/context.jl:150
  | properties::Nothing = nothing
  | callback::Nothing = nothing
  | ::DataType = OpenCL.cl.Context
  | d::OpenCL.cl.Device = OpenCL.Device(AMD Radeon Pro 560 Compute Engine on Apple @0x0000000001021c00)
[5] Type(d) at /Users/kose/.julia/packages/OpenCL/vsBez/src/context.jl:150
  | d::OpenCL.cl.Device = OpenCL.Device(AMD Radeon Pro 560 Compute Engine on Apple @0x0000000001021c00)
1|debug> s
Segmentation fault: 11
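For reference when reading the frame above: the OpenCL specification declares `clCreateContext(properties, num_devices, devices, pfn_notify, user_data, errcode_ret)`, so `arg4` is the `pfn_notify` function pointer and `arg5`, the `Base.CFunction` wrapping `raise_context_error`, is passed as `user_data`. The specification also allows the implementation to call `pfn_notify` asynchronously, and the lldb backtrace earlier in this thread stops on a libdispatch worker thread (`thread #7`), not on Julia's thread. For callbacks that a C library may invoke from another thread, the Julia 1.1 manual ("Calling C and Fortran Code", thread-safety notes) recommends a callback that only calls `uv_async_send` on a `Base.AsyncCondition`, with the real work done in a Julia task. The sketch below applies that pattern to a `pfn_notify`-shaped callback. It is one hypothesis worth ruling out, not a confirmed cause (the faulting frame is `strlen` inside Apple's OpenCL, not Julia code), and `notify_cond`, `notify_trampoline`, `notify_ptr` and `init_notify!` are hypothetical names, not OpenCL.jl's API.

```julia
# Hypothetical sketch, not OpenCL.jl's implementation. In a package, create
# the condition and the function pointer inside __init__(), never as
# precompiled globals, because both hold run-time addresses.
const notify_cond = Base.AsyncCondition()

# Same shape as OpenCL's pfn_notify:
#   void (*)(const char *errinfo, const void *private_info, size_t cb, void *user_data)
# It may run on a thread Julia does not manage, so it only wakes Julia's
# event loop: no allocation, no logging, no reading of errinfo here.
function notify_trampoline(errinfo::Ptr{Cchar}, private_info::Ptr{Cvoid},
                           cb::Csize_t, user_data::Ptr{Cvoid})::Cvoid
    ccall(:uv_async_send, Cint, (Ptr{Cvoid},), notify_cond.handle)
    return nothing
end

const notify_ptr = Ref{Ptr{Cvoid}}(C_NULL)

function init_notify!()
    notify_ptr[] = @cfunction(notify_trampoline, Cvoid,
                              (Ptr{Cchar}, Ptr{Cvoid}, Csize_t, Ptr{Cvoid}))
    return notify_ptr[]
end

# The real handling runs in an ordinary Julia task, where logging is fine.
# Several notifications may be coalesced into a single wake-up.
#   init_notify!()
#   @async while true
#       wait(notify_cond)
#       @warn "OpenCL context notification received"
#   end
```

The trade-off is that the error text in `errinfo` is not captured, because copying it would mean running more Julia code on the foreign thread.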

Posted by kose-y on Jul 6, 2019

I built OpenCL_jll with BinaryBuilder.jl and plan to refactor the package to load it instead. In theory, this will install the binary dependencies for the end user. I will close the issue, as it is hard to reproduce in 2022.

Posted by juliohm on Oct 1, 2022