CLArrays.jl
gpu_call does not seem to work on CLArrays with dimension > 1
Okay, I think this (unlike my comments on Makie) is caused by something other than driver issues. Please correct me if I am wrong.
julia> using CLArrays, GPUArrays
julia> function fill_two!(state, C)
           i = GPUArrays.@linearidx C state
           C[i] = 2f0
           return
end
fill_two! (generic function with 1 method)
julia> c1 = CLArray(zeros(Float32,4096));
julia> j1 = GPUArrays.JLArray(zeros(Float32,4096));
julia> GPUArrays.gpu_call(fill_two!, c1, (c1,))
OpenCL.Event(@0x00007f4122be9850)
julia> GPUArrays.gpu_call(fill_two!, j1, (j1,))
julia> Array(c1)'
1×4096 RowVector{Float32,Array{Float32,1}}:
2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 … 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0
julia> j1'
1×4096 RowVector{Float32,GPUArrays.JLArray{Float32,1}}:
2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 … 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0
julia> c2 = CLArray(zeros(Float32,1,4096));
julia> j2 = GPUArrays.JLArray(zeros(Float32,1,4096));
julia> GPUArrays.gpu_call(fill_two!, c2, (c2,))
OpenCL.Event(@0x00007f412266d2f0)
julia> GPUArrays.gpu_call(fill_two!, j2, (j2,))
julia> Array(c2)
1×4096 Array{Float32,2}:
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 … 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
julia> j2
GPU: 1×4096 Array{Float32,2}:
2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 … 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0
julia> c3 = CLArray(zeros(Float32,1,4096,1));
julia> j3 = GPUArrays.JLArray(zeros(Float32,1,4096,1));
julia> GPUArrays.gpu_call(fill_two!, c3, (c3,))
OpenCL.Event(@0x00007f4122561e00)
julia> GPUArrays.gpu_call(fill_two!, j3, (j3,))
julia> Array(c3)
1×4096×1 Array{Float32,3}:
[:, :, 1] =
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 … 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
julia> j3
GPU: 1×4096×1 Array{Float32,3}:
[:, :, 1] =
2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 … 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0
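For reference, the expected result is easy to check on the CPU. A kernel that uses only a linear index should touch every element regardless of the array's shape, because a 1×4096 or 1×4096×1 array holds the same 4096 elements as a length-4096 vector. The sketch below assumes a hypothetical `launch_linear!` helper that stands in for `gpu_call` by calling the kernel once per linear index on a plain `Array`. It is not GPUArrays' API, and its kernel receives the index directly instead of a `state`:

```julia
# Hypothetical CPU stand-in for a GPU launch: one "thread" per linear index.
# This is NOT GPUArrays.gpu_call; it only mimics what a linear-index kernel should do.
function launch_linear!(kernel, A::AbstractArray)
    for i in 1:length(A)      # every linear index, independent of ndims(A)
        kernel(A, i)
    end
    return A
end

# Same body as fill_two!, but receiving the linear index directly instead of `state`.
fill_two_at!(C, i) = (C[i] = 2f0; nothing)

a1 = launch_linear!(fill_two_at!, zeros(Float32, 4096))
a2 = launch_linear!(fill_two_at!, zeros(Float32, 1, 4096))
a3 = launch_linear!(fill_two_at!, zeros(Float32, 1, 4096, 1))

println(all(x -> x == 2f0, a1), " ", all(x -> x == 2f0, a2), " ", all(x -> x == 2f0, a3))  # true true true
```

This is what JLArray does for all three shapes, so the all-zero CLArray results for the 2-D and 3-D cases point at the OpenCL path rather than at the kernel itself.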
Additionally, broadcasting into the 1×4096 matrix c2 crashes Julia:
julia> c2_2 = CLArray(randn(Float32,1,4096));
julia> c2 .= exp.(c2_2) .- 1.4f0;
signal (11): Segmentation fault
while loading no file, in expression starting on line 0
unknown function (ip: 0x7f11eadd1116)
_broadcast! at /home/chris/.julia/v0.6/GPUArrays/src/broadcast.jl:86
unknown function (ip: 0x7f10eadd0f56)
jl_call_fptr_internal at /home/chris/Documents/prog/julia-dev/julia/src/julia_internal.h:339 [inlined]
jl_call_method_internal at /home/chris/Documents/prog/julia-dev/julia/src/julia_internal.h:358 [inlined]
jl_apply_generic at /home/chris/Documents/prog/julia-dev/julia/src/gf.c:1926
broadcast_c! at ./broadcast.jl:213 [inlined]
broadcast! at ./broadcast.jl:206
jl_call_fptr_internal at /home/chris/Documents/prog/julia-dev/julia/src/julia_internal.h:339 [inlined]
jl_call_method_internal at /home/chris/Documents/prog/julia-dev/julia/src/julia_internal.h:358 [inlined]
jl_apply_generic at /home/chris/Documents/prog/julia-dev/julia/src/gf.c:1926
do_call at /home/chris/Documents/prog/julia-dev/julia/src/interpreter.c:75
eval at /home/chris/Documents/prog/julia-dev/julia/src/interpreter.c:242
eval_body at /home/chris/Documents/prog/julia-dev/julia/src/interpreter.c:539
jl_interpret_toplevel_thunk at /home/chris/Documents/prog/julia-dev/julia/src/interpreter.c:692
jl_toplevel_eval_flex at /home/chris/Documents/prog/julia-dev/julia/src/toplevel.c:592
jl_toplevel_eval_in at /home/chris/Documents/prog/julia-dev/julia/src/builtins.c:496
eval at ./boot.jl:235
unknown function (ip: 0x7f11c09837bf)
jl_call_fptr_internal at /home/chris/Documents/prog/julia-dev/julia/src/julia_internal.h:339 [inlined]
jl_call_method_internal at /home/chris/Documents/prog/julia-dev/julia/src/julia_internal.h:358 [inlined]
jl_apply_generic at /home/chris/Documents/prog/julia-dev/julia/src/gf.c:1926
eval_user_input at ./REPL.jl:66
unknown function (ip: 0x7f11c0a04a5f)
jl_call_fptr_internal at /home/chris/Documents/prog/julia-dev/julia/src/julia_internal.h:339 [inlined]
jl_call_method_internal at /home/chris/Documents/prog/julia-dev/julia/src/julia_internal.h:358 [inlined]
jl_apply_generic at /home/chris/Documents/prog/julia-dev/julia/src/gf.c:1926
macro expansion at ./REPL.jl:97 [inlined]
#1 at ./event.jl:73
unknown function (ip: 0x7f11914a52cf)
jl_call_fptr_internal at /home/chris/Documents/prog/julia-dev/julia/src/julia_internal.h:339 [inlined]
jl_call_method_internal at /home/chris/Documents/prog/julia-dev/julia/src/julia_internal.h:358 [inlined]
jl_apply_generic at /home/chris/Documents/prog/julia-dev/julia/src/gf.c:1926
jl_apply at /home/chris/Documents/prog/julia-dev/julia/src/julia.h:1424 [inlined]
start_task at /home/chris/Documents/prog/julia-dev/julia/src/task.c:267
unknown function (ip: 0xffffffffffffffff)
Allocations: 12699785 (Pool: 12697606; Big: 2179); GC: 28
Segmentation fault (core dumped)
But if I stick to plain vectors, it works:
julia> using CLArrays, GPUArrays
julia> x = CLArray(randn(Float32, 4096));
julia> y = similar(x);
julia> y .= exp.(x) .- 1.4f0
GPU: 4096-element Array{Float32,1}:
3.71164
-0.586773
-0.434828
-0.88442
-1.18079
1.38948
-0.725807
-0.900368
-1.18742
1.14073
-1.22522
-0.282193
-0.662497
-1.31465
-0.492887
1.46524
1.63044
-0.780128
-1.17991
-1.12488
3.112
-0.685927
1.48443
-0.787306
0.223243
-0.918023
3.64242
-1.00337
⋮
-0.265216
-1.21193
3.06621
0.595765
-0.629847
-0.521045
0.333602
-0.578887
-1.1026
9.36444
-1.05246
-0.554016
0.0966291
0.917615
-0.831065
-0.890007
0.656895
0.0984
-0.945677
-0.822427
-0.562743
0.105189
-0.559359
0.345985
-0.523108
-0.773428
-0.901813
-0.945179
Vectors of tuples work too. So far it seems that the CLArray itself has to be one-dimensional. I can experiment more, using either the tuple workaround or pretending that vectors are matrices and calling (GPUArrays.gpu_)ind2sub & co manually.
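The "pretend vectors are matrices" workaround comes down to column-major index arithmetic. Below is a minimal sketch in plain Julia (written for Julia 1.x). `to_rowcol` and `to_linear` are hypothetical helpers, not part of GPUArrays; they mirror what `ind2sub`/`sub2ind` did in Julia 0.6 and are checked against `CartesianIndices`, which replaced them in later Julia versions:

```julia
# Hypothetical helper: map a 1-based linear index into an nrows×ncols
# column-major matrix to its (row, col) pair, like Julia 0.6's ind2sub.
function to_rowcol(i::Integer, nrows::Integer)
    col = div(i - 1, nrows) + 1      # which column the element falls in
    row = i - (col - 1) * nrows      # offset within that column
    return row, col
end

# Inverse mapping: (row, col) back to a linear index, like sub2ind.
to_linear(row::Integer, col::Integer, nrows::Integer) = (col - 1) * nrows + row

nrows, ncols = 3, 5
flat = collect(Float32, 1:nrows*ncols)   # the "matrix" stored as a plain vector
M = reshape(flat, nrows, ncols)           # same data viewed as a matrix, for checking

for i in 1:length(flat)
    r, c = to_rowcol(i, nrows)
    @assert (r, c) == Tuple(CartesianIndices((nrows, ncols))[i])
    @assert to_linear(r, c, nrows) == i
    @assert flat[i] == M[r, c]
end
println(to_rowcol(7, nrows))  # (1, 3)
```

Inside a kernel, the same arithmetic would presumably be applied to the linear index from `GPUArrays.@linearidx`, so the CLArray itself can stay one-dimensional.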
This is on:
julia> versioninfo()
Julia Version 0.6.2
Commit d386e40 (2017-12-13 18:08 UTC)
Platform Info:
OS: Linux (x86_64-pc-linux-gnu)
CPU: AMD Ryzen Threadripper 1950X 16-Core Processor
WORD_SIZE: 64
BLAS: libopenblas (ZEN)
LAPACK: libopenblas
LIBM: libopenlibm
LLVM: libLLVM-3.9.1 (ORCJIT, generic)
with a Vega card and ROCm 1.17 drivers.
This is definitely a bug! I'll look into it on Monday when I'm back at a PC with OpenCL.
Can you try https://github.com/JuliaGPU/CLArrays.jl/pull/18? It just got merged and I will tag it now. The dependencies should already be tagged, but you will need to run Pkg.update().
All your tests pass on those tags for me, so either it is yet again a driver issue ( :'( ) or it's fixed :)
julia> Pkg.checkout("CLArrays")
INFO: Checking out CLArrays master...
INFO: Pulling CLArrays latest master...
INFO: Cloning cache of Adapt from https://github.com/MikeInnes/Adapt.jl.git
INFO: Installing Adapt v0.2.0
julia> using CLArrays, GPUArrays
INFO: Recompiling stale cache file /home/chris/.julia/lib/v0.6/CLArrays.ji for module CLArrays.
julia> function fill_two!(state, C)
           i = GPUArrays.@linearidx C state
           C[i] = 2f0
           return
end
fill_two! (generic function with 1 method)
julia> c1 = CLArray(zeros(Float32,4096));
julia> j1 = GPUArrays.JLArray(zeros(Float32,4096));
julia> GPUArrays.gpu_call(fill_two!, c1, (c1,))
OpenCL.Event(@0x00000000045ab780)
julia> GPUArrays.gpu_call(fill_two!, j1, (j1,))
julia> Array(c1)'
1×4096 RowVector{Float32,Array{Float32,1}}:
2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 … 2.0 2.0 2.0 2.0 2.0 2.0 2.0
julia> j1'
1×4096 RowVector{Float32,GPUArrays.JLArray{Float32,1}}:
2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 … 2.0 2.0 2.0 2.0 2.0 2.0 2.0
julia> c2 = CLArray(zeros(Float32,1,4096));
julia> j2 = GPUArrays.JLArray(zeros(Float32,1,4096));
julia> GPUArrays.gpu_call(fill_two!, c2, (c2,))
OpenCL.Event(@0x0000000003a21270)
julia> GPUArrays.gpu_call(fill_two!, j2, (j2,))
julia> Array(c2)
1×4096 Array{Float32,2}:
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 … 0.0 0.0 0.0 0.0 0.0 0.0 0.0
julia> j2
GPU: 1×4096 Array{Float32,2}:
2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 … 2.0 2.0 2.0 2.0 2.0 2.0 2.0
julia> c3 = CLArray(zeros(Float32,1,4096,1));
julia> j3 = GPUArrays.JLArray(zeros(Float32,1,4096,1));
julia> GPUArrays.gpu_call(fill_two!, c3, (c3,))
OpenCL.Event(@0x0000000004018bb0)
julia> GPUArrays.gpu_call(fill_two!, j3, (j3,))
julia> Array(c3)
1×4096×1 Array{Float32,3}:
[:, :, 1] =
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 … 0.0 0.0 0.0 0.0 0.0 0.0 0.0
julia> j3
GPU: 1×4096×1 Array{Float32,3}:
[:, :, 1] =
2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 … 2.0 2.0 2.0 2.0 2.0 2.0 2.0
julia> c3
GPU: 1×4096×1 Array{Float32,3}:
[:, :, 1] =
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 … 0.0 0.0 0.0 0.0 0.0 0.0 0.0
julia> c2_2 = CLArray(randn(Float32,1,4096));
julia> c2 .= exp.(c2_2) .- 1.4f0;
julia> Array(c2)
1×4096 Array{Float32,2}:
-0.336505 -0.960849 0.48456 -1.13832 … 1.04627 0.798006 0.131109
julia> exp.(Array(c2_2)) .- 1.4f0
1×4096 Array{Float32,2}:
-0.336505 -0.960849 0.48456 -1.13832 … 1.04627 0.798006 0.131109
Okay, so broadcasting works (and it also creates a GPU kernel), but the kernel in my example does not, at least on my computer?
I have to leave now, but I'll look into this by this evening at the latest:
julia> c2 = CLArray(zeros(Float32,1,4096));
julia> j2 = GPUArrays.JLArray(zeros(Float32,1,4096));
julia> c2 .= 2f0;
julia> Array(c2)
Memory access fault by GPU node-1 on address 0x1000. Reason: Page not present or supervisor privilege.
Aborted (core dumped)
I'll also reboot my computer, which means there's maybe a 30% chance I'll have to wipe the OS and reinstall everything, and maybe a 5% chance it magically fixes everything. That it hung on shutdown, and that I'm writing this from another computer, doesn't bode well. lol
Hm, this works for me! Did you run Pkg.update()? I don't see it in your log ;)
I'm on master of OpenCL, Transpiler, and CLArrays.
The problem seems serious and persistent.
julia> c1 .= 4f0; # works
julia> c2 .= 2f0;
julia> Array(c2)
Memory access fault by GPU node-1 on address 0x1000. Reason: Page not present or supervisor privilege.
Aborted (core dumped)
The computer becomes unresponsive after this memory access fault. The light on the GPU stays on after Julia exits, and simple operations like closing browser tabs peg a CPU core at 100% without ever completing.
A standard kill from top does not work on these processes (I didn't think to try kill -KILL).
I have to manually reboot the computer to return things to normal.
I have no idea what the problem is. I did not notice the extent of the problem before I tried c2 .= 2f0. Curiously, this works fine:
julia> c2 .= exp.(c2_2) .- 1.4f0;
julia> Array(c2)
1×4096 Array{Float32,2}:
-0.336505 -0.960849 0.48456 -1.13832 … 1.04627 0.798006 0.131109
Eh, I now doubt the problem lies entirely with CLArrays or the Julia side of things; at the very least, it extends a lot further. I'll stick with vectors and compute indices manually.
You can close this if you like. Otherwise I'll leave it open and retest with each driver update, new ROCm release, or kernel update, or the next time I wipe the computer and start over.
EDIT: I'll follow the transpiler example later to look at the OpenCL code it generates and test it in isolation.