CLArrays.jl
Structs with tuple fields as broadcast arguments
I've been trying to broadcast with an argument like this:
```
struct WithTuple
    a::Int32
    b::Tuple{Int32,Int32}
end
```
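Broadcasting it over a CLArray looks roughly like this (a simplified, hypothetical sketch - `f` is just a stand-in; my real code uses Cellular.jl types, as the error below shows):

```
using CLArrays

# hypothetical stand-in for the real inner kernel function
f(x, w::WithTuple) = x + w.a + w.b[1]

x = CLArray(rand(Int32, 100))
f.(x, WithTuple(1, (2, 3)))   # the struct is passed by value as a kernel argument
```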
But it breaks with an error like this, something to do with differences in how the tuple is packed:
ERROR: Julia and OpenCL type don't match at kernel argument 6: Found Tuple{CLArrays.DeviceArray{UInt32,2,CLArrays.HostPtr{UInt32}},Cellular.Life{Cellular.RadialNeighborhood{:test,Cellular.Skip},Int32,Tuple{Int32,Int32}}}.
Please make sure to define OpenCL structs correctly!
You should be generally fine by using `__attribute__((packed))`, but sometimes the alignment of fields is different from Julia.
Consider the following example:
```
//packed
// Tuple{NTuple{3, Float32}, Void, Float32}
struct __attribute__((packed)) Test{
float3 f1;
int f2; // empty type gets replaced with Int32 (no empty types allowed in OpenCL)
// you might need to define the alignement of fields to match julia's layout
float f3; // for the types used here the alignement matches though!
};
// this is a case where Julia and OpenCL packed alignment would differ, so we need to specify it explicitely
// Tuple{Int64, Int32}
struct __attribute__((packed)) Test2{
long f1;
int __attribute__((aligned (8))) f2; // opencl would align this to 4 in packed layout, while Julia uses 8!
};
```
You can use `c.datatype_align(T)` to figure out the alignment of a Julia type!
Are you using this with an OpenCL CPU driver? I have been fighting with issues related to different alignment in the Intel/AMD CPU OpenCL drivers... What's the output of this:
```
julia> using CLArrays

julia> x = CLArray([0]) |> CLArrays.device
OpenCL.Device(Intel(R) HD Graphics 630 on Intel(R) OpenCL @0x000000000772e440)
```
I'm not totally sure - I was hoping it was on the (somewhat meagre) GPU! Two drivers show up on my setup:
```
OpenCL.Device(Intel(R) HD Graphics on Intel(R) OpenCL @0x000055ad4d6cb990)
OpenCL.Device(Intel(R) HD Graphics 5500 BroadWell U-Processor GT2 on Intel Gen OCL Driver @0x00007f573ac89600)
```
Using the first one just kills Julia, so I figure it's an artefact of some kind and `init()` the second. It runs fine for simpler problems, but doesn't handle tuples anywhere, or structs if they are broadcast over an array - giving a similar message for both.
It should be the latest package clones and Julia - this was a fresh install yesterday. Let me know if you need any other details.
I'm mostly trying to test that OpenCL will actually run the kernels that I was running on the regular CPU. The main GPU work will be on CUDA, which I'm just getting set up. I'm aiming to build a simulation framework that will run user-defined inner kernels on CPU/OpenCL/CUDA interchangeably, but I'm not sure how practical that is yet.
You might be able to get around this problem if you can insert some padding ;) E.g. try:
```
struct WithTuple
    a::Int32
    pad::Int32 # might be even more efficient
    b::Tuple{Int32,Int32}
end
```
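To see what you are padding towards, you can compare against the layout Julia actually uses, via `sizeof` and `fieldoffset` (a quick host-side sanity check; `WithTuplePadded` is just a hypothetical name to avoid redefining `WithTuple`):

```
struct WithTuplePadded
    a::Int32
    pad::Int32
    b::Tuple{Int32,Int32}
end

sizeof(WithTuplePadded)            # 16 bytes in total
fieldoffset(WithTuplePadded, 1)    # offset 0 -> a
fieldoffset(WithTuplePadded, 2)    # offset 4 -> pad
fieldoffset(WithTuplePadded, 3)    # offset 8 -> b
```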
Unfortunately it's not that easy! What would I be aiming for - padding it out to 64-bit multiples?
I'm also wondering if there is a long-term solution for this - as in, would it be possible to repack to a correct struct automatically? I was hoping user-supplied isbits structs would eventually work without this kind of step.
> I was hoping user-supplied isbits structs would eventually work without this kind of step.
I put a lot of effort into this, and I thought I had it working with most GPU OpenCL drivers. The problem is that the OpenCL specs don't seem to guarantee any alignment, so it can be pretty much vendor-specific. As far as I know, one can't actually query the alignment of an OpenCL struct, so it would be a lot of work to support all the different vendors - I also found some alignment bugs which they probably won't fix, so this whole thing is a mess.
I posted a question on Stack Overflow a while ago:
https://stackoverflow.com/questions/47076012/opencl-only-on-amd-cl-invalid-arg-size
There was actually a suggestion in there:
> As a workaround, you could copy structs into an OpenCL memory buffer and pass them by reference?
I'm not 100% sure if that is a valid workaround for your specific issue - but it's definitely worth a try.
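In CLArrays terms, the pass-by-reference idea would be to put the struct into a device array and broadcast over that, instead of passing it as a plain scalar kernel argument - roughly like this (just a sketch, assuming the `x` and `WithTuple` from above; the same approach gets tried further down the thread):

```
w = WithTuple(1, (2, 3))

# by value: the struct itself becomes a kernel argument and hits the alignment check
# x .+ w

# by reference: the struct lives in an OpenCL buffer instead
x .+ CLArray([w])                    # a length-1 array broadcasts against x
x .+ CLArray(fill(w, length(x)))     # or one copy per element
```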
Btw, your example works on my GPUs!
```
julia> using CLArrays

julia> struct WithTuple
           a::Int32
           b::Tuple{Int32,Int32}
       end

julia> Base.:(+)(x::Integer, y::WithTuple) = x + y.b[2]

julia> x = CLArray(Int32[1,2,3,4])

julia> x .+ WithTuple(1, (2,3))
GPU: 4-element Array{Int32,1}:
4
5
6
7

julia> CLArrays.device(x)
OpenCL.Device(Intel(R) HD Graphics 630 on Intel(R) OpenCL @0x00000000076a8ed0)

julia> CLArrays.init(CLArrays.devices()[2])
OpenCL context with:
CL version: OpenCL 1.2 CUDA
Device: CL GeForce GTX 1060
threads: 1024
blocks: (1024, 1024, 64)
global_memory: 6442.450944 mb
free_global_memory: NaN mb
local_memory: 0.049152 mb

julia> CLArray(Int32[1,2,3,4]) .+ WithTuple(1, (2,3))
GPU: 4-element Array{Int32,1}:
4
5
6
7
```
I'm kind of sick of dealing with buggy / inconsistent OpenCL drivers :P The last time I complained to Intel about driver bugs, they told me that they only fix "obscure" bugs like this for the newest generation.
Oh god I didn't realise it was like that... these are both Intel HD cards...
julia> using CLArrays
julia> CLArrays.init(CLArrays.devices()[2])
OpenCL context with:
CL version: OpenCL 1.2 beignet 1.3
Device: CL Intel(R) HD Graphics 5500 BroadWell U-Processor GT2
threads: 512
blocks: (512, 512, 512)
global_memory: 4119.855104 mb
free_global_memory: NaN mb
local_memory: 0.065536 mb
julia> struct WithTuple
a::Int32
b::Tuple{Int32,Int32}
end
julia> Base.:(+)(x::Integer, y::WithTuple) = x + y.b[2]
julia> x = CLArray(Int32[1,2,3,4])
GPU: 4-element Array{Int32,1}:
1
2
3
4
julia> x .+ WithTuple(1, (2,3))
ERROR: Julia and OpenCL type don't match at kernel argument 6: Found Tuple{CLArrays.DeviceArray{Int32,1,CLArrays.HostPtr{Int32}},WithTuple}.
Please make sure to define OpenCL structs correctly!
You should be generally fine by using `__attribute__((packed))`, but sometimes the alignment of fields is different from Julia.
Consider the following example:
```
//packed
// Tuple{NTuple{3, Float32}, Void, Float32}
struct __attribute__((packed)) Test{
float3 f1;
int f2; // empty type gets replaced with Int32 (no empty types allowed in OpenCL)
// you might need to define the alignement of fields to match julia's layout
float f3; // for the types used here the alignement matches though!
};
// this is a case where Julia and OpenCL packed alignment would differ, so we need to specify it explicitely
// Tuple{Int64, Int32}
struct __attribute__((packed)) Test2{
long f1;
int __attribute__((aligned (8))) f2; // opencl would align this to 4 in packed layout, while Julia uses 8!
};
```
You can use `c.datatype_align(T)` to figure out the alignment of a Julia type!
Stacktrace:
[1] set_arg!(::OpenCL.cl.Kernel, ::Int64, ::Tuple{CLArrays.DeviceArray{Int32,1,CLArrays.HostPtr{Int32}},WithTuple}) at /home/raf/.julia/v0.6/OpenCL/src/kernel.jl:186
[2] (::CLArrays.CLFunction{GPUArrays.#broadcast_kernel!,Tuple{CLArrays.KernelState,Base.#+,CLArrays.CLArray{Int32,1},Tuple{UInt32},Tuple{GPUArrays.BInfo{Array,1},GPUArrays.BInfo{Any,0}},Tuple{CLArrays.CLArray{Int32,1},WithTuple}},Tuple{(3, :ptr),(6, 1, :ptr)}})(::Tuple{CLArrays.KernelState,Base.#+,CLArrays.CLArray{Int32,1},Tuple{UInt32},Tuple{GPUArrays.BInfo{Array,1},GPUArrays.BInfo{Any,0}},Tuple{CLArrays.CLArray{Int32,1},WithTuple}}, ::Tuple{Int64}, ::Tuple{Int64}, ::OpenCL.cl.CmdQueue) at /home/raf/.julia/v0.6/CLArrays/src/compilation.jl:279
[3] (::CLArrays.CLFunction{GPUArrays.#broadcast_kernel!,Tuple{CLArrays.KernelState,Base.#+,CLArrays.CLArray{Int32,1},Tuple{UInt32},Tuple{GPUArrays.BInfo{Array,1},GPUArrays.BInfo{Any,0}},Tuple{CLArrays.CLArray{Int32,1},WithTuple}},Tuple{(3, :ptr),(6, 1, :ptr)}})(::Tuple{CLArrays.KernelState,Base.#+,CLArrays.CLArray{Int32,1},Tuple{UInt32},Tuple{GPUArrays.BInfo{Array,1},GPUArrays.BInfo{Any,0}},Tuple{CLArrays.CLArray{Int32,1},WithTuple}}, ::Tuple{Int64}, ::Tuple{Int64}) at /home/raf/.julia/v0.6/CLArrays/src/compilation.jl:272
[4] _gpu_call(::Function, ::CLArrays.CLArray{Int32,1}, ::Tuple{Base.#+,CLArrays.CLArray{Int32,1},Tuple{UInt32},Tuple{GPUArrays.BInfo{Array,1},GPUArrays.BInfo{Any,0}},Tuple{CLArrays.CLArray{Int32,1},WithTuple}}, ::Tuple{Tuple{Int64},Tuple{Int64}}) at /home/raf/.julia/v0.6/CLArrays/src/compilation.jl:18
[5] gpu_call(::Function, ::CLArrays.CLArray{Int32,1}, ::Tuple{Base.#+,CLArrays.CLArray{Int32,1},Tuple{UInt32},Tuple{GPUArrays.BInfo{Array,1},GPUArrays.BInfo{Any,0}},Tuple{CLArrays.CLArray{Int32,1},WithTuple}}, ::Int64) at /home/raf/.julia/v0.6/GPUArrays/src/abstract_gpu_interface.jl:151
[6] _broadcast!(::Function, ::CLArrays.CLArray{Int32,1}, ::Tuple{Tuple{Bool},Tuple{}}, ::Tuple{Tuple{Int64},Tuple{}}, ::CLArrays.CLArray{Int32,1}, ::Tuple{WithTuple}, ::Type{Val{1}}, ::CartesianRange{CartesianIndex{1}}) at /home/raf/.julia/v0.6/GPUArrays/src/broadcast.jl:89
[7] broadcast_t(::Function, ::Type{Int32}, ::Tuple{Base.OneTo{Int64}}, ::CartesianRange{CartesianIndex{1}}, ::CLArrays.CLArray{Int32,1}, ::WithTuple) at /home/raf/.julia/v0.6/GPUArrays/src/broadcast.jl:49
[8] broadcast_c at ./broadcast.jl:316 [inlined]
[9] broadcast(::Function, ::CLArrays.CLArray{Int32,1}, ::WithTuple) at ./broadcast.jl:455
[10] macro expansion at ./REPL.jl:97 [inlined]
[11] (::Base.REPL.##1#2{Base.REPL.REPLBackend})() at ./event.jl:73
Could it be Beignet? `devices()[1]` actually just segfaults on that last line.
Or some other compiler version issue? I'm on Arch Linux, so I occasionally get bitten by bleeding-edge compiler releases breaking things.
Yeah, definitely... the last time I tried Beignet on Linux, it failed with a self-test saying:
test failed: (3 + 1) != 4
:D So I lost a bit of trust in Beignet, although it seems like they've improved a lot recently!
Do you have a snippet of code that fails using the OpenCL provided by Beignet? I would like to try it and provide feedback. On my computer, all the tests passed.
For my Intel HD 5500, the simple demo above fails with the error shown, or with a segfault, depending on which driver I select - for some reason there are two.
So far, Intel's compute-runtime and the older intel-opencl drivers also just segfault when I run that code.
Anyway, thanks @SimonDanisch for all your work on these things, especially now that I know what a mess you have to deal with behind the scenes!!!
I actually tested the code and it failed just like yours
x .+ WithTuple(1, (2,3))
ERROR: Julia and OpenCL type don't match at kernel argument 6: Found Tuple{CLArrays.DeviceArray{Int32,1,CLArrays.HostPtr{Int32}},WithTuple}.
Please make sure to define OpenCL structs correctly!
You should be generally fine by using `__attribute__((packed))`, but sometimes the alignment of fields is different from Julia.
Consider the following example:
```
//packed
// Tuple{NTuple{3, Float32}, Void, Float32}
struct __attribute__((packed)) Test{
float3 f1;
int f2; // empty type gets replaced with Int32 (no empty types allowed in OpenCL)
// you might need to define the alignement of fields to match julia's layout
float f3; // for the types used here the alignement matches though!
};
// this is a case where Julia and OpenCL packed alignment would differ, so we need to specify it explicitely
// Tuple{Int64, Int32}
struct __attribute__((packed)) Test2{
long f1;
int __attribute__((aligned (8))) f2; // opencl would align this to 4 in packed layout, while Julia uses 8!
};
```
You can use `c.datatype_align(T)` to figure out the alignment of a Julia type!
Stacktrace:
[1] set_arg!(::OpenCL.cl.Kernel, ::Int64, ::Tuple{CLArrays.DeviceArray{Int32,1,CLArrays.HostPtr{Int32}},WithTuple}) at /home/david/.julia/v0.6/OpenCL/src/kernel.jl:186
[2] (::CLArrays.CLFunction{GPUArrays.#broadcast_kernel!,Tuple{CLArrays.KernelState,Base.#+,CLArrays.CLArray{Int32,1},Tuple{UInt32},Tuple{GPUArrays.BInfo{Array,1},GPUArrays.BInfo{Any,0}},Tuple{CLArrays.CLArray{Int32,1},WithTuple}},Tuple{(3, :ptr),(6, 1, :ptr)}})(::Tuple{CLArrays.KernelState,Base.#+,CLArrays.CLArray{Int32,1},Tuple{UInt32},Tuple{GPUArrays.BInfo{Array,1},GPUArrays.BInfo{Any,0}},Tuple{CLArrays.CLArray{Int32,1},WithTuple}}, ::Tuple{Int64}, ::Tuple{Int64}, ::OpenCL.cl.CmdQueue) at /home/david/.julia/v0.6/CLArrays/src/compilation.jl:279
[3] (::CLArrays.CLFunction{GPUArrays.#broadcast_kernel!,Tuple{CLArrays.KernelState,Base.#+,CLArrays.CLArray{Int32,1},Tuple{UInt32},Tuple{GPUArrays.BInfo{Array,1},GPUArrays.BInfo{Any,0}},Tuple{CLArrays.CLArray{Int32,1},WithTuple}},Tuple{(3, :ptr),(6, 1, :ptr)}})(::Tuple{CLArrays.KernelState,Base.#+,CLArrays.CLArray{Int32,1},Tuple{UInt32},Tuple{GPUArrays.BInfo{Array,1},GPUArrays.BInfo{Any,0}},Tuple{CLArrays.CLArray{Int32,1},WithTuple}}, ::Tuple{Int64}, ::Tuple{Int64}) at /home/david/.julia/v0.6/CLArrays/src/compilation.jl:272
[4] _gpu_call(::Function, ::CLArrays.CLArray{Int32,1}, ::Tuple{Base.#+,CLArrays.CLArray{Int32,1},Tuple{UInt32},Tuple{GPUArrays.BInfo{Array,1},GPUArrays.BInfo{Any,0}},Tuple{CLArrays.CLArray{Int32,1},WithTuple}}, ::Tuple{Tuple{Int64},Tuple{Int64}}) at /home/david/.julia/v0.6/CLArrays/src/compilation.jl:18
[5] gpu_call(::Function, ::CLArrays.CLArray{Int32,1}, ::Tuple{Base.#+,CLArrays.CLArray{Int32,1},Tuple{UInt32},Tuple{GPUArrays.BInfo{Array,1},GPUArrays.BInfo{Any,0}},Tuple{CLArrays.CLArray{Int32,1},WithTuple}}, ::Int64) at /home/david/.julia/v0.6/GPUArrays/src/abstract_gpu_interface.jl:151
[6] _broadcast!(::Function, ::CLArrays.CLArray{Int32,1}, ::Tuple{Tuple{Bool},Tuple{}}, ::Tuple{Tuple{Int64},Tuple{}}, ::CLArrays.CLArray{Int32,1}, ::Tuple{WithTuple}, ::Type{Val{1}}, ::CartesianRange{CartesianIndex{1}}) at /home/david/.julia/v0.6/GPUArrays/src/broadcast.jl:89
[7] broadcast_t(::Function, ::Type{Int32}, ::Tuple{Base.OneTo{Int64}}, ::CartesianRange{CartesianIndex{1}}, ::CLArrays.CLArray{Int32,1}, ::WithTuple) at /home/david/.julia/v0.6/GPUArrays/src/broadcast.jl:49
[8] broadcast_c at ./broadcast.jl:316 [inlined]
[9] broadcast(::Function, ::CLArrays.CLArray{Int32,1}, ::WithTuple) at ./broadcast.jl:455
Yet CLArrays seem to work
```
x .+ x
GPU: 4-element Array{Int32,1}:
2
4
6
8
```
> I actually tested the code and it failed just like yours
On my GPU it didn't fail?
> Yet CLArrays seem to work
Yeah, this should work, since there isn't any struct involved. What would be interesting is `CLArray([WithTuple(1, (2,3))])`!
Did you try with the device set to the Iris Pro? Here's the whole test: it works with CLArray on "CPU_OpenCL" but not on "GPU_OpenCL".
julia> using CLArrays
julia> struct WithTuple
a::Int32
b::Tuple{Int32,Int32}
end
julia> CLArrays.init(CLArrays.devices()[2])
OpenCL context with:
CL version: OpenCL 1.2 (Build 43)
Device: CL Intel(R) Core(TM) i7-4600U CPU @ 2.10GHz
threads: 8192
blocks: (8192, 8192, 8192)
global_memory: 16728.113152 mb
free_global_memory: NaN mb
local_memory: 0.032768 mb
julia> CLArray([WithTuple(1, (2,3))])
GPU: 1-element Array{WithTuple,1}:
WithTuple(1, (2, 3))
julia> Base.:(+)(x::Integer, y::WithTuple) = x + y.b[2]
julia> x = CLArray(Int32[1,2,3,4])
GPU: 4-element Array{Int32,1}:
1
2
3
4
julia> x .+ WithTuple(1, (2,3))
GPU: 4-element Array{Int32,1}:
4
5
6
7
julia> CLArrays.init(CLArrays.devices()[1])
OpenCL context with:
CL version: OpenCL 1.2 beignet 1.4 (git-591d387)
Device: CL Intel(R) HD Graphics Haswell Ultrabook GT2 Mobile
threads: 512
blocks: (512, 512, 512)
global_memory: 2147.483648 mb
free_global_memory: NaN mb
local_memory: 0.065536 mb
julia> x = CLArray(Int32[1,2,3,4])
GPU: 4-element Array{Int32,1}:
1
2
3
4
julia> x .+ WithTuple(1, (2,3))
ERROR: Julia and OpenCL type don't match at kernel argument 6: Found Tuple{CLArrays.DeviceArray{Int32,1,CLArrays.HostPtr{Int32}},WithTuple}.
Please make sure to define OpenCL structs correctly!
You should be generally fine by using `__attribute__((packed))`, but sometimes the alignment of fields is different from Julia.
Consider the following example:
```
//packed
// Tuple{NTuple{3, Float32}, Void, Float32}
struct __attribute__((packed)) Test{
float3 f1;
int f2; // empty type gets replaced with Int32 (no empty types allowed in OpenCL)
// you might need to define the alignement of fields to match julia's layout
float f3; // for the types used here the alignement matches though!
};
// this is a case where Julia and OpenCL packed alignment would differ, so we need to specify it explicitely
// Tuple{Int64, Int32}
struct __attribute__((packed)) Test2{
long f1;
int __attribute__((aligned (8))) f2; // opencl would align this to 4 in packed layout, while Julia uses 8!
};
```
You can use `c.datatype_align(T)` to figure out the alignment of a Julia type!
Stacktrace:
[1] set_arg!(::OpenCL.cl.Kernel, ::Int64, ::Tuple{CLArrays.DeviceArray{Int32,1,CLArrays.HostPtr{Int32}},WithTuple}) at /home/david/.julia/v0.6/OpenCL/src/kernel.jl:186
[2] (::CLArrays.CLFunction{GPUArrays.#broadcast_kernel!,Tuple{CLArrays.KernelState,Base.#+,CLArrays.CLArray{Int32,1},Tuple{UInt32},Tuple{GPUArrays.BInfo{Array,1},GPUArrays.BInfo{Any,0}},Tuple{CLArrays.CLArray{Int32,1},WithTuple}},Tuple{(3, :ptr),(6, 1, :ptr)}})(::Tuple{CLArrays.KernelState,Base.#+,CLArrays.CLArray{Int32,1},Tuple{UInt32},Tuple{GPUArrays.BInfo{Array,1},GPUArrays.BInfo{Any,0}},Tuple{CLArrays.CLArray{Int32,1},WithTuple}}, ::Tuple{Int64}, ::Tuple{Int64}, ::OpenCL.cl.CmdQueue) at /home/david/.julia/v0.6/CLArrays/src/compilation.jl:279
[3] (::CLArrays.CLFunction{GPUArrays.#broadcast_kernel!,Tuple{CLArrays.KernelState,Base.#+,CLArrays.CLArray{Int32,1},Tuple{UInt32},Tuple{GPUArrays.BInfo{Array,1},GPUArrays.BInfo{Any,0}},Tuple{CLArrays.CLArray{Int32,1},WithTuple}},Tuple{(3, :ptr),(6, 1, :ptr)}})(::Tuple{CLArrays.KernelState,Base.#+,CLArrays.CLArray{Int32,1},Tuple{UInt32},Tuple{GPUArrays.BInfo{Array,1},GPUArrays.BInfo{Any,0}},Tuple{CLArrays.CLArray{Int32,1},WithTuple}}, ::Tuple{Int64}, ::Tuple{Int64}) at /home/david/.julia/v0.6/CLArrays/src/compilation.jl:272
[4] _gpu_call(::Function, ::CLArrays.CLArray{Int32,1}, ::Tuple{Base.#+,CLArrays.CLArray{Int32,1},Tuple{UInt32},Tuple{GPUArrays.BInfo{Array,1},GPUArrays.BInfo{Any,0}},Tuple{CLArrays.CLArray{Int32,1},WithTuple}}, ::Tuple{Tuple{Int64},Tuple{Int64}}) at /home/david/.julia/v0.6/CLArrays/src/compilation.jl:18
[5] gpu_call(::Function, ::CLArrays.CLArray{Int32,1}, ::Tuple{Base.#+,CLArrays.CLArray{Int32,1},Tuple{UInt32},Tuple{GPUArrays.BInfo{Array,1},GPUArrays.BInfo{Any,0}},Tuple{CLArrays.CLArray{Int32,1},WithTuple}}, ::Int64) at /home/david/.julia/v0.6/GPUArrays/src/abstract_gpu_interface.jl:151
[6] _broadcast!(::Function, ::CLArrays.CLArray{Int32,1}, ::Tuple{Tuple{Bool},Tuple{}}, ::Tuple{Tuple{Int64},Tuple{}}, ::CLArrays.CLArray{Int32,1}, ::Tuple{WithTuple}, ::Type{Val{1}}, ::CartesianRange{CartesianIndex{1}}) at /home/david/.julia/v0.6/GPUArrays/src/broadcast.jl:89
[7] broadcast_t(::Function, ::Type{Int32}, ::Tuple{Base.OneTo{Int64}}, ::CartesianRange{CartesianIndex{1}}, ::CLArrays.CLArray{Int32,1}, ::WithTuple) at /home/david/.julia/v0.6/GPUArrays/src/broadcast.jl:49
[8] broadcast_c at ./broadcast.jl:316 [inlined]
[9] broadcast(::Function, ::CLArrays.CLArray{Int32,1}, ::WithTuple) at ./broadcast.jl:455
Oddly enough, if I broadcast `x .+ 3` it also works on the Iris graphics. And it's doing the exact same operation. Is it something related to the transpiler?
Yeah, that's the same behaviour I'm seeing: the simple case works fine, but a struct or tuple gives that error. I haven't tried it on an OpenCL CPU driver, but it's interesting that one works and the other doesn't.
Building the CLArray is totally fine:
```
CLArray([WithTuple(1, (2,3))])
GPU: 1-element Array{WithTuple,1}:
WithTuple(1, (2, 3))
```
Is Julia picking up the redefinition of `+` correctly?
```
@which x .+ WithTuple(1, (2,3))
(::Base.##715#716)(a, b) in Base at deprecated.jl:354
```
> Oddly enough, if I broadcast `x .+ 3` it also works on the Iris graphics
The problem is `WithTuple`, so if you don't use it, there is no problem, right?!
All your examples just work on all my GPUs. It's your Beignet driver that seems to choose a different alignment for `WithTuple` - possibly due to a bug!
> Building the CLArray is totally fine: CLArray([WithTuple(1, (2,3))])
What I meant for you to try is something like:
```
x .+ CLArray(fill(WithTuple(1, (2,3)), length(x)))
```
This actually works:
```
julia> a = CLArray([WithTuple(1, (2,3))])
GPU: 1-element Array{WithTuple,1}:
WithTuple(1, (2, 3))

julia> x .+ a
GPU: 4-element Array{Int32,1}:
4
5
6
7
```
Or using your example:
```
julia> x .+ CLArray(fill(WithTuple(1, (2,3)), length(x)))
GPU: 4-element Array{Int32,1}:
4
3
5
4
```
Cool! :) So the tip from Stack Overflow actually works :-O So this is a bug in passing structs by value to GPU kernels - which seems to be ill-defined in OpenCL.
Great to get that narrowed down!! Do you have the SO link? It would be good to understand what is happening.
https://stackoverflow.com/questions/15639197/passing-struct-to-gpu-with-opencl-that-contains-an-array-of-floats
Thanks
@davidbp sorry, I thought you were actually talking to me - I see now, you just reproduced the failure :)
No problem. I hope OpenCL gets better support from vendors in the future, but I feel it's not going to happen. Maybe with Vulkan...