CLArrays.jl
Structs with tuple fields as broadcast arguments
I've been trying to broadcast with an argument like this:
```
struct WithTuple
    a::Int32
    b::Tuple{Int32,Int32}
end
```
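Broadcasting it over a CLArray looks roughly like this (a simplified, hypothetical sketch - `f` is just a stand-in; my real code uses Cellular.jl types, as the error below shows):

```
using CLArrays

# hypothetical stand-in for the real inner kernel function
f(x, w::WithTuple) = x + w.a + w.b[1]

x = CLArray(rand(Int32, 100))
f.(x, WithTuple(1, (2, 3)))   # the struct is passed by value as a kernel argument
```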
But it breaks with an error like this, something to do with differences in how the tuple is packed:
ERROR: Julia and OpenCL type don't match at kernel argument 6: Found Tuple{CLArrays.DeviceArray{UInt32,2,CLArrays.HostPtr{UInt32}},Cellular.Life{Cellular.RadialNeighborhood{:test,Cellular.Skip},Int32,Tuple{Int32,Int32}}}.
Please make sure to define OpenCL structs correctly!
You should be generally fine by using `__attribute__((packed))`, but sometimes the alignment of fields is different from Julia.
Consider the following example:
```
//packed
// Tuple{NTuple{3, Float32}, Void, Float32}
struct __attribute__((packed)) Test{
float3 f1;
int f2; // empty type gets replaced with Int32 (no empty types allowed in OpenCL)
// you might need to define the alignement of fields to match julia's layout
float f3; // for the types used here the alignement matches though!
};
// this is a case where Julia and OpenCL packed alignment would differ, so we need to specify it explicitely
// Tuple{Int64, Int32}
struct __attribute__((packed)) Test2{
long f1;
int __attribute__((aligned (8))) f2; // opencl would align this to 4 in packed layout, while Julia uses 8!
};
```
You can use `c.datatype_align(T)` to figure out the alignment of a Julia type!
Are you using this with an OpenCL CPU driver? I have been fighting with issues related to different alignment in the Intel/AMD CPU OpenCL drivers... What's the output of this:
```
julia> using CLArrays

julia> x = CLArray([0]) |> CLArrays.device
OpenCL.Device(Intel(R) HD Graphics 630 on Intel(R) OpenCL @0x000000000772e440)
```
I'm not totally sure - I was hoping it was on the (somewhat meagre) GPU! Two drivers show up on my setup:
```
OpenCL.Device(Intel(R) HD Graphics on Intel(R) OpenCL @0x000055ad4d6cb990)
OpenCL.Device(Intel(R) HD Graphics 5500 BroadWell U-Processor GT2 on Intel Gen OCL Driver @0x00007f573ac89600)
```
Using the first one just kills Julia, so I figure it's an artefact of some kind and `init()` the second. It runs fine for simpler problems, but doesn't handle tuples anywhere, or structs if they are broadcast over an array - giving a similar message for both.
It should be the latest package clones and Julia - this was a fresh install yesterday. Let me know if you need any other details.
I'm mostly trying to test that OpenCL will actually run the kernels that I was running on the regular CPU. The main GPU work will be on CUDA, which I'm just getting set up. I'm aiming to build a simulation framework that will run user-defined inner kernels on CPU/OpenCL/CUDA interchangeably, but I'm not sure how practical that is yet.
You might be able to get around this problem if you can insert some padding ;) E.g. try:
```
struct WithTuple
    a::Int32
    pad::Int32 # might be even more efficient
    b::Tuple{Int32,Int32}
end
```
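To see what you are padding towards, you can compare against the layout Julia actually uses, via `sizeof` and `fieldoffset` (a quick host-side sanity check; `WithTuplePadded` is just a hypothetical name to avoid redefining `WithTuple`):

```
struct WithTuplePadded
    a::Int32
    pad::Int32
    b::Tuple{Int32,Int32}
end

sizeof(WithTuplePadded)            # 16 bytes in total
fieldoffset(WithTuplePadded, 1)    # offset 0 -> a
fieldoffset(WithTuplePadded, 2)    # offset 4 -> pad
fieldoffset(WithTuplePadded, 3)    # offset 8 -> b
```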
Unfortunately it's not that easy! What would I be aiming for - padding it out to 64-bit multiples?
I'm also wondering if there is a long-term solution for this - as in, would it be possible to repack to a correct struct automatically? I was hoping user-supplied isbits structs would eventually work without this kind of step.
> I was hoping user-supplied isbits structs would eventually work without this kind of step.
I put a lot of effort into this, and I thought I had it working with most GPU OpenCL drivers. The problem is that the OpenCL specs don't seem to guarantee any alignment, so it can be pretty much vendor-specific. As far as I know, one can't actually query the alignment of an OpenCL struct, so it would be a lot of work to support all the different vendors - I also found some alignment bugs which they probably won't fix, so this whole thing is a mess.
I posted a question on Stack Overflow a while ago:
https://stackoverflow.com/questions/47076012/opencl-only-on-amd-cl-invalid-arg-size
There was actually a suggestion in there:
> As a workaround, you could copy structs into an OpenCL memory buffer and pass them by reference?
I'm not 100% sure if that is a valid workaround for your specific issue - but it's definitely worth a try.
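In CLArrays terms, the pass-by-reference idea would be to put the struct into a device array and broadcast over that, instead of passing it as a plain scalar kernel argument - roughly like this (just a sketch, assuming the `x` and `WithTuple` from above; the same approach gets tried further down the thread):

```
w = WithTuple(1, (2, 3))

# by value: the struct itself becomes a kernel argument and hits the alignment check
# x .+ w

# by reference: the struct lives in an OpenCL buffer instead
x .+ CLArray([w])                    # a length-1 array broadcasts against x
x .+ CLArray(fill(w, length(x)))     # or one copy per element
```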
Btw, your example works on my GPUs!
```
julia> using CLArrays

julia> struct WithTuple
           a::Int32
           b::Tuple{Int32,Int32}
       end

julia> Base.:(+)(x::Integer, y::WithTuple) = x + y.b[2]

julia> x = CLArray(Int32[1,2,3,4])

julia> x .+ WithTuple(1, (2,3))
GPU: 4-element Array{Int32,1}:
4
5
6
7

julia> CLArrays.device(x)
OpenCL.Device(Intel(R) HD Graphics 630 on Intel(R) OpenCL @0x00000000076a8ed0)

julia> CLArrays.init(CLArrays.devices()[2])
OpenCL context with:
CL version: OpenCL 1.2 CUDA
Device: CL GeForce GTX 1060
threads: 1024
blocks: (1024, 1024, 64)
global_memory: 6442.450944 mb
free_global_memory: NaN mb
local_memory: 0.049152 mb

julia> CLArray(Int32[1,2,3,4]) .+ WithTuple(1, (2,3))
GPU: 4-element Array{Int32,1}:
4
5
6
7
```
I'm kind of sick of dealing with buggy / inconsistent OpenCL drivers :P The last time I complained to Intel about driver bugs, they told me that they only fix "obscure" bugs like this for the newest generation.
Oh god I didn't realise it was like that... these are both Intel HD cards...
julia> using CLArrays
julia> CLArrays.init(CLArrays.devices()[2])
OpenCL context with:
CL version: OpenCL 1.2 beignet 1.3
Device: CL Intel(R) HD Graphics 5500 BroadWell U-Processor GT2
threads: 512
blocks: (512, 512, 512)
global_memory: 4119.855104 mb
free_global_memory: NaN mb
local_memory: 0.065536 mb
julia> struct WithTuple
a::Int32
b::Tuple{Int32,Int32}
end
julia> Base.:(+)(x::Integer, y::WithTuple) = x + y.b[2]
julia> x = CLArray(Int32[1,2,3,4])
GPU: 4-element Array{Int32,1}:
1
2
3
4
julia> x .+ WithTuple(1, (2,3))
ERROR: Julia and OpenCL type don't match at kernel argument 6: Found Tuple{CLArrays.DeviceArray{Int32,1,CLArrays.HostPtr{Int32}},WithTuple}.
Please make sure to define OpenCL structs correctly!
You should be generally fine by using `__attribute__((packed))`, but sometimes the alignment of fields is different from Julia.
Consider the following example:
```
//packed
// Tuple{NTuple{3, Float32}, Void, Float32}
struct __attribute__((packed)) Test{
float3 f1;
int f2; // empty type gets replaced with Int32 (no empty types allowed in OpenCL)
// you might need to define the alignement of fields to match julia's layout
float f3; // for the types used here the alignement matches though!
};
// this is a case where Julia and OpenCL packed alignment would differ, so we need to specify it explicitely
// Tuple{Int64, Int32}
struct __attribute__((packed)) Test2{
long f1;
int __attribute__((aligned (8))) f2; // opencl would align this to 4 in packed layout, while Julia uses 8!
};
```
You can use `c.datatype_align(T)` to figure out the alignment of a Julia type!
Stacktrace:
[1] set_arg!(::OpenCL.cl.Kernel, ::Int64, ::Tuple{CLArrays.DeviceArray{Int32,1,CLArrays.HostPtr{Int32}},WithTuple}) at /home/raf/.julia/v0.6/OpenCL/src/kernel.jl:186
[2] (::CLArrays.CLFunction{GPUArrays.#broadcast_kernel!,Tuple{CLArrays.KernelState,Base.#+,CLArrays.CLArray{Int32,1},Tuple{UInt32},Tuple{GPUArrays.BInfo{Array,1},GPUArrays.BInfo{Any,0}},Tuple{CLArrays.CLArray{Int32,1},WithTuple}},Tuple{(3, :ptr),(6, 1, :ptr)}})(::Tuple{CLArrays.KernelState,Base.#+,CLArrays.CLArray{Int32,1},Tuple{UInt32},Tuple{GPUArrays.BInfo{Array,1},GPUArrays.BInfo{Any,0}},Tuple{CLArrays.CLArray{Int32,1},WithTuple}}, ::Tuple{Int64}, ::Tuple{Int64}, ::OpenCL.cl.CmdQueue) at /home/raf/.julia/v0.6/CLArrays/src/compilation.jl:279
[3] (::CLArrays.CLFunction{GPUArrays.#broadcast_kernel!,Tuple{CLArrays.KernelState,Base.#+,CLArrays.CLArray{Int32,1},Tuple{UInt32},Tuple{GPUArrays.BInfo{Array,1},GPUArrays.BInfo{Any,0}},Tuple{CLArrays.CLArray{Int32,1},WithTuple}},Tuple{(3, :ptr),(6, 1, :ptr)}})(::Tuple{CLArrays.KernelState,Base.#+,CLArrays.CLArray{Int32,1},Tuple{UInt32},Tuple{GPUArrays.BInfo{Array,1},GPUArrays.BInfo{Any,0}},Tuple{CLArrays.CLArray{Int32,1},WithTuple}}, ::Tuple{Int64}, ::Tuple{Int64}) at /home/raf/.julia/v0.6/CLArrays/src/compilation.jl:272
[4] _gpu_call(::Function, ::CLArrays.CLArray{Int32,1}, ::Tuple{Base.#+,CLArrays.CLArray{Int32,1},Tuple{UInt32},Tuple{GPUArrays.BInfo{Array,1},GPUArrays.BInfo{Any,0}},Tuple{CLArrays.CLArray{Int32,1},WithTuple}}, ::Tuple{Tuple{Int64},Tuple{Int64}}) at /home/raf/.julia/v0.6/CLArrays/src/compilation.jl:18
[5] gpu_call(::Function, ::CLArrays.CLArray{Int32,1}, ::Tuple{Base.#+,CLArrays.CLArray{Int32,1},Tuple{UInt32},Tuple{GPUArrays.BInfo{Array,1},GPUArrays.BInfo{Any,0}},Tuple{CLArrays.CLArray{Int32,1},WithTuple}}, ::Int64) at /home/raf/.julia/v0.6/GPUArrays/src/abstract_gpu_interface.jl:151
[6] _broadcast!(::Function, ::CLArrays.CLArray{Int32,1}, ::Tuple{Tuple{Bool},Tuple{}}, ::Tuple{Tuple{Int64},Tuple{}}, ::CLArrays.CLArray{Int32,1}, ::Tuple{WithTuple}, ::Type{Val{1}}, ::CartesianRange{CartesianIndex{1}}) at /home/raf/.julia/v0.6/GPUArrays/src/broadcast.jl:89
[7] broadcast_t(::Function, ::Type{Int32}, ::Tuple{Base.OneTo{Int64}}, ::CartesianRange{CartesianIndex{1}}, ::CLArrays.CLArray{Int32,1}, ::WithTuple) at /home/raf/.julia/v0.6/GPUArrays/src/broadcast.jl:49
[8] broadcast_c at ./broadcast.jl:316 [inlined]
[9] broadcast(::Function, ::CLArrays.CLArray{Int32,1}, ::WithTuple) at ./broadcast.jl:455
[10] macro expansion at ./REPL.jl:97 [inlined]
[11] (::Base.REPL.##1#2{Base.REPL.REPLBackend})() at ./event.jl:73
Could it be Beignet? `devices()[1]` actually just segfaults on that last line.
Or some other compiler version issue? I'm on Arch Linux, so I occasionally get bitten by bleeding-edge compiler releases breaking things.
Yeah, definitely... the last time I tried Beignet on Linux, it failed with a self-test saying:
test failed: (3 + 1) != 4
:D So I lost a bit of trust in Beignet, although it seems like they've improved a lot recently!
Do you have a snippet of code that fails using the OpenCL provided by Beignet? I would like to try it and provide feedback. On my computer, all the tests passed.
For my Intel HD 5500, the simple demo above fails with the error shown, or with a segfault, depending on which driver I select - for some reason there are two.
So far, Intel's compute-runtime and the older intel-opencl drivers also just segfault when I run that code.
Anyway, thanks @SimonDanisch for all your work on these things, especially now that I know what a mess you have to deal with behind the scenes!!!
I actually tested the code and it failed just like yours
x .+ WithTuple(1, (2,3))
ERROR: Julia and OpenCL type don't match at kernel argument 6: Found Tuple{CLArrays.DeviceArray{Int32,1,CLArrays.HostPtr{Int32}},WithTuple}.
Please make sure to define OpenCL structs correctly!
You should be generally fine by using `__attribute__((packed))`, but sometimes the alignment of fields is different from Julia.
Consider the following example:
```
//packed
// Tuple{NTuple{3, Float32}, Void, Float32}
struct __attribute__((packed)) Test{
float3 f1;
int f2; // empty type gets replaced with Int32 (no empty types allowed in OpenCL)
// you might need to define the alignement of fields to match julia's layout
float f3; // for the types used here the alignement matches though!
};
// this is a case where Julia and OpenCL packed alignment would differ, so we need to specify it explicitely
// Tuple{Int64, Int32}
struct __attribute__((packed)) Test2{
long f1;
int __attribute__((aligned (8))) f2; // opencl would align this to 4 in packed layout, while Julia uses 8!
};
```
You can use `c.datatype_align(T)` to figure out the alignment of a Julia type!
Stacktrace:
[1] set_arg!(::OpenCL.cl.Kernel, ::Int64, ::Tuple{CLArrays.DeviceArray{Int32,1,CLArrays.HostPtr{Int32}},WithTuple}) at /home/david/.julia/v0.6/OpenCL/src/kernel.jl:186
[2] (::CLArrays.CLFunction{GPUArrays.#broadcast_kernel!,Tuple{CLArrays.KernelState,Base.#+,CLArrays.CLArray{Int32,1},Tuple{UInt32},Tuple{GPUArrays.BInfo{Array,1},GPUArrays.BInfo{Any,0}},Tuple{CLArrays.CLArray{Int32,1},WithTuple}},Tuple{(3, :ptr),(6, 1, :ptr)}})(::Tuple{CLArrays.KernelState,Base.#+,CLArrays.CLArray{Int32,1},Tuple{UInt32},Tuple{GPUArrays.BInfo{Array,1},GPUArrays.BInfo{Any,0}},Tuple{CLArrays.CLArray{Int32,1},WithTuple}}, ::Tuple{Int64}, ::Tuple{Int64}, ::OpenCL.cl.CmdQueue) at /home/david/.julia/v0.6/CLArrays/src/compilation.jl:279
[3] (::CLArrays.CLFunction{GPUArrays.#broadcast_kernel!,Tuple{CLArrays.KernelState,Base.#+,CLArrays.CLArray{Int32,1},Tuple{UInt32},Tuple{GPUArrays.BInfo{Array,1},GPUArrays.BInfo{Any,0}},Tuple{CLArrays.CLArray{Int32,1},WithTuple}},Tuple{(3, :ptr),(6, 1, :ptr)}})(::Tuple{CLArrays.KernelState,Base.#+,CLArrays.CLArray{Int32,1},Tuple{UInt32},Tuple{GPUArrays.BInfo{Array,1},GPUArrays.BInfo{Any,0}},Tuple{CLArrays.CLArray{Int32,1},WithTuple}}, ::Tuple{Int64}, ::Tuple{Int64}) at /home/david/.julia/v0.6/CLArrays/src/compilation.jl:272
[4] _gpu_call(::Function, ::CLArrays.CLArray{Int32,1}, ::Tuple{Base.#+,CLArrays.CLArray{Int32,1},Tuple{UInt32},Tuple{GPUArrays.BInfo{Array,1},GPUArrays.BInfo{Any,0}},Tuple{CLArrays.CLArray{Int32,1},WithTuple}}, ::Tuple{Tuple{Int64},Tuple{Int64}}) at /home/david/.julia/v0.6/CLArrays/src/compilation.jl:18
[5] gpu_call(::Function, ::CLArrays.CLArray{Int32,1}, ::Tuple{Base.#+,CLArrays.CLArray{Int32,1},Tuple{UInt32},Tuple{GPUArrays.BInfo{Array,1},GPUArrays.BInfo{Any,0}},Tuple{CLArrays.CLArray{Int32,1},WithTuple}}, ::Int64) at /home/david/.julia/v0.6/GPUArrays/src/abstract_gpu_interface.jl:151
[6] _broadcast!(::Function, ::CLArrays.CLArray{Int32,1}, ::Tuple{Tuple{Bool},Tuple{}}, ::Tuple{Tuple{Int64},Tuple{}}, ::CLArrays.CLArray{Int32,1}, ::Tuple{WithTuple}, ::Type{Val{1}}, ::CartesianRange{CartesianIndex{1}}) at /home/david/.julia/v0.6/GPUArrays/src/broadcast.jl:89
[7] broadcast_t(::Function, ::Type{Int32}, ::Tuple{Base.OneTo{Int64}}, ::CartesianRange{CartesianIndex{1}}, ::CLArrays.CLArray{Int32,1}, ::WithTuple) at /home/david/.julia/v0.6/GPUArrays/src/broadcast.jl:49
[8] broadcast_c at ./broadcast.jl:316 [inlined]
[9] broadcast(::Function, ::CLArrays.CLArray{Int32,1}, ::WithTuple) at ./broadcast.jl:455
Yet CLArrays seem to work
```
x .+ x
GPU: 4-element Array{Int32,1}:
2
4
6
8
```
> I actually tested the code and it failed just like yours
On my GPU it didn't fail?
> Yet CLArrays seem to work
Yeah, this should work, since there isn't any struct involved. What would be interesting is `CLArray([WithTuple(1, (2,3))])`!
Did you try with the device set to the Iris Pro? Here's the whole test: it works with CLArray on "CPU_OpenCL" but not on "GPU_OpenCL".
julia> using CLArrays
julia> struct WithTuple
a::Int32
b::Tuple{Int32,Int32}
end
julia> CLArrays.init(CLArrays.devices()[2])
OpenCL context with:
CL version: OpenCL 1.2 (Build 43)
Device: CL Intel(R) Core(TM) i7-4600U CPU @ 2.10GHz
threads: 8192
blocks: (8192, 8192, 8192)
global_memory: 16728.113152 mb
free_global_memory: NaN mb
local_memory: 0.032768 mb
julia> CLArray([WithTuple(1, (2,3))])
GPU: 1-element Array{WithTuple,1}:
WithTuple(1, (2, 3))
julia> Base.:(+)(x::Integer, y::WithTuple) = x + y.b[2]
julia> x = CLArray(Int32[1,2,3,4])
GPU: 4-element Array{Int32,1}:
1
2
3
4
julia> x .+ WithTuple(1, (2,3))
GPU: 4-element Array{Int32,1}:
4
5
6
7
julia> CLArrays.init(CLArrays.devices()[1])
OpenCL context with:
CL version: OpenCL 1.2 beignet 1.4 (git-591d387)
Device: CL Intel(R) HD Graphics Haswell Ultrabook GT2 Mobile
threads: 512
blocks: (512, 512, 512)
global_memory: 2147.483648 mb
free_global_memory: NaN mb
local_memory: 0.065536 mb
julia> x = CLArray(Int32[1,2,3,4])
GPU: 4-element Array{Int32,1}:
1
2
3
4
julia> x .+ WithTuple(1, (2,3))
ERROR: Julia and OpenCL type don't match at kernel argument 6: Found Tuple{CLArrays.DeviceArray{Int32,1,CLArrays.HostPtr{Int32}},WithTuple}.
Please make sure to define OpenCL structs correctly!
You should be generally fine by using `__attribute__((packed))`, but sometimes the alignment of fields is different from Julia.
Consider the following example:
```
//packed
// Tuple{NTuple{3, Float32}, Void, Float32}
struct __attribute__((packed)) Test{
float3 f1;
int f2; // empty type gets replaced with Int32 (no empty types allowed in OpenCL)
// you might need to define the alignement of fields to match julia's layout
float f3; // for the types used here the alignement matches though!
};
// this is a case where Julia and OpenCL packed alignment would differ, so we need to specify it explicitely
// Tuple{Int64, Int32}
struct __attribute__((packed)) Test2{
long f1;
int __attribute__((aligned (8))) f2; // opencl would align this to 4 in packed layout, while Julia uses 8!
};
```
You can use `c.datatype_align(T)` to figure out the alignment of a Julia type!
Stacktrace:
[1] set_arg!(::OpenCL.cl.Kernel, ::Int64, ::Tuple{CLArrays.DeviceArray{Int32,1,CLArrays.HostPtr{Int32}},WithTuple}) at /home/david/.julia/v0.6/OpenCL/src/kernel.jl:186
[2] (::CLArrays.CLFunction{GPUArrays.#broadcast_kernel!,Tuple{CLArrays.KernelState,Base.#+,CLArrays.CLArray{Int32,1},Tuple{UInt32},Tuple{GPUArrays.BInfo{Array,1},GPUArrays.BInfo{Any,0}},Tuple{CLArrays.CLArray{Int32,1},WithTuple}},Tuple{(3, :ptr),(6, 1, :ptr)}})(::Tuple{CLArrays.KernelState,Base.#+,CLArrays.CLArray{Int32,1},Tuple{UInt32},Tuple{GPUArrays.BInfo{Array,1},GPUArrays.BInfo{Any,0}},Tuple{CLArrays.CLArray{Int32,1},WithTuple}}, ::Tuple{Int64}, ::Tuple{Int64}, ::OpenCL.cl.CmdQueue) at /home/david/.julia/v0.6/CLArrays/src/compilation.jl:279
[3] (::CLArrays.CLFunction{GPUArrays.#broadcast_kernel!,Tuple{CLArrays.KernelState,Base.#+,CLArrays.CLArray{Int32,1},Tuple{UInt32},Tuple{GPUArrays.BInfo{Array,1},GPUArrays.BInfo{Any,0}},Tuple{CLArrays.CLArray{Int32,1},WithTuple}},Tuple{(3, :ptr),(6, 1, :ptr)}})(::Tuple{CLArrays.KernelState,Base.#+,CLArrays.CLArray{Int32,1},Tuple{UInt32},Tuple{GPUArrays.BInfo{Array,1},GPUArrays.BInfo{Any,0}},Tuple{CLArrays.CLArray{Int32,1},WithTuple}}, ::Tuple{Int64}, ::Tuple{Int64}) at /home/david/.julia/v0.6/CLArrays/src/compilation.jl:272
[4] _gpu_call(::Function, ::CLArrays.CLArray{Int32,1}, ::Tuple{Base.#+,CLArrays.CLArray{Int32,1},Tuple{UInt32},Tuple{GPUArrays.BInfo{Array,1},GPUArrays.BInfo{Any,0}},Tuple{CLArrays.CLArray{Int32,1},WithTuple}}, ::Tuple{Tuple{Int64},Tuple{Int64}}) at /home/david/.julia/v0.6/CLArrays/src/compilation.jl:18
[5] gpu_call(::Function, ::CLArrays.CLArray{Int32,1}, ::Tuple{Base.#+,CLArrays.CLArray{Int32,1},Tuple{UInt32},Tuple{GPUArrays.BInfo{Array,1},GPUArrays.BInfo{Any,0}},Tuple{CLArrays.CLArray{Int32,1},WithTuple}}, ::Int64) at /home/david/.julia/v0.6/GPUArrays/src/abstract_gpu_interface.jl:151
[6] _broadcast!(::Function, ::CLArrays.CLArray{Int32,1}, ::Tuple{Tuple{Bool},Tuple{}}, ::Tuple{Tuple{Int64},Tuple{}}, ::CLArrays.CLArray{Int32,1}, ::Tuple{WithTuple}, ::Type{Val{1}}, ::CartesianRange{CartesianIndex{1}}) at /home/david/.julia/v0.6/GPUArrays/src/broadcast.jl:89
[7] broadcast_t(::Function, ::Type{Int32}, ::Tuple{Base.OneTo{Int64}}, ::CartesianRange{CartesianIndex{1}}, ::CLArrays.CLArray{Int32,1}, ::WithTuple) at /home/david/.julia/v0.6/GPUArrays/src/broadcast.jl:49
[8] broadcast_c at ./broadcast.jl:316 [inlined]
[9] broadcast(::Function, ::CLArrays.CLArray{Int32,1}, ::WithTuple) at ./broadcast.jl:455
Oddly enough, if I broadcast `x .+ 3` it also works on the Iris graphics. And it's doing the exact same operation. Is it something related to the transpiler?
Yeah, that's the same behaviour I'm seeing: the simple case works fine, but a struct or tuple gives that error. I haven't tried it on an OpenCL CPU driver, but it's interesting that one works and the other doesn't.
Building the CLArray is totally fine:
```
CLArray([WithTuple(1, (2,3))])
GPU: 1-element Array{WithTuple,1}:
WithTuple(1, (2, 3))
```
Is Julia picking up the redefinition of `+` correctly?
```
@which x .+ WithTuple(1, (2,3))
(::Base.##715#716)(a, b) in Base at deprecated.jl:354
```
> Oddly enough, if I broadcast `x .+ 3` it also works on the Iris graphics
The problem is `WithTuple`, so if you don't use it, there is no problem, right?!
All your examples just work on all my GPUs. It's your Beignet driver that seems to choose a different alignment for `WithTuple` - possibly due to a bug!
> Building the CLArray is totally fine: CLArray([WithTuple(1, (2,3))])
What I meant for you to try is something like:
```
x .+ CLArray(fill(WithTuple(1, (2,3)), length(x)))
```
This actually works:
```
julia> a = CLArray([WithTuple(1, (2,3))])
GPU: 1-element Array{WithTuple,1}:
WithTuple(1, (2, 3))

julia> x .+ a
GPU: 4-element Array{Int32,1}:
4
5
6
7
```
Or using your example:
```
julia> x .+ CLArray(fill(WithTuple(1, (2,3)), length(x)))
GPU: 4-element Array{Int32,1}:
4
3
5
4
```
Cool! :) So the tip from Stack Overflow actually works :-O So this is a bug in passing structs by value to GPU kernels - which seems to be ill-defined in OpenCL.
Great to get that narrowed down!! Do you have the SO link? It would be good to understand what is happening.
https://stackoverflow.com/questions/15639197/passing-struct-to-gpu-with-opencl-that-contains-an-array-of-floats
Thanks
@davidbp sorry, I thought you were actually talking to me - I see now, you just reproduced the failure :)
No problem. I hope OpenCL gets better support from vendors in the future, but I feel it's not going to happen. Maybe with Vulkan...