
Can't use `ThreadsX.map` as a direct drop-in due to lack of GPU support.

Open · vchuravy opened this issue 3 years ago · 1 comment

I find myself writing:

function experiment(ArrayT, N, M)
    # ThreadsX.map! only supports CPU arrays, so pick the threaded
    # version for Array and fall back to Base.map! for everything
    # else (e.g. GPU array types).
    if ArrayT <: Array
        map! = ThreadsX.map!
    else
        map! = Base.map!
    end
    ...
end

a lot these days :)
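
A dispatch-based helper would avoid repeating that branch in every function. A minimal sketch (the `backend_map!` name is hypothetical; nothing like it ships with ThreadsX):

using ThreadsX

# Hypothetical helper: select the backend by dispatching on the
# destination array type rather than branching at runtime.
# Plain CPU Arrays get the threaded implementation...
backend_map!(f, dest::Array, srcs::Array...) = ThreadsX.map!(f, dest, srcs...)
# ...and any other AbstractArray (e.g. a CuArray) falls back to
# Base.map!, which GPU array packages implement for their own types.
backend_map!(f, dest::AbstractArray, srcs::AbstractArray...) = Base.map!(f, dest, srcs...)

Calling `backend_map!(f, dest, src)` then picks the right implementation without a manual check.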

vchuravy · Jul 16 '22

I was going to suggest using Folds.map together with FoldsCUDA.jl, but it seems there are currently some problems with CUDAEx :cry:

julia> using Folds, CUDA, FoldsCUDA

julia> Folds.map(x -> x + 1, cu([1,2,3]), CUDAEx())
ERROR: FoldsCUDA.FailedInference: Kernel is inferred to return invalid type: BangBang.SafeCollector{Vector{Int64}}
HINT: if this exception is caught as `err`, use `CUDA.code_typed(err)` to introspect the erronous code.
Stacktrace:
  [1] _infer_acctype(rf::Function, init::BangBang.SafeCollector{BangBang.NoBang.Empty{Vector{Union{}}}}, arrays::Tuple{CuArray{Int64, 1, CUDA.Mem.DeviceBuffer}}, include_init::Bool)
    @ FoldsCUDA ~/.julia/packages/FoldsCUDA/Mo35m/src/kernels.jl:112
  [2] _infer_acctype
    @ ~/.julia/packages/FoldsCUDA/Mo35m/src/kernels.jl:97 [inlined]
  [3] _transduce!(buf::Nothing, rf::Transducers.Reduction{Transducers.Map{typeof(first)}, Transducers.Reduction{Transducers.Map{var"#21#22"}, Transducers.Reduction{Transducers.Map{Type{BangBang.NoBang.SingletonVector}}, Transducers.BottomRF{Transducers.AdHocRF{typeof(BangBang.collector), typeof(identity), typeof(BangBang.append!!), typeof(identity), typeof(identity), Nothing}}}}}, init::BangBang.SafeCollector{BangBang.NoBang.Empty{Vector{Union{}}}}, arrays::CuArray{Int64, 1, CUDA.Mem.DeviceBuffer})
    @ FoldsCUDA ~/.julia/packages/FoldsCUDA/Mo35m/src/kernels.jl:128
  [4] transduce_impl(rf::Transducers.Reduction{Transducers.Map{typeof(first)}, Transducers.Reduction{Transducers.Map{var"#21#22"}, Transducers.Reduction{Transducers.Map{Type{BangBang.NoBang.SingletonVector}}, Transducers.BottomRF{Transducers.AdHocRF{typeof(BangBang.collector), typeof(identity), typeof(BangBang.append!!), typeof(identity), typeof(identity), Nothing}}}}}, init::BangBang.SafeCollector{BangBang.NoBang.Empty{Vector{Union{}}}}, arrays::CuArray{Int64, 1, CUDA.Mem.DeviceBuffer})
    @ FoldsCUDA ~/.julia/packages/FoldsCUDA/Mo35m/src/kernels.jl:32
  [5] _transduce_cuda(op::Function, init::BangBang.SafeCollector{BangBang.NoBang.Empty{Vector{Union{}}}}, xs::Transducers.Eduction{Transducers.Reduction{Transducers.Map{var"#21#22"}, Transducers.BottomRF{Transducers.Completing{typeof(BangBang.push!!)}}}, CuArray{Int64, 1, CUDA.Mem.DeviceBuffer}})
    @ FoldsCUDA ~/.julia/packages/FoldsCUDA/Mo35m/src/kernels.jl:18
  [6] #_transduce_cuda#5
    @ ~/.julia/packages/FoldsCUDA/Mo35m/src/kernels.jl:1 [inlined]
  [7] _transduce_cuda
    @ ~/.julia/packages/FoldsCUDA/Mo35m/src/kernels.jl:1 [inlined]
  [8] transduce
    @ ~/.julia/packages/FoldsCUDA/Mo35m/src/api.jl:45 [inlined]
  [9] collect(itr::Transducers.Eduction{Transducers.Reduction{Transducers.Map{var"#21#22"}, Transducers.BottomRF{Transducers.Completing{typeof(BangBang.push!!)}}}, CuArray{Int64, 1, CUDA.Mem.DeviceBuffer}}, ex::CUDAEx{NamedTuple{(), Tuple{}}})
    @ Folds.Implementations ~/.julia/packages/Folds/ZayPF/src/collect.jl:4
 [10] map(f::Function, itr::CuArray{Int64, 1, CUDA.Mem.DeviceBuffer}, ex::CUDAEx{NamedTuple{(), Tuple{}}})
    @ Folds.Implementations ~/.julia/packages/Folds/ZayPF/src/collect.jl:84
 [11] top-level scope
    @ REPL[14]:1
 [12] top-level scope
    @ ~/.julia/packages/CUDA/BbliS/src/initialization.jl:52

The intention of Folds.jl, at least, is to combine distributed, multithreaded, and GPU parallelism under one roof.
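
For what it's worth, the executor-based API does work with the CPU executors today; a minimal sketch, assuming `ThreadedEx` from Transducers.jl (which Folds builds on) is in scope:

julia> using Folds, Transducers

julia> Folds.map(x -> x + 1, [1, 2, 3])                # default executor (threaded)
3-element Vector{Int64}:
 2
 3
 4

julia> Folds.map(x -> x + 1, [1, 2, 3], ThreadedEx())  # executor passed explicitly
3-element Vector{Int64}:
 2
 3
 4

So only the CUDAEx path is broken here; the same call with a CPU executor goes through.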

MasonProtter · May 04 '23