using `mean` with distributed Dagger arrays
It would be great to have functions from Statistics usable with distributed Dagger arrays. Currently `sum` works, but the equivalent `mean` call produces the error below:
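```julia
using Dagger
using Statistics

sz = (2, 2, 3)
X = randn(sz)
# Split X into chunks of size sz[1]×sz[2]×1, i.e. one chunk per slice along dim 3
DX = Distribute(Blocks(sz[1], sz[2], 1), X)

# works
@show sum(DX, dims=3)
# fails
@show mean(DX, dims=3)
```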
I am using Dagger v0.13.1 in Julia v1.6.2.
Looking at the code of Statistics.jl, it seems that a specific method `mapreduce(f, g, d::Dagger.Distribute; dims)` would be needed. There is already one without the `dims` keyword. Would a PR be helpful?
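Until such a method exists, one possible workaround, given that `sum(DX, dims=3)` is reported to work, is to divide that sum by the number of reduced elements. The helper `mean_via_sum` below is hypothetical (not part of Dagger or Statistics) and only a sketch: it is demonstrated on a plain `Array`, and whether `./ n` behaves correctly on whatever `sum` returns for a `Distribute` has not been verified here.

```julia
using Statistics

# Hypothetical helper, not part of Dagger or Statistics: computes the mean
# along `dims` using only `sum(A; dims=...)`, the call the report shows working on DX.
function mean_via_sum(A; dims)
    # Number of elements folded into each output entry. `dims` may be an Int
    # or a tuple (an Int iterates once over itself). Assumes no repeated dims.
    n = prod(size(A, d) for d in dims)
    return sum(A; dims=dims) ./ n
end

X = randn(2, 2, 3)
M = mean_via_sum(X; dims=3)   # 2×2×1 array, matches mean(X; dims=3) up to rounding
```

The design choice is simply to route everything through the one reduction that already has a `dims`-aware path, rather than through `sum(f, A; dims)`, which is what `mean` calls internally and which ends up in Base's generic fallback.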
Full stack-trace:
```
ERROR: LoadError: MethodError: no method matching +(::Float64, ::Dagger.GetIndex{Float64, 3})
For element-wise addition, use broadcasting with dot syntax: scalar .+ array
Closest candidates are:
  +(::Any, ::Any, ::Any, ::Any...) at operators.jl:560
  +(::Union{Float16, Float32, Float64}, ::BigFloat) at mpfr.jl:392
  +(::AbstractFloat, ::Bool) at bool.jl:102
  ...
Stacktrace:
  [1] add_sum(x::Float64, y::Dagger.GetIndex{Float64, 3})
    @ Base ./reduce.jl:24
  [2] macro expansion
    @ ./reducedim.jl:270 [inlined]
  [3] macro expansion
    @ ./simdloop.jl:77 [inlined]
  [4] _mapreducedim!(f::Statistics.var"#4#6"{typeof(identity), Dagger.BCast{Base.Broadcast.Broadcasted{Dagger.DaggerBroadcastStyle, Tuple{Base.OneTo{Int64}}, typeof(/), Tuple{Dagger.GetIndex{Float64, 3}, Int64}}, Float64, 1}}, op::typeof(Base.add_sum), R::Array{Float64, 3}, A::Distribute{Float64, 3})
    @ Base ./reducedim.jl:269
  [5] mapreducedim!(f::Function, op::Function, R::Array{Float64, 3}, A::Distribute{Float64, 3})
    @ Base ./reducedim.jl:277
  [6] _mapreduce_dim(f::Function, op::Function, #unused#::Base._InitialValue, A::Distribute{Float64, 3}, dims::Int64)
    @ Base ./reducedim.jl:324
  [7] #mapreduce#672
    @ ./reducedim.jl:310 [inlined]
  [8] #_sum#702
    @ ./reducedim.jl:900 [inlined]
  [9] _sum
    @ ./reducedim.jl:900 [inlined]
 [10] #sum#680
    @ ./reducedim.jl:874 [inlined]
 [11] _mean(f::typeof(identity), A::Distribute{Float64, 3}, dims::Int64)
    @ Statistics /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Statistics/src/Statistics.jl:177
 [12] #mean#2
    @ /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Statistics/src/Statistics.jl:164 [inlined]
 [13] top-level scope
    @ show.jl:955
in expression starting at /home/abarth/.julia/dev/NCDatasets/test/test_dagger.jl:46
```
Yeah, the lazy nature of the DArray breaks under a lot of regular array operations. Per #226, we'll probably just want to make the DArray eager (like the DTable), potentially by reusing parts of DistributedArrays.jl.
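The failure mode can be mimicked without Dagger. In the trace, Base's generic `_mapreducedim!` loop receives a `Dagger.GetIndex` placeholder where it expects a `Float64`. The types `LazyElem` and `LazyArray` and the function `force` below are hypothetical toys, not Dagger internals; they only model an array whose `getindex` returns a deferred value instead of a number.

```julia
# Hypothetical stand-in for a deferred element (loosely analogous to Dagger.GetIndex).
struct LazyElem
    parent::Array{Float64,3}
    idx::NTuple{3,Int}
end

# Hypothetical lazy array: declares eltype Float64, but getindex returns a LazyElem.
struct LazyArray <: AbstractArray{Float64,3}
    data::Array{Float64,3}
end
Base.size(A::LazyArray) = size(A.data)
Base.getindex(A::LazyArray, i::Int, j::Int, k::Int) = LazyElem(A.data, (i, j, k))

# Materialize a deferred element; plain numbers pass through unchanged
# (Base may call the mapped function on zero(eltype) when initializing the result).
force(e::LazyElem) = e.parent[e.idx...]
force(x::Real) = x

A = LazyArray(randn(2, 2, 3))

# Base's generic reduction tries to `+` a LazyElem, which has no `+` method,
# so this throws a MethodError, the same shape of failure as in the trace.
err = try
    sum(A; dims=3)
    nothing
catch e
    e
end

# Forcing each element before it reaches the accumulator avoids the error.
S = sum(force, A; dims=3)
```

Forcing every element up front is, loosely, the idea behind making the DArray eager: regular array code then only ever sees concrete values.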