JuliaDB.jl
Error joining to distributed NDSparse
using Distributed
@everywhere using JuliaDB
indicesA = (S=[0.6, 0.7], T=[1,2.0])
indicesB = (S=[1.6, 1.7], T=[2,5.0])
valsAscalar = (u=[1, 2], t=[2, 3])
valsBscalar = (u=[30, 50], t=[4, 5])
Ascalar = ndsparse(indicesA, valsAscalar)
Bscalar = ndsparse(indicesB, valsBscalar)
ddb = distribute(Ascalar, 1)
Cscalar = join(ddb, Bscalar)
gives
ERROR: MethodError: Cannot `convert` an object of type Nullables.Nullable{Array{Pair{Dagger.OSProc,Int64},1}} to an object of type Array{Pair{Dagger.OSProc,Int64},1}
Closest candidates are:
convert(::Type{Array{S,N}}, ::DataValues.DataValueArray{T,N}) where {S, T, N} at /home/grahams/.julia/packages/DataValues/XQWvG/src/array/primitives.jl:272
convert(::Type{Array{S,N}}, ::DataValues.DataValueArray{T,N}, ::Any) where {S, T, N} at /home/grahams/.julia/packages/DataValues/XQWvG/src/array/primitives.jl:301
convert(::Type{Array{S,N}}, ::PooledArrays.PooledArray{T,R,N,RA} where RA) where {S, T, R, N} at /home/grahams/.julia/packages/PooledArrays/ufJSl/src/PooledArrays.jl:288
...
Stacktrace:
[1] convert(::Type{Union{Nothing, Array{Pair{Dagger.OSProc,Int64},1}}}, ::Nullables.Nullable{Array{Pair{Dagger.OSProc,Int64},1}}) at ./some.jl:34
[2] setproperty!(::Dagger.Thunk, ::Symbol, ::Nullables.Nullable{Array{Pair{Dagger.OSProc,Int64},1}}) at ./Base.jl:21
[3] #join#267(::Symbol, ::Tuple{Symbol,Symbol}, ::Tuple{Symbol,Symbol}, ::Tuple{Symbol,Symbol}, ::Tuple{Symbol,Symbol}, ::Symbol, ::Int64, ::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::typeof(join), ::typeof(IndexedTables.concat_tup), ::JuliaDB.DNDSparse{NamedTuple{(:S, :T),Tuple{Float64,Float64}},NamedTuple{(:u, :t),Tuple{Int64,Int64}}}, ::JuliaDB.DNDSparse{NamedTuple{(:S, :T),Tuple{Float64,Float64}},NamedTuple{(:u, :t),Tuple{Int64,Int64}}}) at /home/grahams/.julia/packages/JuliaDB/jDAlJ/src/join.jl:54
[4] (::Base.var"#kw##join")(::NamedTuple{(:broadcast, :how),Tuple{Symbol,Symbol}}, ::typeof(join), ::Function, ::JuliaDB.DNDSparse{NamedTuple{(:S, :T),Tuple{Float64,Float64}},NamedTuple{(:u, :t),Tuple{Int64,Int64}}}, ::JuliaDB.DNDSparse{NamedTuple{(:S, :T),Tuple{Float64,Float64}},NamedTuple{(:u, :t),Tuple{Int64,Int64}}}) at ./none:0
[5] #join#278(::Symbol, ::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::typeof(join), ::Function, ::JuliaDB.DNDSparse{NamedTuple{(:S, :T),Tuple{Float64,Float64}},NamedTuple{(:u, :t),Tuple{Int64,Int64}}}, ::NDSparse{NamedTuple{(:u, :t),Tuple{Int64,Int64}},Tuple{Float64,Float64},StructArrays.StructArray{NamedTuple{(:S, :T),Tuple{Float64,Float64}},1,NamedTuple{(:S, :T),Tuple{Array{Float64,1},Array{Float64,1}}},Int64},StructArrays.StructArray{NamedTuple{(:u, :t),Tuple{Int64,Int64}},1,NamedTuple{(:u, :t),Tuple{Array{Int64,1},Array{Int64,1}}},Int64}}) at /home/grahams/.julia/packages/JuliaDB/jDAlJ/src/join.jl:130
[6] (::Base.var"#kw##join")(::NamedTuple{(:how,),Tuple{Symbol}}, ::typeof(join), ::Function, ::JuliaDB.DNDSparse{NamedTuple{(:S, :T),Tuple{Float64,Float64}},NamedTuple{(:u, :t),Tuple{Int64,Int64}}}, ::NDSparse{NamedTuple{(:u, :t),Tuple{Int64,Int64}},Tuple{Float64,Float64},StructArrays.StructArray{NamedTuple{(:S, :T),Tuple{Float64,Float64}},1,NamedTuple{(:S, :T),Tuple{Array{Float64,1},Array{Float64,1}}},Int64},StructArrays.StructArray{NamedTuple{(:u, :t),Tuple{Int64,Int64}},1,NamedTuple{(:u, :t),Tuple{Array{Int64,1},Array{Int64,1}}},Int64}}) at ./none:0
[7] #join#273 at /home/grahams/.julia/packages/JuliaDB/jDAlJ/src/join.jl:117 [inlined]
[8] join(::JuliaDB.DNDSparse{NamedTuple{(:S, :T),Tuple{Float64,Float64}},NamedTuple{(:u, :t),Tuple{Int64,Int64}}}, ::NDSparse{NamedTuple{(:u, :t),Tuple{Int64,Int64}},Tuple{Float64,Float64},StructArrays.StructArray{NamedTuple{(:S, :T),Tuple{Float64,Float64}},1,NamedTuple{(:S, :T),Tuple{Array{Float64,1},Array{Float64,1}}},Int64},StructArrays.StructArray{NamedTuple{(:u, :t),Tuple{Int64,Int64}},1,NamedTuple{(:u, :t),Tuple{Array{Int64,1},Array{Int64,1}}},Int64}}) at /home/grahams/.julia/packages/JuliaDB/jDAlJ/src/join.jl:116
[9] top-level scope at REPL[21]:1
I tried the naive solution of adding a Base.convert(::Type{T}, x::Nullable{T}) where T = x.value but then the join returns an empty table.
Two issues:
- there seems to be a bug in the inner join for DNDSparse; the NDSparse join works and can be tested with join(Ascalar, Bscalar)
- for a join, ideally the names of the data columns should be different in the two ndsparse (if you do not want the Bscalar or Ascalar data values to replace one another; if you do, the current naming is correct)
The example above works for join(ddb, Bscalar; how=:outer)
EDIT: after your fix, an empty table would be the correct result.
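To illustrate why an empty table is the expected result of the inner join here, below is a minimal sketch in plain Julia. It does not call JuliaDB; it only compares the (S, T) index keys from the example, using set intersection as a stand-in for an inner join and set union as a stand-in for an outer join. The names keysA, keysB, inner_keys and outer_keys are made up for this illustration.

```julia
# Index keys (S, T) of the two tables from the example above,
# written out by hand (hypothetical names, not JuliaDB objects).
keysA = [(0.6, 1.0), (0.7, 2.0)]
keysB = [(1.6, 2.0), (1.7, 5.0)]

# An inner join keeps only the keys present in both tables.
inner_keys = intersect(keysA, keysB)

# An outer join keeps the keys present in either table.
outer_keys = union(keysA, keysB)

println(isempty(inner_keys))  # true: the two tables share no index keys
println(length(outer_keys))   # 4
```

Since the two tables have no index keys in common, an inner join has nothing to return, while an outer join keeps all four rows.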
@zgornel Thank you! I can't believe I didn't notice I was doing the wrong kind of join.
I do get the correct result with an outer join, with one caveat: no matter how many tables I join to the original distributed table, the number of chunks remains the same. I'm not sure what the default behavior should be, so this makes sense, even though I didn't expect it. However, I don't know how to join to a distributed table and increase the number of chunks, which is important in my use case because each joined table individually approaches the memory limit of my machine. Does anyone know how to do this? Does it warrant its own issue? At the very least, I would expect documentation on this, and I'm happy to provide it once I understand how to do it.
I'll leave this issue open because of the remaining problem that join errors uninformatively when it should return an empty distributed table.