JuliaDB.jl

Error joining to distributed NDSparse

Open grahamas opened this issue 6 years ago • 2 comments

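using Distributed  # provides @everywhere
@everywhere using JuliaDB

# Index columns (S, T) and data columns (u, t) for two small tables
indicesA = (S=[0.6, 0.7], T=[1,2.0])
indicesB = (S=[1.6, 1.7], T=[2,5.0])
valsAscalar = (u=[1, 2], t=[2, 3])
valsBscalar = (u=[30, 50], t=[4, 5])

Ascalar = ndsparse(indicesA, valsAscalar)
Bscalar = ndsparse(indicesB, valsBscalar)
ddb = distribute(Ascalar, 1)   # distributed copy of Ascalar in a single chunk
Cscalar = join(ddb, Bscalar)   # inner join by default; this call raises the error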

gives

ERROR: MethodError: Cannot `convert` an object of type Nullables.Nullable{Array{Pair{Dagger.OSProc,Int64},1}} to an object of type Array{Pair{Dagger.OSProc,Int64},1}
Closest candidates are:
  convert(::Type{Array{S,N}}, ::DataValues.DataValueArray{T,N}) where {S, T, N} at /home/grahams/.julia/packages/DataValues/XQWvG/src/array/primitives.jl:272
  convert(::Type{Array{S,N}}, ::DataValues.DataValueArray{T,N}, ::Any) where {S, T, N} at /home/grahams/.julia/packages/DataValues/XQWvG/src/array/primitives.jl:301
  convert(::Type{Array{S,N}}, ::PooledArrays.PooledArray{T,R,N,RA} where RA) where {S, T, R, N} at /home/grahams/.julia/packages/PooledArrays/ufJSl/src/PooledArrays.jl:288
  ...
Stacktrace:
 [1] convert(::Type{Union{Nothing, Array{Pair{Dagger.OSProc,Int64},1}}}, ::Nullables.Nullable{Array{Pair{Dagger.OSProc,Int64},1}}) at ./some.jl:34
 [2] setproperty!(::Dagger.Thunk, ::Symbol, ::Nullables.Nullable{Array{Pair{Dagger.OSProc,Int64},1}}) at ./Base.jl:21
 [3] #join#267(::Symbol, ::Tuple{Symbol,Symbol}, ::Tuple{Symbol,Symbol}, ::Tuple{Symbol,Symbol}, ::Tuple{Symbol,Symbol}, ::Symbol, ::Int64, ::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::typeof(join), ::typeof(IndexedTables.concat_tup), ::JuliaDB.DNDSparse{NamedTuple{(:S, :T),Tuple{Float64,Float64}},NamedTuple{(:u, :t),Tuple{Int64,Int64}}}, ::JuliaDB.DNDSparse{NamedTuple{(:S, :T),Tuple{Float64,Float64}},NamedTuple{(:u, :t),Tuple{Int64,Int64}}}) at /home/grahams/.julia/packages/JuliaDB/jDAlJ/src/join.jl:54
 [4] (::Base.var"#kw##join")(::NamedTuple{(:broadcast, :how),Tuple{Symbol,Symbol}}, ::typeof(join), ::Function, ::JuliaDB.DNDSparse{NamedTuple{(:S, :T),Tuple{Float64,Float64}},NamedTuple{(:u, :t),Tuple{Int64,Int64}}}, ::JuliaDB.DNDSparse{NamedTuple{(:S, :T),Tuple{Float64,Float64}},NamedTuple{(:u, :t),Tuple{Int64,Int64}}}) at ./none:0
 [5] #join#278(::Symbol, ::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::typeof(join), ::Function, ::JuliaDB.DNDSparse{NamedTuple{(:S, :T),Tuple{Float64,Float64}},NamedTuple{(:u, :t),Tuple{Int64,Int64}}}, ::NDSparse{NamedTuple{(:u, :t),Tuple{Int64,Int64}},Tuple{Float64,Float64},StructArrays.StructArray{NamedTuple{(:S, :T),Tuple{Float64,Float64}},1,NamedTuple{(:S, :T),Tuple{Array{Float64,1},Array{Float64,1}}},Int64},StructArrays.StructArray{NamedTuple{(:u, :t),Tuple{Int64,Int64}},1,NamedTuple{(:u, :t),Tuple{Array{Int64,1},Array{Int64,1}}},Int64}}) at /home/grahams/.julia/packages/JuliaDB/jDAlJ/src/join.jl:130
 [6] (::Base.var"#kw##join")(::NamedTuple{(:how,),Tuple{Symbol}}, ::typeof(join), ::Function, ::JuliaDB.DNDSparse{NamedTuple{(:S, :T),Tuple{Float64,Float64}},NamedTuple{(:u, :t),Tuple{Int64,Int64}}}, ::NDSparse{NamedTuple{(:u, :t),Tuple{Int64,Int64}},Tuple{Float64,Float64},StructArrays.StructArray{NamedTuple{(:S, :T),Tuple{Float64,Float64}},1,NamedTuple{(:S, :T),Tuple{Array{Float64,1},Array{Float64,1}}},Int64},StructArrays.StructArray{NamedTuple{(:u, :t),Tuple{Int64,Int64}},1,NamedTuple{(:u, :t),Tuple{Array{Int64,1},Array{Int64,1}}},Int64}}) at ./none:0
 [7] #join#273 at /home/grahams/.julia/packages/JuliaDB/jDAlJ/src/join.jl:117 [inlined]
 [8] join(::JuliaDB.DNDSparse{NamedTuple{(:S, :T),Tuple{Float64,Float64}},NamedTuple{(:u, :t),Tuple{Int64,Int64}}}, ::NDSparse{NamedTuple{(:u, :t),Tuple{Int64,Int64}},Tuple{Float64,Float64},StructArrays.StructArray{NamedTuple{(:S, :T),Tuple{Float64,Float64}},1,NamedTuple{(:S, :T),Tuple{Array{Float64,1},Array{Float64,1}}},Int64},StructArrays.StructArray{NamedTuple{(:u, :t),Tuple{Int64,Int64}},1,NamedTuple{(:u, :t),Tuple{Array{Int64,1},Array{Int64,1}}},Int64}}) at /home/grahams/.julia/packages/JuliaDB/jDAlJ/src/join.jl:116
 [9] top-level scope at REPL[21]:1

I tried the naive fix of defining Base.convert(::Type{T}, x::Nullable{T}) where T = x.value, but then the join returns an empty table.

grahamas avatar Oct 16 '19 16:10 grahamas

Two issues:

  • there seems to be a bug in the inner join for DNDSparse; the NDSparse join works and can be tested with join(Ascalar, Bscalar)
  • for a join, the data columns in the two ndsparse tables should ideally have different names, unless you want the data values of Bscalar and Ascalar to replace one another, in which case the current naming is correct

The example above works with join(ddb, Bscalar; how=:outer).

EDIT: after your fix, an empty table would be the correct result for the inner join, since Ascalar and Bscalar share no index values.
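To see why an empty inner join is expected here, the key overlap can be checked in plain Julia without JuliaDB. This is only a sketch of the set logic behind the two join types, not of how JuliaDB implements join; the names keysA, keysB, inner_keys and outer_keys are made up for illustration.

```julia
# Index columns from the example above. T mixes Int and Float64 literals,
# so Julia promotes each vector to Float64.
indicesA = (S=[0.6, 0.7], T=[1, 2.0])
indicesB = (S=[1.6, 1.7], T=[2, 5.0])

# Each key is one (S, T) pair, mirroring one row of the NDSparse index.
keysA = collect(zip(indicesA.S, indicesA.T))
keysB = collect(zip(indicesB.S, indicesB.T))

# An inner join keeps only keys present in both tables;
# an outer join keeps keys present in either table.
inner_keys = intersect(keysA, keysB)
outer_keys = union(keysA, keysB)

println("inner join keys: ", length(inner_keys))
println("outer join keys: ", length(outer_keys))
```

No (S, T) pair occurs in both tables, so the inner key set is empty while the outer key set has four entries.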

zgornel avatar Oct 25 '19 20:10 zgornel

@zgornel Thank you! I can't believe I didn't notice I was doing the wrong kind of join.

I do get the correct result with an outer join, with one caveat: no matter how many tables I join to the original distributed table, the number of chunks remains the same. I'm not sure what the default behavior should be, so this makes sense even though I didn't expect it. However, I don't know how to join to a distributed table and increase the number of chunks, which matters in my use case because each joined table individually approaches the memory limit of my machine. Does anyone know how to do this? Does it warrant its own issue? At the very least, I would expect documentation on this, and I'm happy to provide it once I understand how to do it.

I'll leave this issue open because of the remaining problem: join throws an uninformative error when it should return an empty distributed table.

grahamas avatar Nov 21 '19 17:11 grahamas