TableOperations.jl
Add a method to horizontally concatenate two (or more) tables of possibly different types
This has come up in ML workflows; see https://github.com/alan-turing-institute/MLJ.jl/issues/915. Would TableOperations.jl be the appropriate place for this?
What I have in mind is a simple concatenation - not a fancy join. So, if a column name of table1 appears in table2, then the table2 column just gets added with its name modified.
The tricky part is deciding what the return type should be. I don't have fixed ideas about this, but perhaps if all the tables have the same type, and that type is a sink, then that should also be the return type.
Although it is not part of the public API, I see that TableTransforms.jl has an implementation, which materializes the final table as the type of the first table:
julia> table1
3×2 DataFrame
 Row │ x        z
     │ Char     Float64
─────┼───────────────────
   1 │ 𘂯        0.673471
   2 │ \U3f846  0.360792
   3 │ \Ud50cb  0.68075

julia> table2
(x = [0.41754294943943493, 0.7713462387833814, 0.9189998773436003], y = ['\U84fa1', '\U5e144', '\U872a4'])

julia> TableTransforms.tablehcat([table1, table2])
3×4 DataFrame
 Row │ x        z         x_        y
     │ Char     Float64   Float64   Char
─────┼──────────────────────────────────────
   1 │ 𘂯        0.673471  0.417543  \U84fa1
   2 │ \U3f846  0.360792  0.771346  \U5e144
   3 │ \Ud50cb  0.68075   0.919     \U872a4
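To make the proposed semantics concrete, here is a minimal Base-only sketch of the renaming behaviour for column tables given as NamedTuples of vectors. The names `uniquename` and `hcat_columns` are hypothetical, and the rule of appending `_` until the name is unique is an assumption that merely mirrors the `x_` suffix in the output above; this is not the TableTransforms.jl implementation. A general version would presumably read columns through the Tables.jl interface and materialize a sink type at the end, which this sketch does not attempt.

```julia
# Hypothetical helper: return `name`, with underscores appended until it
# no longer collides with any name in `taken`.
function uniquename(name::Symbol, taken)
    while name in taken
        name = Symbol(name, :_)
    end
    return name
end

# Hypothetical sketch: horizontally concatenate column tables given as
# NamedTuples of equal-length vectors, renaming clashing column names.
function hcat_columns(tables::NamedTuple...)
    names = Symbol[]
    columns = AbstractVector[]
    for table in tables
        for (name, column) in pairs(table)
            # Every column must have the same number of rows as the first one.
            isempty(columns) || length(column) == length(first(columns)) ||
                throw(DimensionMismatch("all columns must have the same length"))
            push!(names, uniquename(name, names))
            push!(columns, column)
        end
    end
    # The result is always a NamedTuple column table; no sink is chosen.
    return NamedTuple{Tuple(names)}(Tuple(columns))
end

table1 = (x = ['a', 'b', 'c'], z = [0.1, 0.2, 0.3])
table2 = (x = [1.0, 2.0, 3.0], y = ['d', 'e', 'f'])
combined = hcat_columns(table1, table2)
keys(combined)  # (:x, :z, :x_, :y)
```

The sketch sidesteps the return-type question by always returning a NamedTuple; choosing the sink from the input types would be the part that needs a design decision.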
cc @ExpandingMan @juliohm
The public API in TableTransforms.jl that calls this internal function is the "union" operation:
julia> using TableTransforms
julia> t = (a=rand(10), b=rand(10))
(a = [0.6641423369533678, 0.8825002137646382, 0.09547491191702095, 0.08300878853401705, 0.7676971362813552, 0.9581504696200156, 0.567855526779016, 0.07678981780869187, 0.5567635903628834, 0.08891189315634984], b = [0.43196071275199466, 0.06530159708719874, 0.5762803761469641, 0.06943147111497461, 0.7115173288150275, 0.5255875672459875, 0.9022021113163965, 0.4923613837755302, 0.4019291861614135, 0.4260936690192283])
julia> t |> (Select(:a,:b) ⊔ Select(:a,:b))
(a = [0.6641423369533678, 0.8825002137646382, 0.09547491191702095, 0.08300878853401705, 0.7676971362813552, 0.9581504696200156, 0.567855526779016, 0.07678981780869187, 0.5567635903628834, 0.08891189315634984], b = [0.43196071275199466, 0.06530159708719874, 0.5762803761469641, 0.06943147111497461, 0.7115173288150275, 0.5255875672459875, 0.9022021113163965, 0.4923613837755302, 0.4019291861614135, 0.4260936690192283], a_ = [0.6641423369533678, 0.8825002137646382, 0.09547491191702095, 0.08300878853401705, 0.7676971362813552, 0.9581504696200156, 0.567855526779016, 0.07678981780869187, 0.5567635903628834, 0.08891189315634984], b_ = [0.43196071275199466, 0.06530159708719874, 0.5762803761469641, 0.06943147111497461, 0.7115173288150275, 0.5255875672459875, 0.9022021113163965, 0.4923613837755302, 0.4019291861614135, 0.4260936690192283])
We used to depend on TableOperations.jl for the lazy Select, but recently we added a lazy select in TableTransforms.jl directly.