TypedTables.jl
Issue with Tables.getcolumn by index
Accessing columns through Tables.getcolumn(table, name::Symbol) works as expected, but using Tables.getcolumn(table, ind::Int) does not.
Setup:
using Tables, TypedTables
table = Table(a=rand(300), b=rand(300))
table_nt = (;a=rand(300), b=rand(300))
Expected behaviour:
julia> Tables.getcolumn(table_nt, 2)
300-element Vector{Float64}:
0.7419591651104771
0.03643357962428917
0.511973946658012
0.7525280472737248
0.5312671306022833
...
This works with simple named tuples of vectors, as well as DataFrames.
Observed behaviour:
julia> Tables.getcolumn(table, 2)
ERROR: BoundsError: attempt to access 300-element Table{NamedTuple{(:a, :b), Tuple{Float64, Float64}}, 1, NamedTuple{(:a, :b), Tuple{Vector{Float64}, Vector{Float64}}}} at index [2]
Stacktrace:
[1] getcolumn(x::Table{NamedTuple{(:a, :b), Tuple{Float64, Float64}}, 1, NamedTuple{(:a, :b), Tuple{Vector{Float64}, Vector{Float64}}}}, i::Int64)
@ Tables C:\Users\William\.julia\packages\Tables\i6z2B\src\Tables.jl:101
[2] top-level scope
@ REPL[65]:1
However, using index 1 returns a named tuple of all the columns, which is not useful:
julia> Tables.getcolumn(table, 1)
(a = [0.7736170160574704, 0.32973335588180575, 0.17889965718253964, 0.7631323090473862, 0.7800224219389631, 0.08040930668634005, 0.9557133954558753, 0.9979396219551491, 0.15894660237894975, 0.5680381167378448 … 0.6559116874983786, 0.7328418210533515, 0.4856581423782824, 0.33251283450523117, 0.08142486970852292, 0.2259648695642409, 0.39396960265088865, 0.7031534405558856, 0.10224220322748001, 0.14191199646807617], b = [0.017236706415861724, 0.5265418832740683, 0.4268344997706731, 0.46470458360887146, 0.8360733105726028, 0.6032125887699785, 0.9385924928402325, 0.7405311692330161, 0.4201266483743147, 0.9833490878965103 … 0.14241236909936195, 0.29289242214548683, 0.8408873927907317, 0.7439831490645507, 0.6205302905751314, 0.9686022965164416, 0.8139530289474524, 0.823492626767103, 0.04273546220284152, 0.44406075204392326])
Accessing by column name :a or :b works as expected.
Thanks!
@quinnj any advice on this one?
In the official "usage" of the Tables.jl interface, you're only guaranteed to be able to call Tables.getcolumn on either: 1) the object returned from Tables.columns(x), or 2) each iterated element of the object returned by Tables.rows(x). For DataFrames.jl and a NamedTuple of vectors, the objects themselves happen to be returned from Tables.columns, but in the case of Table, that is not so. So if you do tbl = Tables.columns(table) first, you can expect to call Tables.getcolumn on the result.
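As a minimal sketch of that suggestion, reusing the setup from the original report (the variable names cols, b_by_index and b_by_name are only illustrative):

```julia
using Tables, TypedTables

table = Table(a=rand(300), b=rand(300))

# Ask Tables.jl for the column-access object first, rather than
# calling Tables.getcolumn on the Table directly.
cols = Tables.columns(table)

# On that object, both index-based and name-based access are supported.
b_by_index = Tables.getcolumn(cols, 2)
b_by_name  = Tables.getcolumn(cols, :b)
```

Both calls should give the second column rather than raising the BoundsError shown above.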
I see.
Is it good practice to extend some of these methods and opt into common behaviour? Or is it preferable to let users use the columns function?
All up to you; users of the Tables.jl API just need to make sure they follow the guidelines, which admittedly aren't the absolute most convenient form, but are really meant for "sink" authors in the end.
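For code that consumes arbitrary tables, following the guidelines could look roughly like the sketch below. The helper name nth_column is hypothetical and not part of Tables.jl; it only wraps the two documented calls so that callers never index into the source object directly.

```julia
using Tables

# Hypothetical helper (not part of Tables.jl): normalise any table source
# through Tables.columns before asking for a column by position.
nth_column(source, i::Int) = Tables.getcolumn(Tables.columns(source), i)

# Works for a NamedTuple of vectors, where Tables.columns returns the object itself.
table_nt = (; a = rand(300), b = rand(300))
col = nth_column(table_nt, 2)
```

Under the same assumption, passing the Table from the original report to nth_column should also return its second column, since Tables.columns is applied before the index lookup.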