TypedTables.jl

Issue with Tables.getcolumn by index

Open sefffal opened this issue 3 years ago • 4 comments

Accessing columns through Tables.getcolumn(table, name::Symbol) works as expected, but using Tables.getcolumn(table, ind::Int) does not.

Setup:

using Tables, TypedTables
table = Table(a=rand(300), b=rand(300))
table_nt = (;a=rand(300), b=rand(300))

Expected behaviour:

julia> Tables.getcolumn(table_nt, 2)
300-element Vector{Float64}:
 0.7419591651104771
 0.03643357962428917
 0.511973946658012
 0.7525280472737248
 0.5312671306022833
...

This works with simple named tuples of vectors, as well as DataFrames.

Observed behaviour:

julia> Tables.getcolumn(table, 2)
ERROR: BoundsError: attempt to access 300-element Table{NamedTuple{(:a, :b), Tuple{Float64, Float64}}, 1, NamedTuple{(:a, :b), Tuple{Vector{Float64}, Vector{Float64}}}} at index [2]
Stacktrace:
 [1] getcolumn(x::Table{NamedTuple{(:a, :b), Tuple{Float64, Float64}}, 1, NamedTuple{(:a, :b), Tuple{Vector{Float64}, Vector{Float64}}}}, i::Int64)
   @ Tables C:\Users\William\.julia\packages\Tables\i6z2B\src\Tables.jl:101
 [2] top-level scope
   @ REPL[65]:1

However, using index 1 returns a NamedTuple containing all of the columns rather than the first column, which is not useful:

julia> Tables.getcolumn(table, 1)
(a = [0.7736170160574704, 0.32973335588180575, 0.17889965718253964, 0.7631323090473862, 0.7800224219389631, 0.08040930668634005, 0.9557133954558753, 0.9979396219551491, 0.15894660237894975, 0.5680381167378448  …  0.6559116874983786, 0.7328418210533515, 0.4856581423782824, 0.33251283450523117, 0.08142486970852292, 0.2259648695642409, 0.39396960265088865, 0.7031534405558856, 0.10224220322748001, 0.14191199646807617], b = [0.017236706415861724, 0.5265418832740683, 0.4268344997706731, 0.46470458360887146, 0.8360733105726028, 0.6032125887699785, 0.9385924928402325, 0.7405311692330161, 0.4201266483743147, 0.9833490878965103  …  0.14241236909936195, 0.29289242214548683, 0.8408873927907317, 0.7439831490645507, 0.6205302905751314, 0.9686022965164416, 0.8139530289474524, 0.823492626767103, 0.04273546220284152, 0.44406075204392326])

Accessing by column name :a or :b works as expected.

Thanks!

sefffal • Nov 02 '21 22:11

@quinnj any advice on this one?

andyferris • Nov 03 '21 00:11

In the official "usage" of the Tables.jl interface, you're only guaranteed to be able to call Tables.getcolumn on either: 1) the object returned from Tables.columns(x), or 2) each iterated element of the object returned by Tables.rows(x). For DataFrames.jl and a NamedTuple of vectors, the objects themselves happen to be returned from Tables.columns, but that is not the case for Table. So if you do tbl = Tables.columns(table) first, you can expect to call Tables.getcolumn on the result.
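
For illustration, here is a minimal sketch of that suggestion, reusing the setup from the original report. The variable names cols, col_b and row are placeholders chosen for this example and are not taken from the thread. It assumes only the two guarantees described above.

```julia
using Tables, TypedTables

table = Table(a=rand(300), b=rand(300))

# Go through Tables.columns first; the returned object is the one on which
# Tables.getcolumn is guaranteed to work.
cols = Tables.columns(table)

col_b = Tables.getcolumn(cols, 2)          # second column, by index
col_b == Tables.getcolumn(cols, :b)        # same column, by name

# The other guaranteed case: each element iterated from Tables.rows.
row = first(Tables.rows(table))
Tables.getcolumn(row, 2) == col_b[1]       # second field of the first row
```

Code written this way should also work for DataFrames.jl and a NamedTuple of vectors, since Tables.columns returns those objects unchanged.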

quinnj • Nov 03 '21 02:11

I see.

Is it good practice to extend some of these methods and opt into common behaviour? Or is it preferable to let users use the columns function?

andyferris • Nov 03 '21 04:11

All up to you; users of the Tables.jl API just need to make sure they follow the guidelines, which admittedly aren't the absolute most convenient form, but are really meant for "sink" authors in the end.
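
As a rough sketch of what "opting in" could look like, the example below defines index and name access directly on a table type. MyTable is a hypothetical wrapper invented for this illustration; it is not TypedTables' Table, and nothing here reflects how TypedTables.jl is or will be implemented. The methods are defined on a type the example owns, so no methods are added for someone else's type.

```julia
using Tables

# Hypothetical column table: a thin wrapper around a NamedTuple of vectors.
struct MyTable{T<:NamedTuple}
    data::T
end

Tables.istable(::Type{<:MyTable}) = true
Tables.columnaccess(::Type{<:MyTable}) = true

# Return the table itself from Tables.columns, so getcolumn must work on it.
Tables.columns(t::MyTable) = t
Tables.columnnames(t::MyTable) = keys(getfield(t, :data))

# Opt in explicitly: look columns up in the wrapped NamedTuple instead of
# relying on the generic getcolumn methods for the wrapper struct.
Tables.getcolumn(t::MyTable, i::Int) = getfield(t, :data)[i]
Tables.getcolumn(t::MyTable, nm::Symbol) = getfield(t, :data)[nm]

t = MyTable((a = [1, 2], b = [3.0, 4.0]))
Tables.getcolumn(t, 2)     # the :b column
Tables.getcolumn(t, :a)    # the :a column
```

With these methods, calling Tables.getcolumn directly on the table and calling it on Tables.columns(t) give the same result, which is the convenience the question asks about.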

quinnj • Nov 03 '21 04:11