DataAPI.jl

Column selectors should guarantee column order is preserved

Open • CameronBieganek opened this issue 4 years ago • 3 comments

I think that all column selectors (other than arrays) should guarantee that the column order in the original table is preserved. One would certainly expect that to be the case for Between, though it's not explicitly mentioned in the docstring. It would be a bummer if you had foo(x, y) = 2x .+ y but Between(:x, :y) => foo happened to lower to [:y, :x] => foo instead of [:x, :y] => foo.
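To make the failure mode concrete, here is a minimal sketch in plain Julia (no DataFrames.jl) that lowers Between by slicing the table's column names, which preserves table order. `lower_between` and the `Dict`-based toy table are hypothetical stand-ins, not the DataAPI.jl or DataFrames.jl implementation.

```julia
# Hypothetical helper, not the DataAPI.jl/DataFrames.jl API: lower
# Between(start_col, stop_col) to a Vector{Symbol} in table column order.
function lower_between(colnames::Vector{Symbol}, start_col::Symbol, stop_col::Symbol)
    i = findfirst(==(start_col), colnames)
    j = findfirst(==(stop_col), colnames)
    (i === nothing || j === nothing) && throw(ArgumentError("column not found"))
    # Slicing the name vector keeps the table's own ordering.
    return colnames[i:j]
end

foo(x, y) = 2x .+ y

# A toy "table": column name => column vector (stand-in for a DataFrame).
table = Dict(:x => [1], :y => [10])
cols = lower_between([:x, :y], :x, :y)        # [:x, :y]
ordered = foo((table[c] for c in cols)...)    # foo([1], [10]) == [12]
swapped = foo(table[:y], table[:x])           # foo([10], [1]) == [21]
```

The two results differ, which is why an unspecified lowering order for Between would silently change what `foo` computes.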

And I think it makes sense to guarantee column order preservation for the other selectors. E.g.

df = DataFrame(a=1, b=2, c=3)
select(df, Not(:b) => foo)

should be guaranteed to lower to

select(df, [:a, :c] => foo)

rather than

select(df, [:c, :a] => foo)
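A minimal sketch, assuming the table's column names are available as a `Vector{Symbol}`: Not can be lowered by filtering those names, so the surviving columns keep table order. `lower_not` is a hypothetical name used only for illustration.

```julia
# Hypothetical helper: lower Not(excluded...) against the table's column
# names. filter walks colnames in order, so the result keeps table order.
lower_not(colnames::Vector{Symbol}, excluded::Symbol...) =
    filter(c -> !(c in excluded), colnames)

lower_not([:a, :b, :c], :b)   # [:a, :c], never [:c, :a]
```

`setdiff([:a, :b, :c], [:b])` gives the same result, since `setdiff` on arrays keeps the order of its first argument.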

I'm not totally certain of the best way to specify the column ordering properties of Cols, but I think this specification makes sense:

  • Individual column selectors inside Cols are first lowered to (ordered) arrays.
    • The lowering of the individual column selectors (except for arrays) follows the rule above that table column order should be preserved.
  • Cols is then lowered as follows: Cols(A, B, C) ==> [A, B\A, C\(A ∪ B)] (where the arguments on the right side are splatted into the array).

Since setdiff on arrays preserves the order of the first argument to setdiff, we get the following behavior:

df = DataFrame(a=1, b=2, c=3)
Cols([:c, :b], [:a, :b])  # lowers to [:c, :b, :a]
Cols(r"[bc]", r"[ab]")    # lowers to [:b, :c, :a]
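The proposed rule can be sketched in plain Julia, assuming only two kinds of argument (a vector of names, used as given, and a `Regex`, lowered in table order). `lower_selector` and `lower_cols` are hypothetical names; this is an illustration of the specification above, not how DataFrames.jl implements Cols.

```julia
# Hypothetical sketch of the proposed Cols lowering rule.
# Step 1: lower each argument to an ordered Vector{Symbol}.
lower_selector(colnames::Vector{Symbol}, sel::AbstractVector{Symbol}) = collect(sel)
lower_selector(colnames::Vector{Symbol}, sel::Regex) =
    filter(c -> occursin(sel, String(c)), colnames)   # table order

# Step 2: Cols(A, B, C) ==> [A; B \ A; C \ (A ∪ B)]
function lower_cols(colnames::Vector{Symbol}, sels...)
    result = Symbol[]
    for sel in sels
        part = lower_selector(colnames, sel)
        # setdiff keeps the order of its first argument, so each part adds
        # only columns not already selected, in that part's own order.
        append!(result, setdiff(part, result))
    end
    return result
end

colnames = [:a, :b, :c]
lower_cols(colnames, [:c, :b], [:a, :b])  # [:c, :b, :a]
lower_cols(colnames, r"[bc]", r"[ab]")    # [:b, :c, :a]
```

Because `result` already holds A ∪ B when C is processed, a single running `setdiff` covers every term of the rule.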

CameronBieganek, Apr 28 '20 19:04

What you propose is exactly how it is implemented in DataFrames.jl (unless I made a bug in the code).

Essentially, the rule can be stated as follows: column selectors are evaluated left to right, and any duplicate column encountered is ignored.

bkamins, Apr 28 '20 21:04
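The left-to-right phrasing agrees with the setdiff-based specification. A minimal check, assuming each selector has already been lowered to an ordered `Vector{Symbol}` (`lower_left_to_right` is a hypothetical name for illustration only): `Base.unique` keeps the first occurrence of each element in its original position, which is exactly "evaluate left to right and ignore duplicates".

```julia
# Hypothetical helper: concatenate the already-lowered selectors in order,
# then drop any column that was seen earlier (unique keeps first occurrences).
lower_left_to_right(parts::Vector{Symbol}...) = unique(vcat(parts...))

lower_left_to_right([:c, :b], [:a, :b])  # [:c, :b, :a]
lower_left_to_right([:b, :c], [:a, :b])  # [:b, :c, :a], i.e. r"[bc]", r"[ab]" after lowering
```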

What you propose is exactly how it is implemented in DataFrames.jl

Agreed, that is how it is currently implemented in DataFrames.jl. I just thought it might be a good idea to make the order guarantee explicit in DataAPI.jl.

CameronBieganek, Apr 28 '20 21:04

Going back to this issue - would you care to make a PR implementing the proposed changes? Thank you!

bkamins, Sep 04 '21 21:09