silent error when using `,` in operations
consider the use of @orderby below. Look at the last example. Somehow, the syntax allows the use of , does not error, but provides an incorrect result.
julia> dd = DataFrame(a = [3, 1, 2], b = [5,4, 6])
3×2 DataFrame
Row │ a b
│ Int64 Int64
─────┼──────────────
1 │ 3 5
2 │ 1 4
3 │ 2 6
# works correctly
julia> @orderby(dd, :a, :b)
3×2 DataFrame
Row │ a b
│ Int64 Int64
─────┼──────────────
1 │ 1 4
2 │ 2 6
3 │ 3 5
# works correctly
julia> @orderby dd :a :b
3×2 DataFrame
Row │ a b
│ Int64 Int64
─────┼──────────────
1 │ 1 4
2 │ 2 6
3 │ 3 5
# works silenty, but incorrect
julia> @orderby dd :a, :b
3×2 DataFrame
Row │ a b
│ Int64 Int64
─────┼──────────────
1 │ 3 5
2 │ 1 4
3 │ 2 6
Weird, thanks for the issue report. I would have expected the last one to sort the tuples lexicographically, but it does not seem to do that.
oh wait, it is sorting the tuple lexicographically. But within this it's sorting the arrays lexicographically. This is "right" in the sense that it's behaving as Julia behaves, but it is unintuitive behavior.
MacroTools.@macroexpand(@orderby dd :a, :b) |> MacroTools.prettify
:((DataFramesMeta).orderby(dd, DataFramesMeta.make_source_concrete([:b, :a]) => (((bat, falcon)->(falcon, bat)) => Symbol("##257"))))
with
function orderby(x::SubDataFrame, @nospecialize(args...))
t = DataFrames.select(x, args...)
x[sortperm(t), :]
end
so we see
julia> select(dd, DataFramesMeta.make_source_concrete([:b, :a]) => (((bat, falcon)->(falcon, bat)) => Symbol("##257")))
3×1 DataFrame
Row │ ##257
│ Tuple…
─────┼────────────────────────
1 │ ([3, 1, 2], [5, 4, 6])
2 │ ([3, 1, 2], [5, 4, 6])
3 │ ([3, 1, 2], [5, 4, 6])
Sorry, i'll finish writing this up tomorrow
Okay I'm back.
Here's whats going on. The transformation
([:b, :a]) => ((bat, falcon)->(falcon, bat))
creates a tuple of vectors. Because a tuple of vectors is functionally a scalar in DataFrames.jl's transformation language (i.e. not a table or a vector), it gets "spread" across rows, the same way
julia> df = DataFrame(a = [1, 2]);
julia> transform(df, :a => (t -> 1) => :b)
2×2 DataFrame
Row │ a b
│ Int64 Int64
─────┼──────────────
1 │ 1 1
2 │ 2 1
works.
So it's creating the data frame
3×1 DataFrame
Row │ ##257
│ Tuple…
─────┼────────────────────────
1 │ ([3, 1, 2], [5, 4, 6])
2 │ ([3, 1, 2], [5, 4, 6])
3 │ ([3, 1, 2], [5, 4, 6])
and then sorting by the ##257 column. Which means the order doesn't change (or is undefined ?) after the sort happens.
I agree that it's very confusing that @orderby df :a, :b looks so similar to @orderby(df, :a, :b).
One potential solution is to disallow expressions that look like Expr(:tuple, ...) to avoid this problem... but I'm worried that could have unforseen side effects.
Now that you understand what DataFramesMeta.jl is doing behind the scenes, do you have more thoughts on how to approach this issue?
So in essence:
@orderby df :a, :b
is the same as
@orderby df tuple(:a, :b)
due to how Julia parser works.
Right?
Yes, exactly.
I'm not sure what the correct solution here would be. Maybe we document the use of tuple in such cases, which is essentially equivalent to writing the syntax with the parenthesis. So maybe just Advocate the use of one common syntax rather than two different ways of writing? Given a choice, to be consistent with dataframes.jl, it is probably easier to write the macros with parentheses and start deprecating the one without. It will be consistent also with the syntax that is used in R where most of the users are coming from
it is probably easier to write the macros with parentheses and start deprecating the one without
This is what I do (i.e. I always use a function call style). But we cannot "deprecate it" - this is an intentional feature of Julia not of DataFramesMeta.jl. Probably we should highlight to users that there are two styles allowed by Julia and explained them better.
So maybe just Advocate the use of one common syntax rather than two different ways of writing?
Yes, this doesn't fully solve the problem, since @mymacro a, b is always going to be parsed as a tuple by default. Even if the recommended way is @mymacro(a, b).
it is probably easier to write the macros with parentheses and start deprecating the one without.
The reason we support the non-parentheses version is because of @passmissing and other macro flags. Ironically, for the same problems that we are writing out here.
@rorderby(df, @passmissing :y, :b)
will get parsed as
@rorderby(df, @passmissing(tuple(:y, :b))
the solution to this problem was to allow multiple lines so the macro-flags could get properly parsed
@rorderby df begin
@passmissing :y
:b
end
It will be consistent also with the syntax that is used in R where most of the users are coming from
Yes but @rtransform df :a or inside a chain block is closer to Stata, which is also a market-share to appeal to. Plus, I dislike the requirement of parentheses in dplyr, because I think they discourage complicated transformations. I was actually hoping to get rid of all the parentheses in documentation to get people to use the begin ... end version.
I think the way forward is to disallow @rtransform :a, :b so users dont get confused.
I think the way forward is to disallow @rtransform :a, :b so users dont get confused.
I'm not sure that helps, as you can of course disallow a tuple, but you would also need to disallow macros that work on tuples. This is because @rtransform @byrow something, other_thing would resolve to one macro expression as an argument, and I'm not sure that you don't get into trouble if you also disallow tuples within a macro as well.
One solution would be to disallow multiple arguments except listed in a begin end block, but that feels overly restrictive.
The premise of this whole issue is that there is a silent "error", but of course it's not that, it's just confusing syntax. But given that Julia's macro parsing rules are fixed and you can't discriminate between a call with or without parentheses inside the macro, I think this is just not avoidable.
But given that Julia's macro parsing rules are fixed and you can't discriminate between a call with or without parentheses inside the macro
This was my point above. I think it is best to very clearly explain the options and their consequences in the manual. @pdeffebach - if you prefer I can submit a PR.
Maybe we need a "Gotcha's" section of the docs? Could be a good opportunity to beef up the docs before 1.0.