Special case `Not`, `Between`, `Regex` etc. before 1.0?
Should we allow for
@select df Not(:a)
before we go for 1.0?
currently you have to do
@select df $(Not(:a))
which is ugly.
I would guess we should special case Not, Between, Regex, and r"...". But not special case 1, 2, 3... because integers are more likely to conflict with normal code doing analysis.
The thing to consider is whether or not the macros that accept arguments as-is (no :y = f(:x) requirement) like @orderby, @subset, and @with will be affected by this. I don't anticipate any issues.
This can be added post-1.0. But maybe we want to be very polished for that.
This could open up the door to thinking about
vars = Not(:a)
@select df vars
which might be expected to work if @select df Not(:a) works. I don't like the idea of specializing too much on the exact syntax used. If only $Not(:a) was allowed instead of needing $(Not(:a)).
I went though documentation on main and docstrings and my conclusion is that unfortunately all has to be read through to make sure it is up to date (as we have some sentences that are outdated scattered). Unfortunately you are the only one that can do this reliably I am afraid 😢.
Some particular issues:
-
$is typeset as\$in rendered HTML in several places which can confuse users - we sometimes still refer to keyword arguments, see e.g. https://github.com/JuliaData/DataFramesMeta.jl/blob/73bb4c40a41cbc334786044ae820693a40d14915/src/macros.jl#L1299
Finally I missed a very precise specification how expressions passed to macros are parsed. In particular:
- I am missing a very precise rule explaining what
$does (now only examples are given) - I am missing a very precise rule explaining how columns are parsed - I assume that by default literal
Symbols are treated as columns only - is this correct?
Having said these two things I do not understand why things like:
@select df r"x"
do not work, but
@select df $(r"x")
works.
E.g. I do not understand the rule for the following:
julia> @macroexpand @subset df r"x"
quote
(DataFrames.subset)(df, begin
DataFramesMeta.make_source_concrete([]) => begin
()->begin
r"x"
end
end
end; skipmissing = true)
end
julia> @macroexpand @subset df $(r"x")
quote
(DataFrames.subset)(df, r"x"; skipmissing = true)
end
and it was not clear for me how DataFramesMeta.jl decided that it does not need to create an anonymous function.
Okay, I will open up a PR to work on this.
I do not know how to fix the \$ issue. If we do not escape inside the markdown strings, we get errors that certain variables do not exist because it tries to interpolate whatever is after the $. I will explore what other packages do to get around this problem.
With regards to r"x", the rule is here:
https://github.com/JuliaData/DataFramesMeta.jl/blob/73bb4c40a41cbc334786044ae820693a40d14915/src/parsing.jl#L261
This line says that if an argument is just $(...), then it gets passed directly as is. Let's focus on @subset. Arguments to @subset do not need to be in a keyword-argument format, i.e while @transform needs @transform df :y = f(:x), the expressions passed to @subset can be arbitrary.
x = [true, true, false]
@subset df :y .== x
gets the same code path as
@subset df x .== true # @subset df x actually fails due to a bug...
We don't require users to reference any columns at all, so arbitrary expressions are assumed to enter the anonymous function. Let's say we make non-keyword arguments default to column selectors. This works fine in @select because if something isn't a :y = f(:x) expression, we know anything else would throw an error and can assume the user wants to select a column.
But in @subset arbitrary expression are valid, so we can't make that assumption and need users to be explicit about $(...).
I do not know how to fix the $ issue.
have you tried raw strings?
Regarding the rules. What I think would be very good is if you written down all the rules in the documentation somewhere (something just like you have described above, but an exhaustive list). If I had such a list I could more easily check things. What I mean by such a list is a list similar to what we have e.g. for indexing or for transformation minilanguage (this list with 7 cases accepted)
Okay writing this up, two notes so far
- We should allow multi-argument selectors in
@orderby - We should make it easier to write
:a => funinside@with.
Added in #372