[BUG] ArgumentError: malformed expression in formula
RCall fails to handle valid R expressions
julia> using RCall
julia> reval("library(tidyverse)")
julia> rcopy(reval("aes(x, y)"))
ERROR: LoadError: ArgumentError: malformed expression in formula ~x
Stacktrace:
[1] var"@formula"(__source__::LineNumberNode, __module__::Module, ex::Any)
@ StatsModels ~/.julia/packages/StatsModels/Wzvuu/src/formula.jl:62
[2] eval
@ ./boot.jl:370 [inlined]
[3] rcopy(#unused#::Type{StatsModels.FormulaTerm}, l::Ptr{LangSxp})
@ RCall ~/.julia/packages/RCall/gOwEW/src/convert/formula.jl:41
[4] rcopy(s::Ptr{LangSxp}; kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
@ RCall ~/.julia/packages/RCall/gOwEW/src/convert/default.jl:14
[5] rcopy
@ ~/.julia/packages/RCall/gOwEW/src/convert/default.jl:8 [inlined]
[6] rcopy(#unused#::Type{Any}, s::Ptr{LangSxp})
@ RCall ~/.julia/packages/RCall/gOwEW/src/convert/base.jl:21
[7] rcopy(::Type{OrderedCollections.OrderedDict{Symbol, Any}}, s::Ptr{VecSxp}; normalizenames::Bool)
@ RCall ~/.julia/packages/RCall/gOwEW/src/convert/base.jl:174
[8] rcopy(::Type{OrderedCollections.OrderedDict{Symbol, Any}}, s::Ptr{VecSxp})
@ RCall ~/.julia/packages/RCall/gOwEW/src/convert/base.jl:165
[9] rcopy(s::Ptr{VecSxp}; kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
@ RCall ~/.julia/packages/RCall/gOwEW/src/convert/default.jl:18
[10] rcopy
@ ~/.julia/packages/RCall/gOwEW/src/convert/default.jl:8 [inlined]
[11] #rcopy#16
@ ~/.julia/packages/RCall/gOwEW/src/convert/default.jl:6 [inlined]
[12] rcopy(r::RObject{VecSxp})
@ RCall ~/.julia/packages/RCall/gOwEW/src/convert/default.jl:6
[13] top-level scope
@ REPL[50]:1
in expression starting at /home/ssahm/.julia/packages/RCall/gOwEW/src/convert/formula.jl:41
StatsModels.@formula is apparently not meant to be used with a single bare variable, such as the one-sided formula ~x in the error above.
I am currently using the following workaround/fix:
function RCall.rcopy(::Type{RCall.FormulaTerm}, l::Ptr{RCall.LangSxp})
    expr = RCall.rcopy(Expr, l)
    if Meta.isexpr(expr, :call) && length(expr.args) == 2 && expr.args[1] == :~
        # special case: one-sided formula on a simple variable, as produced by aes(x, y);
        # return the raw Expr instead of passing it to @formula
        return expr
    end
    # complex formula: let StatsModels build the FormulaTerm
    return @eval RCall StatsModels.@formula($expr)
end
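To make the condition in the workaround concrete, here is a minimal sketch using only Base Julia. It assumes that rcopy(Expr, l) turns the R formula ~x into the same expression Julia produces for ~x; the helper name is_one_sided_formula is hypothetical and not part of RCall or StatsModels.

```julia
# Hypothetical helper (not part of RCall or StatsModels): true when `expr`
# is a call to `~` with exactly one operand, such as the expression `~x`.
is_one_sided_formula(expr) =
    Meta.isexpr(expr, :call) && length(expr.args) == 2 && expr.args[1] === :~

one_sided = :(~x)       # parsed as Expr(:call, :~, :x)
two_sided = :(y ~ x)    # parsed as Expr(:call, :~, :y, :x)

is_one_sided_formula(one_sided)   # true
is_one_sided_formula(two_sided)   # false
is_one_sided_formula(:x)          # false: a bare Symbol is not an Expr
```

A two-sided formula has three entries in expr.args (the operator plus both sides), which is why the length check is enough to tell the two cases apart.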
I don't know if the example here is artificial, but I'm not sure it makes sense to try to copy the result of aes(x, y) into Julia. Here's my reasoning, and my guess at what is going wrong:
- aes(x, y) is an example of non-standard evaluation in R -- the x and the y are treated as symbols and not as variables, and those symbols are then evaluated in the context of the dataframe you pass into the rest of the ggplot2 call. Julia doesn't have non-standard evaluation -- and intentionally so. Non-standard evaluation is really great for some bits of fun syntax (like the tidyverse uses extensively), but it's very hard for humans and compilers to reason about and thus very hard or even impossible to optimize. This is part of why efforts to add a bytecode compiler to R have had very limited success.[1] In other words, the R expression aes(x, y) has no direct Julia analog, and not just because aes is an R function and not a Julia one.
- Julia relies instead on macros to do syntax rewriting and thus implement things like the Wilkinson-Rogers notation, i.e. the formula syntax.
- If I recall correctly (it's been a while since I messed with ggplot2 internals and much has changed in the meantime), aes and the like are actually rewritten into a mix of formula notation and things like aes_, which doesn't use non-standard evaluation. My guess is that when this happens, x gets turned into the one-sided formula ~ x.
- RCall sees this formula and says "aha, I know how to translate a formula!" and calls into StatsModels, which has the canonical Julia implementation of the Wilkinson-Rogers notation via its @formula macro. The only problem is that there are no one-sided formulae in this implementation. I haven't talked to @kleinschmidt to know for sure why, but my guess is that this is partly related to the following:
  - there are other ways to construct individual terms in Julia
  - there are other ways in Julia to do the kinds of things that R uses one-sided formulae for
  - macros can do syntactic rewriting, but the original input still has to be valid Julia syntax, even if it's not "semantically" correct, because Julia parses the expression before the macro gets to manipulate it.
- If you really need a one-sided formula in Julia, then you can do something like @formula(0 ~ x)
- Now that we've covered why your example doesn't work, I'm not sure it's a good idea to try it. I can't see how aes(x, y) is useful in Julia -- it's an entity that's meant to be consumed by ggplot2's functions and, as far as I know, there are no functions in Julia that can consume it. So if you just need a reference to the aes-entity to later pass it back into R, then you don't need to call rcopy -- you can just do aes = reval("aes(x, y)") and you'll have a Julia reference to the object in R.
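To illustrate the point above that Julia parses an expression before a macro gets to see it, here is a small sketch using only Base Julia. The macro @tilde_operands is hypothetical and only for illustration; it is not how StatsModels implements @formula.

```julia
# Hypothetical macro for illustration only; it is not StatsModels' @formula.
# It receives the already-parsed expression and returns the operands of `~`.
macro tilde_operands(ex)
    if !(Meta.isexpr(ex, :call) && ex.args[1] === :~)
        throw(ArgumentError("expected a `~` expression, got $ex"))
    end
    # Quote each operand so it is returned as data rather than evaluated.
    return Expr(:tuple, map(QuoteNode, ex.args[2:end])...)
end

both_sides = @tilde_operands(y ~ x)   # (:y, :x)
rhs_only = @tilde_operands(~x)        # (:x,)

# Text that is not valid Julia syntax is rejected by the parser,
# so no macro ever gets a chance to rewrite it.
bad = Meta.parse("y ~ )"; raise = false)
parse_failed = bad isa Expr && bad.head in (:error, :incomplete)   # true
```

Note that ~x parses fine on its own, so a macro is free to accept or reject it; judging by the stack trace in this issue, the rejection of ~x is raised by @formula itself rather than by the parser.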
If there's some cool use case I'm missing, please let me know! Then I could provide more guidance. :smile:
One final "nit" -- for the example you're using here, you don't need the whole tidyverse, just ggplot2. Trimming the dependency stack can really help track down a problem, so just FYI. :heart:
Very impressive and detailed answer. There is indeed a use case: I am in the process of supporting R in Pluto via RCall. It is quite a special use case, but in such a generic "execute some R code via RCall" setting, these cases simply come up.
In other words: why should rcopy be left to fail in some known (or unknown) cases? It would be better to make it work in all cases.