RCall.jl
Treating `Symbol`s like `String`s?
I often use ggplot2 for plotting, and I often run into situations where it doesn't work because RCall.jl cannot handle Symbols properly.
`CategoricalArray{Symbol}` columns (which occur naturally when reshaping/stacking DataFrames) are not converted into factors. `Array{Symbol}` is converted to a list of symbols (I'm not sure what that is), which cannot be processed by ggplot2.
using RCall, DataFrames
@rlibrary ggplot2
df_string = DataFrame(x = randn(100), y=randn(100), z1=rand(["a"; "b"], 100), z2=rand(["darkorange"; "lightblue"], 100))
ggplot(df_string) + geom_point(aes(x=:x, y=:y, color=:z1)) # works with standard colors
ggplot(df_string) + geom_point(aes(x=:x, y=:y, color=:z2)) # works with standard colors
df_symbol = DataFrame(x = randn(100), y=randn(100), z1=rand([:a; :b], 100), z2=rand([:darkorange; :lightblue], 100))
ggplot(df_symbol) + geom_point(aes(x=:x, y=:y, color=:z1)) # doesn't work
ggplot(df_symbol) + geom_point(aes(x=:x, y=:y, color=:z2)) # uses colors "darkorange" and "lightblue"
I don't understand why R now uses the symbols as color names rather than as factor categories. Is this intended?
categorical!(df_string, :z1)
ggplot(df_string) + geom_point(aes(x=:x, y=:y, color=:z1)) # works with standard colors
categorical!(df_symbol, :z1)
R"$df_symbol" # Error - malformed factor
Would it be possible to convert all Symbols to strings automatically before sending them to R? Could we at least do the conversion automatically for CategoricalArray{Symbol}?
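Until that is decided, the conversion can be done by hand before the data reaches R. The following is only a minimal sketch in plain Julia: `stringify_symbols` is a hypothetical helper name, not part of RCall.jl, and the columns are held in a NamedTuple that mirrors `df_symbol` rather than in an actual DataFrame.

```julia
# Hypothetical helper (not an RCall.jl function): convert Symbol columns to
# String columns so that R would receive character vectors, not symbol lists.
stringify_symbols(col::AbstractVector{Symbol}) = String.(col)
stringify_symbols(col::AbstractVector) = col  # every other column is passed through unchanged

# Columns as a NamedTuple of vectors, mirroring df_symbol above.
cols = (x = randn(100),
        z1 = rand([:a, :b], 100),
        z2 = rand([:darkorange, :lightblue], 100))

# map over a NamedTuple keeps the column names and applies the helper per column.
cols_string = map(stringify_symbols, cols)
```

Assuming DataFrames is loaded, `DataFrame(cols_string)` should then behave like `df_string` above when passed to ggplot, since `z1` and `z2` are now `Vector{String}`.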
stack doesn't create CategoricalArray{Symbol} columns AFAICT. Do you have an example?
R has the concept of symbol (see e.g. as.symbol), so it sounds appropriate to convert Julia symbols to that. It's unfortunate that symbols in Julia are used in many places where a string would be used in R.
The CategoricalArray issue sounds secondary to me: better decide what to do with Array{Symbol} first, as it is a simpler case.
stack creates Vector{Symbol}.
Sorry, I guess I mixed that up. What I meant is that Vector{Symbol} columns arise naturally and they don't work well with RCall.
No problem 😄. In DataFrames.jl we will likely switch to CategoricalVector{String} soon anyway. But the issue with Symbol in RCall.jl should probably be resolved regardless.