RCall.jl

Treating `Symbol`s like `String`s?

Open greimel opened this issue 6 years ago • 4 comments

I often use ggplot2 for plotting, and I regularly run into cases where it doesn't work because RCall.jl cannot handle `Symbol`s properly.

  • CategoricalArray{Symbol} columns (which occur naturally when reshaping/stacking data frames) are not converted into factors.
  • Array{Symbol} is converted to a list of symbols (dunno what that is), which cannot be processed by ggplot2.

```julia
using RCall, DataFrames
@rlibrary ggplot2

# String columns: z1 and z2 are both treated as categories
df_string = DataFrame(x = randn(100), y=randn(100), z1=rand(["a"; "b"], 100), z2=rand(["darkorange"; "lightblue"], 100))
ggplot(df_string) + geom_point(aes(x=:x, y=:y, color=:z1)) # works with standard colors
ggplot(df_string) + geom_point(aes(x=:x, y=:y, color=:z2)) # works with standard colors

# Symbol columns: same data, but stored as Symbols
df_symbol = DataFrame(x = randn(100), y=randn(100), z1=rand([:a; :b], 100), z2=rand([:darkorange; :lightblue], 100))
ggplot(df_symbol) + geom_point(aes(x=:x, y=:y, color=:z1)) # doesn't work
ggplot(df_symbol) + geom_point(aes(x=:x, y=:y, color=:z2)) # uses colors "darkorange" and "lightblue"
```

I don't understand why R now uses the symbols as color names rather than as factor categories. Is this intended?

```julia
categorical!(df_string, :z1)
ggplot(df_string) + geom_point(aes(x=:x, y=:y, color=:z1)) # works with standard colors

categorical!(df_symbol, :z1)
R"$df_symbol" # Error - malformed factor
```

Would it be possible to convert all Symbols to strings automatically before sending them to R? Could we at least do the conversion automatically for CategoricalArray{Symbol}?
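
Until that is handled on the RCall.jl side, one possible workaround is to convert `Symbol` columns to strings on the Julia side before passing the data to R. A minimal sketch using only Base Julia; `stringify` is a hypothetical helper name, not an RCall.jl or DataFrames.jl function, and the column table here is a plain NamedTuple of vectors standing in for a data frame:

```julia
# Hypothetical helper (not part of RCall.jl or DataFrames.jl):
# turn Symbol columns into String columns, leave everything else as-is.
stringify(col::AbstractVector{Symbol}) = string.(col)
stringify(col::AbstractVector) = col

# A small column table as a NamedTuple of vectors.
cols = (x = randn(4), z1 = [:a, :b, :a, :b])

# map over a NamedTuple keeps the column names and applies stringify per column.
converted = map(stringify, cols)
```

With DataFrames.jl loaded, `DataFrame(converted)` should then give a data frame whose `z1` column is a plain `Vector{String}`, which is the case that works in the examples above. Columns that allow missing values (`Union{Missing, Symbol}`) fall through to the second method unchanged and would need their own method.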

greimel avatar Aug 24 '19 16:08 greimel

stack doesn't create CategoricalArray{Symbol} columns AFAICT. Do you have an example?

R has the concept of symbol (see e.g. as.symbol), so it sounds appropriate to convert Julia symbols to that. It's unfortunate that symbols in Julia are used in many places where a string would be used in R.

The CategoricalArray issue sounds secondary to me: better decide what to do with Array{Symbol} first, as it is a simpler case.

nalimilan avatar Sep 04 '19 13:09 nalimilan

stack creates Vector{Symbol}.

bkamins avatar Sep 05 '19 13:09 bkamins

Sorry, I guess I mixed that up. I meant that Vector{Symbol} columns arise naturally and they don't work well with RCall.

greimel avatar Sep 05 '19 13:09 greimel

No problem 😄. In DataFrames.jl we will likely switch to CategoricalVector{String} soon anyway. But the issue with Symbol in RCall.jl should probably be resolved regardless.

bkamins avatar Sep 05 '19 13:09 bkamins