JuliaCall
drop RCall dependencies and more refactoring
In the current implementation of JuliaCall, RCall.jl is required for converting objects between the R and Julia environments. While this may be sufficient for most operations, it leads to some performance bottlenecks and makes maintenance difficult. For instance, any upgrade of RCall may unintentionally break JuliaCall. I reckon that we should drop the RCall dependency and implement our own conversion procedure. This would allow us
- to significantly improve setup time
- to maintain the package more easily, as the implementation would be independent of RCall (of course, we still want to make sure JuliaCall is compatible with RCall).
The conversion code could be written in a similar manner as
https://github.com/armgong/rjulia. rjulia requires an installation of Julia at build time. That could be difficult for non-technical users and it also makes CRAN submission difficult. With my PR #22, the shared library is loaded at runtime and no Julia installation is needed at build time. As a result, users could download a binary version from CRAN directly.
It could be a major refactoring. Instead of working on the current version of JuliaCall, I suggest we start a new organization, hypothetically juliaverse.
At ~~RJuliaCall~~ juliaverse, we split JuliaCall into sub-packages:
- juliaapi: this package just exposes the Julia C API and allows developers to manipulate Julia. It will basically be what I have done in #22.
- juliabase: this package depends on juliaapi and supports conversions of Julia base types, for example conversions between Julia Array{Float64} and R numeric().
- juliadata: it supports the packages at https://github.com/JuliaData.
- juliaverse: it is a mega package containing all of the above packages plus additional support, e.g. Jupyter. (I personally prefer lower-cased package names, and that also seems to be the common practice nowadays.)
@Non-Contradiction what do you think about the plan? I also want to bring in @armgong since we will probably use some code in rjulia.
Very interesting proposal! I'm a contributor over at rjulia. (I refactored quite a bit of the rjulia R <-> julia C code a while back.) Perhaps I can provide some insight. @armgong actually did some work on rjulia2, which uses RCall and I've been looking at switching to JuliaCall. Keeping up with changes in DataFrames, AxisArrays, factors and such is a nontrivial amount of work. You get that for free using RCall (assuming you are OK with being on their timeline). Also, the custom conversion between R and julia types enabled by RCall is a really nice feature. I use them for fancy R types that I've rewritten for julia.
Keeping all of that in mind, using rjulia's code for juliaapi and juliabase sounds very interesting. I'm interested in contributing if I can be of help. I'd stick with RCall for juliadata for the above reasons. Also, rjulia's data.frame and factor code has a fair bit of jl_eval_string in it, so it isn't especially quick.
Performance bottlenecks are important, but not that important. I think the user should expect some performance loss in the interface itself, such as startup time and type conversion. If the computation is heavy, the user will still see a performance gain despite the loss in the interface; if the computation is negligible, why would the user use Julia through JuliaCall?
BTW, I once thought about having another, lower-level interface besides julia_do_call that would allow users to deal directly with the function's arguments as SEXP (the user already has the choice to construct the return result using R's C API in RCall), but after more thought, I think the current approach is safer and more concise despite the performance loss.
@phaverty Thanks for your words (actually I am one of the main maintainers of RCall). Although I am a developer of RCall, I do find it a bit awkward to use RCall for some of the basic conversions. For example, sometimes I just need a function from Julia base. My plan is to keep juliabase and juliaapi free from any dependencies, so if a user only needs very basic conversions, they can just library(juliabase).
For higher-level conversions such as DataFrames and AxisArrays, I do agree with you that it is less painful to use RCall.
As to the maintenance difficulty, I'm sure there is a lot.
For example, the newest RCall requires R >= 3.4.0. For an R package, I think it is reasonable (?) to require R >= 3.2.0, but requiring R >= 3.4.0 may be too much (?). So I added some code (temporarily?) to restrict the RCall version when the user's R is older. This is quite a dirty fix, and it could lead to many potential problems.
And there will be many more problems than the one I just mentioned.
But if we want to maintain the compatibility between RCall and JuliaCall, for example, the type conversion back and forth between R and Julia, we have to do maintenance work, and the most direct and easy way to achieve that is reusing RCall's code (at least part of the code) in JuliaCall.
There are multiple reasons why RCall.jl requires R 3.4; most of them are due to bugs in previous versions of R. It should not be that bad for Julia users, because Julia users are supposed to be unafraid of updates (I think).
JuliaCall was designed in the first place to be easy to maintain, with more Julia code and less C code. And since the function is the core concept in both Julia and R, JuliaCall provides only one main wrapper for function calls; everything else is based on that function call interface. I hope this provides a consistent interface (error handling, etc.) and makes JuliaCall easier to maintain than having to deal with the many functions in the Julia library. (It is similar to .Call in R.) Another goal behind JuliaCall's design is flexibility, although this comes at a price such as performance loss.
After all, we have to maintain some kind of compatibility between RCall and JuliaCall. And explicit is always better than implicit: I think having the dependency is better than reimplementing similar things in two places and hoping they behave consistently.
My idea is that we could restrict the dependency somehow.
For example, we could remove some of the unnecessary usage of RCall in JuliaCall and restrict RCall's usage into several functions and files. And I hope there is a way for JuliaCall to depend on only part of RCall's functionality. Maybe refactor RCall into RCall_base and RCall? Or is there some way to only use part of the package in Julia?
I can think of three aspects that we need to pay attention to: type conversion, IO, and error handling. If all three are okay, then I see no problem in using RCall (or RCall_base) as a dependency for JuliaCall.
Actually, I have recently been working on a Python package, rapi, which is a Python port of RCall.jl. I particularly want to make rapi work with reticulate. I learned several extra things in the process.
For instance, JuliaCall converts objects between R and Julia by using the rcopy and sexp functions. This means the conversion mechanism lives in Julia, which makes extending conversion rules in third-party packages less elegant. (They will either need to write Julia code or call JuliaCall::julia_call multiple times.) On the other hand, reticulate first exposes Python objects to R via a wrapper, then uses the S3 methods r_to_py and py_to_r to do conversions. So third-party packages can easily extend the conversion rules in R. I like this approach better as it is a more R way to handle things.
It is also the reason why it is quite difficult to fit in autodiff. Currently a Julia object is only exposed as a generic JuliaObject; a more flexible approach is to expose a Julia object as an R object whose classes include its Julia supertypes.
For instance, a Julia vector should be exposed with classes c("Array", "DenseArray", "AbstractArray", "JuliaObject"). Then the methods jl_to_r.Array and r_to_jl.Array would dispatch the conversions. For the purpose of autodiff, we would need to make Array an S4 class to allow multiplication. However, it is unclear to me how to dispatch on parametric types if we go that route.
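A minimal sketch of how such a class vector would drive S3 dispatch, using a plain R list as a stand-in for a wrapped Julia array. No Julia session is involved; the constructor mock_julia_array and the method bodies are hypothetical, and only the class vector and the generic name jl_to_r come from the proposal above. It does not address the parametric-type question.

```r
# Stand-in for a wrapped Julia vector; no Julia session is involved.
mock_julia_array <- function(values) {
  structure(
    list(values = values),
    class = c("Array", "DenseArray", "AbstractArray", "JuliaObject")
  )
}

# Generic converter; S3 walks the class vector from left to right.
jl_to_r <- function(jlo, ...) UseMethod("jl_to_r")

# Only a method for the supertype is defined here, so objects whose first
# classes are "Array" and "DenseArray" fall through to it.
jl_to_r.AbstractArray <- function(jlo, ...) unclass(jlo)$values

# Fallback for any other wrapped object: return the wrapper unchanged.
jl_to_r.JuliaObject <- function(jlo, ...) jlo

v <- mock_julia_array(c(1, 2, 3))
converted <- jl_to_r(v)
```

A third-party package could add jl_to_r.Array later to override the supertype method without touching the generic.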
Maybe it is something that I could experiment with in juliaapi.
Yes, I have to admit that any direct extension of JuliaObject is not elegant and may be problematic.
The way I thought originally is to have indirect extension, for example, users can have their own objects, which could contain the JuliaObject, and some other information they need, and they can define their own methods for their objects. But this mechanism may not be ideal....
And there is also a mechanism called autowrap in JuliaCall. Some discussion can be seen at #49.
I think the problem in Julia is more complicated than in Python, because of multiple dispatch vs. object orientation.
Having some explicit conversion is not difficult, and I think explicit conversion can cooperate with the current JuliaObject mechanism. The two ways can co-exist. In fact, there are already functions like as.double and as.list, which can be seen as special cases of explicit conversion.
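As an illustration of what "explicit conversion" means here, a small sketch with a mock object. This is not JuliaCall's implementation; the constructor mock_julia_numbers and the values field are hypothetical stand-ins for a real JuliaObject.

```r
# Stand-in for a JuliaObject holding numbers (hypothetical layout).
mock_julia_numbers <- function(values) {
  structure(list(values = values), class = "JuliaObject")
}

# Explicit conversion: nothing is converted until the user calls as.double().
# as.double is an internal generic in R, so an S3 method is enough.
as.double.JuliaObject <- function(x, ...) as.double(unclass(x)$values)

jlo <- mock_julia_numbers(1:3)
converted_dbl <- as.double(jlo)
```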
I have also experimented with more classes in JuliaCall to deal with some tricky situations.
And I currently don't see the advantage of a long list of classes to mimic the type hierarchy in Julia. The current mechanism lets Julia, not R, do the method dispatch, which I think is more robust, less tricky, and should have a performance advantage. Could you explain more?
My proposed mechanism allows third-party packages to extend conversion rules more easily, without writing explicit Julia code. It is not about performance but extensibility.
Imagine there is a third party Julia type
struct Point
    x::Float64
    y::Float64
end
and a developer wants to convert it from/to an R class.
In the current approach, the third-party developer needs to write Julia code to extend rcopy and sexp in order to make JuliaCall understand how to convert Point. With my approach, a Point object will just be exposed to R as an object with classes c("Point", "JuliaObject").
Then they just need to define jl_to_r.Point and r_to_jl.Point to allow conversion. Note that jl_to_r.default and r_to_jl.default are defined by JuliaCall, and the default conversions would use rcopy and sexp.
With the default converters, the developer just needs to call jl_to_r() to convert x and y to their default types. The functions jl_to_r.Point and r_to_jl.Point are purely R based, and the third-party developer does not need to write Julia code or use JuliaCall::julia_call.
The autowrap function tells RCall to convert an object with a specific Julia type to JuliaObject. I think the best way is to always wrap a Julia object as JuliaObject plus its supertypes and to use the jl_to_r and r_to_jl approach above.
The hypothetical jl_to_r.Point and r_to_jl.Point still have to call some julia code written by user, isn't it?
autowrap tells RCall to treat an object as JuliaObject, and automatically overloads some fields and functions.
> The hypothetical jl_to_r.Point and r_to_jl.Point still have to call some julia code written by user, isn't it?
Suppose RPoint is the corresponding R6 class for Point.
jl_to_r.Point <- function(jlo) {
  # RPoint is an R6 class, so instances are created with $new()
  RPoint$new(jl_to_r(jlo$x), jl_to_r(jlo$y))
}
r_to_jl.RPoint <- function(ro) {
  # the conversion of `ro$x` and `ro$y` could be done implicitly in `julia_call`
  julia_call("Point", ro$x, ro$y)
}
Well, the answer is yes if you count julia_call as julia code though. And one obvious drawback is that RCall knows nothing about this conversion rule, but I argue it is not a huge drawback as users are working on the R side rather than the Julia side.
An additional natural question arises: should jlo$x return a JuliaObject, or should it convert implicitly? (reticulate converts implicitly.)
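To make the two options concrete, here is a sketch with a mock wrapper. Everything in it is hypothetical: mock_julia_struct stands in for a real JuliaObject, and the convert flag is loosely modeled on the convert argument of reticulate's import(); JuliaCall does not define either.

```r
# Stand-in for a wrapped Julia struct. `fields` holds the raw field values;
# `convert` is a hypothetical per-object switch.
mock_julia_struct <- function(fields, convert = TRUE) {
  structure(list(fields = fields, convert = convert), class = "JuliaObject")
}

# Field access: either convert implicitly or hand back another wrapper.
`$.JuliaObject` <- function(x, name) {
  raw <- unclass(x)              # avoid recursing into this method
  value <- raw$fields[[name]]
  if (raw$convert) {
    value                        # implicit conversion to a plain R value
  } else {
    mock_julia_struct(list(value = value), convert = FALSE)
  }
}

p_conv <- mock_julia_struct(list(x = 1, y = 2), convert = TRUE)
p_wrap <- mock_julia_struct(list(x = 1, y = 2), convert = FALSE)
x_conv <- p_conv$x   # plain numeric
x_wrap <- p_wrap$x   # still a JuliaObject wrapper
```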
Basically, it replicates the design of reticulate, which I think is more extensible.
I see.
I don't want to break the current mechanism for now.
Adding new class attributes to JuliaObject seems to break the RClass mechanism used by RCall.
But there is actually a trick to avoid changing the class attribute and still get R dispatch, at least for S3. In R's UseMethod, you can pass in the object you want to dispatch on. Normally it is the first argument, but I've used the second or third argument before, and I believe it can actually be anything you want.
So the hypothetical jl_to_r.Point is still possible without Point in class attribute.
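A sketch of that trick, under the assumption that the Julia type name is available somewhere on the object. Here it is stored in a hypothetical jltype field of a mock object; how a real JuliaObject would report its type is not shown. The second argument of UseMethod is the object whose class decides the method, while the method still receives the original arguments.

```r
# Stand-in for a JuliaObject: the class attribute stays "JuliaObject" only.
mock_typed_object <- function(jltype, fields) {
  structure(list(jltype = jltype, fields = fields), class = "JuliaObject")
}

jl_to_r <- function(jlo, ...) {
  # Dispatch on a throwaway object whose class carries the Julia type name,
  # instead of on `jlo` itself.
  UseMethod(
    "jl_to_r",
    structure(list(), class = c(unclass(jlo)$jltype, "JuliaObject"))
  )
}

# User-defined conversion for Point; it receives the original `jlo`.
jl_to_r.Point <- function(jlo, ...) {
  f <- unclass(jlo)$fields
  c(x = f$x, y = f$y)
}

# Anything without a specific method is returned unchanged.
jl_to_r.default <- function(jlo, ...) jlo

pt <- mock_typed_object("Point", list(x = 1, y = 2))
other <- mock_typed_object("Unknown", list())
pt_r <- jl_to_r(pt)
other_r <- jl_to_r(other)
```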
If this could be done, then I can call jl_to_r and r_to_jl in the mechanism used by JuliaCall now, with jl_to_r after the sexp and r_to_jl before rcopy, so everything would be exactly the same unless users customize their own conversion functions. And autowrap can still work, which I think is also convenient for users to define their own conversion rule.
But parametric types are still a big problem for this scheme.
Another way to deal with the problem posed by parametric types is to keep using the Julia dispatch system but give users a more convenient interface. For example, users could do things like:
jl_to_r("Point", function(jlo) RPoint$new(jl_to_r(jlo$x), jl_to_r(jlo$y))) and it will translate into defining a function like the following in Julia
jl_to_r(jlo :: Point) = "the function that is the second argument"(jlo)
And the r_to_jl.Point function can be defined as before, or with a syntax consistent with the jl_to_r("Point", "some function") one.
And in this way, there is no need to modify the class attribute.
And the order of the hypothetical jl_to_r and r_to_jl will be the same as in the first scheme, i.e., jl_to_r after sexp and r_to_jl before rcopy.
The second scheme is more performant, and it has the advantage that we don't need to reinvent the wheel.
I am planning to support JuliaObject in RCall. The improved RClass mechanism will loop over all classes instead of just the first one, so it will be okay to have multiple classes. It is not a big deal.
There is another issue with having conversion dispatch in Julia. Objects like Vector{Point} are not translated automatically unless the corresponding rcopy and sexp methods are defined. All in all, I still prefer having the conversions in R to make third-party extension a bit easier.
I'm glad to hear that RCall is going to support JuliaObject and the expansion of RClass.
I believe that previously if you define sexp for Point and convert Point into RPoint, then Vector{Point} will get converted into a list of RPoint, but maybe it's not the behavior of RCall currently. And this was the reason of the following lines in JuliaCall's JuliaObject.jl:
## Regarding to issue #12, #13 and #16,
## we should use JuliaObject for general AbstractArray
@suppress_err begin
    JuliaCall.sexp{T}(x :: AbstractArray{T}) = sexp(JuliaObject(x))
end
## AbstractArray{Any} should be converted to R List
sexp(x :: AbstractArray{Any}) = sexp(VecSxp, x)
This prevents things like Vector{Point} from being converted into a list of RPoint, which would lose the type parameter information in Vector{Point}. I'm not sure what the R object corresponding to Vector{Point} should be, given that Point corresponds to RPoint.
Third-party extension is important, and an R conversion mechanism can serve as a supplement to the rcopy and sexp mechanism in RCall. Or could the whole thing be migrated into RCall?