Feature Request: Make RCall depend dynamically on `R_HOME`
I am using RCall with CondaPkg. I know there is currently a pull request which uses preferences to set R_HOME, but I realized that this would still not be enough for certain setups. I want to sketch why it would be good if RCall's build did not depend on R_HOME.
Ideal World
When installing something which depends on RCall with R provided by CondaPkg, then what you would like to do is
- instantiate project
- import CondaPkg first and instantiate all CondaPkg dependencies (via `CondaPkg.resolve()`)
- set R_HOME (via `ENV["R_HOME"] = joinpath(CondaPkg.envdir(), "lib", "R")`)
- import RCall and use RCall
Current World
The above unfortunately does not work; what is needed instead is immensely more complicated. The reason is that building RCall depends on a valid R_HOME. Hence step 1, the instantiation, will fail, as RCall cannot find a valid R_HOME.
So instead currently you need to do
- start Julia in a dummy project environment
- find the CondaPkg version in the project (manually parsing the Manifest.toml is the simplest way I could find)
- `Pkg.add(name="CondaPkg", version="the version we found")`
- switch project environments so that CondaPkg thinks it is in the final project environment
- instantiate all CondaPkg dependencies (via `CondaPkg.resolve()`)
- output `joinpath(CondaPkg.envdir(), "lib", "R")` and finish julia
- set R_HOME to the output R_HOME
- start julia in the actual project
- instantiate project
- import RCall and use RCall
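The manifest-parsing step above could look roughly like the following. This is only a sketch: the helper name `manifest_package_version` is made up for illustration, and it assumes the manifest lists packages either under a top-level `deps` table (current manifest format) or directly at the top level (older format).

```julia
using TOML

# Hypothetical helper (not part of Pkg or CondaPkg): look up the version that
# a Manifest.toml pins for the package `name`.
function manifest_package_version(manifest_path::AbstractString, name::AbstractString)
    manifest = TOML.parsefile(manifest_path)
    # Current manifests nest packages under "deps"; older ones keep them at top level.
    deps = get(manifest, "deps", manifest)
    entries = get(deps, name, nothing)
    # Each package name maps to a list of entries; give up if it is missing.
    (entries isa AbstractVector && !isempty(entries)) || return nothing
    # Standard-library entries may carry no version, in which case this is `nothing`.
    return get(first(entries), "version", nothing)
end
```

The result of `manifest_package_version("Manifest.toml", "CondaPkg")` would then be passed as the `version` argument of `Pkg.add` in the dummy environment.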
[EDIT added] Probably, this approach still has difficulties instantiating the CondaPkg conda packages correctly, because CondaPkg should also grab the CondaPkg.toml files from dependent projects in the julia load path. But as they are not yet instantiated, it probably does not pick them up. So I guess instead of this step 2, what you actually would need to do is to write your own instantiate method which instantiates everything but RCall and the packages depending on RCall. And even then CondaPkg would probably miss the conda dependencies of the packages that depend on RCall - they would need to be manually integrated without triggering a build of RCall.
Conclusion
It would be so much cleaner and simpler if RCall's build did not depend on R_HOME.
RCall's `__init__` method can of course still happily depend on R_HOME.
Probably you will need to make the changes for this yourself. It will require some restructuring of the package to not use stuff from libR during the precompile stage, so I think it might require figuring out some package internals which may take some time. You may have better luck getting a PR merged.
That said, I didn't really understand under what circumstances the preferences approach does not work. Did you try installing my PR? I have been using it together with CondaPkg just fine in my own projects. What steps did you attempt?
> Hence step 1. the instantiation will fail, as R cannot find a valid R_HOME
My PR solves this problem by aborting precompilation if there is no valid R_HOME. This will not cause a failure, merely a warning. Once you set things up with preferences, the preferences system will automatically trigger a recompilation of RCall.jl.
Currently, you have to set the preferences up manually or using a helper script (I have posted the latter in the PR), but once the PR is merged we can either add a package extension or create a mini-helper package which will set up the preferences automatically, giving a higher level of convenience.
You might find it informative to read through my PR and my comments, since they contain a lot of information relevant to what you are writing. With regard to
> RCall's `__init__` method can of course still happily depend on R_HOME.
Please see:
> when other modules which import RCall are precompiled they will run the `__init__`. I tried an approach that checked for `currently_compiling() = ccall(:jl_generating_output, Cint, ()) != 0` in `__init__` and skipped the init in that case, however this will mean an R interpreter is not started, and apparently this is needed even during precompilation time for the R_str macro. The resulting segfault was a bit surprising to me -- but I suppose the R runtime is needed even at this stage for such a close integration.
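For reference, the guard described in that comment has roughly the following shape. This is a standalone sketch, not RCall's code: the module name `InitGuardSketch` and the `runtime_started` flag are stand-ins for illustration, and as noted above, skipping initialization this way was what led to the segfault with the R_str macro, so it shows the attempted approach rather than a working fix.

```julia
module InitGuardSketch

# True while Julia is generating output, e.g. while writing a precompile cache.
currently_compiling() = ccall(:jl_generating_output, Cint, ()) != 0

# Stand-in for "the embedded runtime has been started".
const runtime_started = Ref(false)

function __init__()
    # Skip the expensive setup when this module is loaded only because
    # another package is being precompiled.
    currently_compiling() && return
    runtime_started[] = true
    return
end

end # module
```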
Instructions on how to test my branch together with CondaPkg are here: https://github.com/JuliaPy/CondaPkg.jl/issues/100#issuecomment-1676824947
Please let me know if you have any problems or have identified any issues, and I will be happy to discuss them and attempt to address them. In case you have looked more closely at the different approaches and happen to decide my PR is a reasonable approach, getting behind it could help it get merged.
Thank you frankier for all your comments and help.
What you write sounds pretty good. (Only that other packages call RCall's __init__ method during precompilation sounds quite horrifying actually... is this really so?)
My biggest wish is that
1. `Pkg.instantiate()` should work, i.e. compiling RCall without R_HOME being set up correctly.
   - Assume that the LocalPreferences.toml exists in the same folder and sets R_HOME to the place where it will later find an R installation
   - but as of instantiation time, no such R installation is available yet (will be installed via CondaPkg).
2. afterwards I would like to call `import CondaPkg; CondaPkg.resolve(); import RCall`
   - and it should somehow trigger the correct R build
   - such that it won't rebuild every time I run this second step again, but rather reuse the compiled version.
It seems to me that the preferences approach has the difficulty of managing the interaction between step 1. and step 2. Given the same LocalPreferences, the first time the build should silently fail, while the second time the build should be retriggered (but only retriggered the very first time step 2. is run, subsequent runs shouldn't need a rebuild).
I guess you could solve this by having a dummy variable in LocalPreferences.toml which indicates whether this is the first instantiation or a subsequent normal build...
Not sure whether this would work.
> Only that other packages call RCall's `__init__` method during precompilation sounds quite horrifying actually... is this really so?
This surprised me at first, but this is in fact always the case. It's not spelled out in https://docs.julialang.org/en/v1/manual/modules/#Module-initialization-and-precompilation, which only mentions that `__init__` is called during `using`; it seems that this does in fact include precompiling dependent packages. I asked about this on Slack but didn't get a response. However, a minimal test reveals this is always the case. As mentioned, you could try to abort `__init__` while precompiling another package, but then we will have problems with R_str.
> My biggest wish is that
> 1. `Pkg.instantiate()` should work, i.e. compiling RCall without R_HOME being set up correctly.
>    - Assume that the LocalPreferences.toml exists in the same folder and sets R_HOME to the place where it will later find an R installation
>    - but as of instantiation time, no such R installation is available yet (will be installed via CondaPkg).
> 2. afterwards I would like to call `import CondaPkg; CondaPkg.resolve(); import RCall`
>    - and it should somehow trigger the correct R build
>    - such that it won't rebuild every time I run this second step again, but rather reuse the compiled version.
>
> It seems to me that the preferences approach has the difficulty of managing the interaction between step 1. and step 2. Given the same LocalPreferences, the first time the build should silently fail, while the second time the build should be retriggered (but only retriggered the very first time step 2. is run, subsequent runs shouldn't need a rebuild).
I definitely see what you're getting at -- that it would be convenient to set up LocalPreferences.toml beforehand and retrigger things automatically whenever R is updated. This isn't how it works at the moment -- and to me it seems logical that by specifying the libR preference, you are also saying that it actually exists and is usable. The solution is therefore to run CondaPkg first and then set up the LocalPreferences.toml after R has been installed.
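As a rough sketch of that second step, a small helper could write the RCall section of LocalPreferences.toml once R exists. Everything here is illustrative: `write_rcall_preferences` is a made-up name, and the key name `Rhome` is an assumption (only a libR preference is mentioned above), so check the PR for the keys it actually reads.

```julia
using TOML

# Hypothetical helper: create or update the [RCall] table in
# `project_dir/LocalPreferences.toml`, leaving other packages' preferences intact.
# ASSUMPTION: the key names "Rhome" and "libR" are placeholders for whatever the PR defines.
function write_rcall_preferences(project_dir::AbstractString;
                                 rhome::AbstractString, libr::AbstractString)
    path = joinpath(project_dir, "LocalPreferences.toml")
    prefs = isfile(path) ? TOML.parsefile(path) : Dict{String,Any}()
    section = get!(prefs, "RCall", Dict{String,Any}())
    section["Rhome"] = rhome
    section["libR"] = libr
    open(path, "w") do io
        TOML.print(io, prefs)
    end
    return path
end
```

After `CondaPkg.resolve()` has installed R, this would be called with `rhome = joinpath(CondaPkg.envdir(), "lib", "R")` and the platform-specific path to the libR shared library.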
In the future, it should be possible to create some kind of post-install CondaPkg hook to automate this, so that the preference is only set up when R actually becomes available. However, using the preference system to configure libR is generally useful beyond CondaPkg. For example, with the current approach, changing R_HOME in one project might affect another project on your machine if they happen to be using the same version of RCall. Using preferences fixes this problem, and this is why I am trying to get the preference PR merged as a first step.
I agree that Preferences improve the situation. It is just that dynamic resolution is still more handy than Preferences.
Sure. One concrete advantage I can see is that we wouldn't end up with one copy of RCall.jl bytecode for each R installation.
I think @frankier's work on Preferences largely covers as much as we can conveniently handle at the moment.