rnim
rnim copied to clipboard
A bridge between R and Nim
- rnim - A bridge between R ⇔ Nim
Currently this is a barely working prototype.
Calling R functions from Nim works reasonably well, if basic Nim types are used. Both named and unnamed function arguments are supported.
The R =SEXP= object can be converted into all Nim types, which are supported in the other direction.
Interfacing with shared libraries written in Nim works for basic types. See the =tNimFromR.nim= and =tCallNimFromR.R= files for an example in =tests=.
** Basic syntax to call R from Nim
Intefacing with R from Nim works by making use of the =Rembedded.h= functionality, which effectively launches a silent, embedded R repl.
This repl is then fed with S expressions to be evaluated. The S expression is the basic data type on the C side of R. Essentially everything is mapped to different kinds of S expressions, be it symbols, functions, simple data types, vectors etc.
This library aims to hide both the data conversions and memory handling from the user.
This means that typically one sets up the R repl, does some calls to R and finally shuts down the R repl again: #+begin_src nim let R = setupR()
some or many calls to R functions
teardown(R) #+end_src
The returned =R= object is essentially just a dummy object, which is used to help with overload resolution (we want =untyped= templates to allow calling and R function by ident without having to manually wrap them) and it keeps track of the state of the repl.
In order to not have to call the =teardown= procedure manually, there are two options:
- a =withR= template, which takes a block of code and injects a variable =R= into its calling scope. The repl will be shut down when leaving its scope
- by compiling with =--gc:arc= or =--gc:orc=. In that case we can define a proper destructor, which will be automatically called when the =R= variable runs out of scope and is destroyed.
Note two things:
- in principle there is a finalizer defined for the non ARC / ORC case, which performs the same duty. However, at least according to my understanding, it's run whenever the GC decides to collect the =R= variable. This might not be very convenient.
- I don't know whether it's an inherent limitation of the embedded R repl, but it seems like one cannot destroy an R repl and construct a new one. If one tries, one is greeted by #+begin_src sh R is already initialized #+end_src message.
*** Simple usage example
The above out of the way, let's look at the basic things currently possible.
For clarity I will annotate the types even where not required.
#+begin_src nim import rnim let R = setupR()
perform a call to the R stdlib function sum, by using
the .() dot call template and handing a normal Nim seq
let res: SEXP = R.sum(@[1, 2, 3])
the result is a SEXP, the basic R data type. We can now
use the to proc to get a Nim type from it:
doAssert res.to(int) == 6 #+end_src
Some functions, which have atypical names may not be possible to call
via the dot call template. In that case, we can call the underlying
macro directly, called =callEval= (possibly name change incoming...):
#+begin_src nim
doAssert callEval(+, 4.5, 10.5).to(float) == 15.0
#+end_src
This also showcases that functions taking multiple arguments work as
expected. At the moment we're limited to 6 arguments (there's specific
C functions to construct calls up to 6 arguments. Need to
implement arbitrary numbers manually).
Also named arguments are supported. Let's use the =seq= function as an example, the more general version of the =:= operator in R (e.g. =1:5=): #+begin_src nim check R.seq(1, 10, by = 2).to(seq[int]) == toSeq(countup(1, 10, 2)) #+end_src As we can see, we can also convert =SEXPs= containing vectors back to Nim sequences.
Finally, we can also source from arbitrary R files. Assuming we have some R file =foo.R=: #+begin_src R hello <- function(name) { return(paste(c("Hello", name), sep = " ", collapse = " ")) } #+end_src From Nim we can then call it via: #+begin_src nim import rnim
first set up an R interpreter
let R = setupR()
now source the file
R.source("foo.R")
and now we can call R functions defined in the sourced file
doAssert R.hello("User").to(string) == "Hello User" #+end_src
That covers the most basic functionality in place so far.
*** Vectors (data arrays)
Arrays are always a special case, as they are usually the main source of computational work. Avoiding unnecessary copies of arrays is important to keep performance high.
To provide a no-copy interface to data arrays (R vectors) from R, there are two types to help: =NumericVector[T]= and =RawVector[T]=. They provide a nice Nim interface to work with such numerical data.
Any R =SEXP= can be converted to either of these two types. If the corresponding =SEXP= does not correspond to a vector, an exception will be thrown at runtime.
These types internally simply keep a copy of the underlying data array in the =SEXP=.
From a usability standpoint =NumericVector[T]= is the main type that should be used. =RawVector[T]= simply provides a slightly lower wrapper, which is however more restrictive.
A =RawVector[T]= can only be constructed for: =cint, int32, float, cdouble=. This is because the underlying R =SEXP= come only in two types: =INTSXP= and =REALSXP=, the former stores 32-bit integers and the latter 64-bit floats (technically afaik the platform specific size, so 32-bit floats on a 32-bit machine. The inverse is not the case for =INTSXP= though!). There is no way to treat a =REALSXP= vector as a =RawVector[int32]= for instance.
This is where =NumericVector[T]= comes in. It can be constructed for all numerical types larger or equal to 32-bit in size (to avoid loss of information when constructing from a =SEXP=). Unsigned integers so far are also not supported.
A short example: #+begin_src nim :tangle /tmp/readme_numericvector.nim import rnim let R = setupR()
let x = @[1, 2, 3] let xR: SEXP = x.nimToR # types for clarity var nv = initNumericVectorint
nv is now a vector pointing to the same data as xR
we can access individual elements:
echo nv[1] # 2
modify elements:
nv[2] = 5
check its length
doAssert nv.len == 3
iterate over it
for i in 0 .. nv.high: echo nv[i] for x in nv: echo x for i, x in nv: echo "Index ", i, " contains ", x
compare them:
doAssert nv == nv
and print them:
echo nv # NumericVector[int](len: 3, kind: vkFloat, data: [1, 2, 5])
as xR contains the same memory location, constructing another vector
and comparing them yields true, even though we modified nv
let nv2 = initNumericVectorint doAssert nv == nv2
finally we can also construct a NumericVector straight from a Nim sequence
let nv3 = @[1.5, 2.5, 3.5].toNumericVector() echo nv3 #+end_src
If you ran this code you will see a message:
#+begin_src
Interpreting input vector of type REALSXP as int loses information!
#+end_src
This is because we first constructed a =SEXP= from a 64-bit integer sequence in Nim. As mentioned before, 64-bit integers do not exist. Therefore, the =xR SEXP= above is actually stored in a =REALSXP=. By constructing a =NumericVector[int]= we tell the Nim compiler we wish to convert from and to =int=, no matter the underlying type of the =SEXP= array, i.e. =INTSXP= or =REALSXP=. The message simply makes you aware that this is happening (it may be taken out in the future).
The fact that this conversion happens internally is the reason for the existence of =RawVector=, which explicitly disallows this.
Further, =NumericVector= is actually a variant object. Depending on the runtime type of the =SEXP= from which we construct a =SEXP= the correct branch of the variant object will be filled. For extremely performance sensitive application it may thus be preferable to have a type where variant kind checks and possible type conversions do not happen.
*** =Rctx= macro
As mentioned in the previous secton, some function names are weird and require the user to use =callEval= directly.
To make calling such functions a bit nicer, there is an =Rctx= macro, which allows for directly calling R functions with e.g. dots in their names, and also allows for assignments.
#+begin_src nim
let x = @[5, 10, 15] let y = @[2.0, 4.0, 6.0]
var df: SEXP Rctx: df = data.frame(Col1 = x, Col2 = y) let df2 = data.frame(Col1 = x, Col2 = y) print("Hello from R") #+end_src where both =df= as well as =df2= will then store an equivalent data frame. The last line shows that it's also possible to use this macro to avoid the need to discard all R calls.
** Calling Nim code from R
Nim can be used to write extensions for R. This is done by compiling a Nim file as a shared library and calling it in R using the =.Call= interface.
An example can be seen from the tests:
- https://github.com/SciNim/rnim/blob/master/tests/tNimFromR.nim the Nim file that is compiled to a shared library
- https://github.com/SciNim/rnim/blob/master/tests/tCallNimFromR.R the corresponding R file that wraps the shared library
In the near future the latter R file will be auto generated by the Nim code at compile time.
The basic idea is as follows. Assume you want to write an extension that adds two numbers in Nim to be called from R.
You write a Nim file with the desired procedure and attach the ={.exportR.}= pragma as follows:
=myRmodule.nim=: #+begin_src nim import rnim
proc addNumbers*(x, y: SEXP): SEXP {.exportR.} =
adds two numbers. We will treat them as floats
let xNim = x.to(float) let yNim = y.to(float) result = (x + y).nimToR #+end_src
Note the usage of =SEXP= as the input and output types. In the future the conversions (and possibly non copy access) will be automated. For now we have to convert manually to and from Nim types.
This file is compiled as follows: #+begin_src sh nim c (-d:danger) --app:lib (--gc:arc) myRModule.nim #+end_src where the =danger= and =ARC= usage are of course optional (but ARC/ORC is recommended).
This will generate a =libmyRmodule.so=. The resulting shared library in principle needs to be manually loaded via =dyn.load= in R and each procedure in it needs to be called using the =.Call= interface.
Fortunately, this can be automated easily. Therefore, when compiling such a shared library, we automatically emit an R wrapper, that has the same name as the input Nim file. So the following file is generated:
=myRmodule.R=: #+begin_src R dyn.load("libmyRmodule.so")
addNumbers <- function(a, b) { return(.Call("addNumbers", a, b)) } #+end_src
This file can now be sourced from the R interpreter (using the =source= function) or in an R script and then =addNumbers= is usable and will execute the compiled Nim code!
Note that the autogeneration logic assumes the shared library and the generated R script will live in the same directory. If you wish to move one, you might have to adjust the paths that perform the ~dyn.load~ command!
** Trying it out
To try out the functionality of calling R from Nim, you need to meet a few prerequisites.
*** Setup on Linux
- a working R installation with a =libR.so= shared library
- the shell environment variable =R_HOME= needs to be defined and has to point to the directory which contains the full R directory structure. That is /not/ the path where the R binary lies! Finally, the =libR.so= has to be findable for dynamic loading. On my machine the path of it by default isn't added to =ld= via =/etc/ld.so.conf.d= (for the time being I just define =LD_LIBRARY_PATH= Setup on my machine: #+begin_src sh which R echo $R_HOME echo $LD_LIBRARY_PATH #+end_src #+begin_src sh /usr/bin/R /usr/lib/R /usr/lib/R/lib #+end_src
An easy way to set the =R_HOME= variable is by asking R about it:
#+begin_src sh
R RHOME
#+end_src
returns the correct path. We can use that to set the =R_HOME=
variable:
#+begin_src sh
export R_HOME=R RHOME
export LD_LIBRARY_PATH=$R_HOME/lib # maybe not required on your system
#+end_src
*** Setup on Windows
- a working R installation with a =R.dll= shared library
- the shell environment variable =R_HOME= needs to be defined and has to point to the directory which contains the full R directory structure. That is /not/ the path where the R binary lies! Example setup: #+begin_src sh where R.dll set R_HOME #+end_src #+begin_src sh C:\Program Files\R\R-4.0.4\bin\x64\R.dll R_HOME=C:\Program Files\R\R-4.0.4 #+end_src
*** Test your setup
Run the test file: #+begin_src sh nim c -r tests/tRfromNim.nim #+end_src