
A bridge between R and Nim

* rnim - A bridge between R ⇔ Nim

Currently this is a barely working prototype.

Calling R functions from Nim works reasonably well, if basic Nim types are used. Both named and unnamed function arguments are supported.

The R =SEXP= object can be converted into all Nim types, which are supported in the other direction.

Interfacing with shared libraries written in Nim works for basic types. See the =tNimFromR.nim= and =tCallNimFromR.R= files for an example in =tests=.

** Basic syntax to call R from Nim

Interfacing with R from Nim works by making use of the =Rembedded.h= functionality, which effectively launches a silent, embedded R repl.

This repl is then fed with S expressions to be evaluated. The S expression is the basic data type on the C side of R. Essentially everything is mapped to different kinds of S expressions, be it symbols, functions, simple data types, vectors etc.

This library aims to hide both the data conversions and memory handling from the user.

This means that typically one sets up the R repl, does some calls to R and finally shuts down the R repl again:
#+begin_src nim
let R = setupR()
# some or many calls to R functions
teardown(R)
#+end_src

The returned =R= object is essentially just a dummy object. It helps with overload resolution (we want =untyped= templates to allow calling an R function by ident without having to manually wrap them) and it keeps track of the state of the repl.

In order to not have to call the =teardown= procedure manually, there are two options:

  • a =withR= template, which takes a block of code and injects a variable =R= into its calling scope. The repl will be shut down when leaving its scope
  • by compiling with =--gc:arc= or =--gc:orc=. In that case we can define a proper destructor, which will be automatically called when the =R= variable runs out of scope and is destroyed.

Note two things:

  1. in principle there is a finalizer defined for the non ARC / ORC case, which performs the same duty. However, at least according to my understanding, it's run whenever the GC decides to collect the =R= variable. This might not be very convenient.
  2. I don't know whether it's an inherent limitation of the embedded R repl, but it seems like one cannot destroy an R repl and construct a new one. If one tries, one is greeted by an
     #+begin_src sh
     R is already initialized
     #+end_src
     message.
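
As a rough sketch of the first option, the block below uses =withR= instead of a manual =setupR= / =teardown= pair. It assumes that =withR= is invoked with a plain indented block and that the injected variable is named =R=, as described above. The call to =sum= is only there for illustration.
#+begin_src nim
import rnim

# Assumption: `withR` takes an indented block and injects `R` into it.
withR:
  # `R` is available here without calling `setupR` ourselves
  let total: SEXP = R.sum(@[1, 2, 3])
  doAssert total.to(int) == 6
# at this point the repl has been shut down, no `teardown` call needed
#+end_src
Given the second note, a program would typically contain only one such block, since a second repl cannot be constructed afterwards.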

*** Simple usage example

The above out of the way, let's look at the basic things currently possible.

For clarity I will annotate the types even where not required.

#+begin_src nim
import rnim
let R = setupR()
# perform a call to the R stdlib function sum, by using
# the .() dot call template and handing a normal Nim seq
let res: SEXP = R.sum(@[1, 2, 3])
# the result is a SEXP, the basic R data type. We can now
# use the to proc to get a Nim type from it:
doAssert res.to(int) == 6
#+end_src

Some functions with atypical names may not be callable via the dot call template. In that case we can call the underlying macro, =callEval=, directly (possibly name change incoming...):
#+begin_src nim
doAssert callEval(`+`, 4.5, 10.5).to(float) == 15.0
#+end_src
This also showcases that functions taking multiple arguments work as expected. At the moment we're limited to 6 arguments (there are specific C functions to construct calls with up to 6 arguments; arbitrary numbers need to be implemented manually).

Named arguments are also supported. Let's use the =seq= function as an example, the more general version of the =:= operator in R (e.g. =1:5=):
#+begin_src nim
import std/[sequtils, unittest] # for toSeq and check
check R.seq(1, 10, by = 2).to(seq[int]) == toSeq(countup(1, 10, 2))
#+end_src
As we can see, we can also convert =SEXPs= containing vectors back to Nim sequences.

Finally, we can also source from arbitrary R files. Assuming we have some R file =foo.R=:
#+begin_src R
hello <- function(name) {
  return(paste(c("Hello", name), sep = " ", collapse = " "))
}
#+end_src
From Nim we can then call it via:
#+begin_src nim
import rnim
# first set up an R interpreter
let R = setupR()
# now source the file
R.source("foo.R")
# and now we can call R functions defined in the sourced file
doAssert R.hello("User").to(string) == "Hello User"
#+end_src

That covers the most basic functionality in place so far.

*** Vectors (data arrays)

Arrays are always a special case, as they are usually the main source of computational work. Avoiding unnecessary copies of arrays is important to keep performance high.

To provide a no-copy interface to data arrays (R vectors) from R, there are two types to help: =NumericVector[T]= and =RawVector[T]=. They provide a nice Nim interface to work with such numerical data.

Any R =SEXP= can be converted to either of these two types. If the =SEXP= is not a vector, an exception is thrown at runtime.

These types internally simply keep a reference to the underlying data array in the =SEXP=, so the data itself is not copied.

From a usability standpoint =NumericVector[T]= is the main type that should be used. =RawVector[T]= simply provides a slightly lower-level wrapper, which is however more restrictive.

A =RawVector[T]= can only be constructed for: =cint, int32, float, cdouble=. This is because the underlying R =SEXP= vectors come in only two types: =INTSXP= and =REALSXP=. The former stores 32-bit integers and the latter 64-bit floats (technically afaik the platform specific size, so 32-bit floats on a 32-bit machine. The inverse is not the case for =INTSXP= though!). There is no way to treat a =REALSXP= vector as a =RawVector[int32]= for instance.

This is where =NumericVector[T]= comes in. It can be constructed for all numerical types of at least 32 bits in size (to avoid loss of information when constructing from a =SEXP=). Unsigned integers are not supported so far.

A short example:
#+begin_src nim :tangle /tmp/readme_numericvector.nim
import rnim
let R = setupR()

let x = @[1, 2, 3]
let xR: SEXP = x.nimToR # types for clarity
var nv = initNumericVector[int](xR)
# nv is now a vector pointing to the same data as xR
# we can access individual elements:
echo nv[1] # 2
# modify elements:
nv[2] = 5
# check its length
doAssert nv.len == 3
# iterate over it
for i in 0 .. nv.high:
  echo nv[i]
for x in nv:
  echo x
for i, x in nv:
  echo "Index ", i, " contains ", x
# compare them:
doAssert nv == nv
# and print them:
echo nv # NumericVector[int](len: 3, kind: vkFloat, data: [1, 2, 5])
# as xR contains the same memory location, constructing another vector
# and comparing them yields true, even though we modified nv
let nv2 = initNumericVector[int](xR)
doAssert nv == nv2
# finally we can also construct a NumericVector straight from a Nim sequence
let nv3 = @[1.5, 2.5, 3.5].toNumericVector()
echo nv3
#+end_src

If you run this code you will see a message:
#+begin_src
Interpreting input vector of type REALSXP as int loses information!
#+end_src

This is because we first constructed a =SEXP= from a 64-bit integer sequence in Nim. As mentioned before, 64-bit integers do not exist on the R side. Therefore, the =xR SEXP= above is actually stored in a =REALSXP=. By constructing a =NumericVector[int]= we tell the Nim compiler we wish to convert from and to =int=, no matter the underlying type of the =SEXP= array, i.e. =INTSXP= or =REALSXP=. The message simply makes you aware that this is happening (it may be taken out in the future).

The fact that this conversion happens internally is the reason for the existence of =RawVector=, which explicitly disallows this.
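
To see why moving between integers and 64-bit floats can lose information at all, here is a small sketch in plain Nim. It does not use =rnim=, and the variable names are only for illustration. It just shows the two kinds of loss an int/float conversion can cause.
#+begin_src nim
# a float with a fractional part is truncated when read as an int
let fractional = 2.75
doAssert int(fractional) == 2

# small integers survive a round trip through a 64-bit float ...
let small = 5'i64
doAssert int64(float64(small)) == small

# ... but integers above 2^53 may not, as a float64 has a 53-bit mantissa
let big = 9_007_199_254_740_993'i64 # 2^53 + 1
doAssert int64(float64(big)) != big
#+end_src
For the small values in the example above nothing is lost, which is why the message is only informative there.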

Further, =NumericVector= is actually a variant object. Depending on the runtime type of the =SEXP= from which we construct a =NumericVector=, the correct branch of the variant object will be filled. For extremely performance sensitive applications it may thus be preferable to have a type where variant kind checks and possible type conversions do not happen.
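
The cost of such a design can be illustrated with a minimal sketch in plain Nim. This is not the actual =rnim= implementation: the names =SketchVector=, =VectorKind= and =vkInt= are hypothetical (only =vkFloat= appears in the output above), and a =seq= stands in for the data array of the =SEXP=.
#+begin_src nim
type
  VectorKind = enum
    vkInt, vkFloat

  # hypothetical stand-in for a variant vector type; `T` is the type
  # the user wants to read, independent of how the data is stored
  SketchVector[T] = object
    case kind: VectorKind
    of vkInt:
      ints: seq[int32]
    of vkFloat:
      floats: seq[float64]

proc `[]`[T](v: SketchVector[T], idx: int): T =
  # every access checks the runtime kind and converts to `T`
  case v.kind
  of vkInt: result = T(v.ints[idx])
  of vkFloat: result = T(v.floats[idx])

# data stored as floats, but read as `int`
let fromReal = SketchVector[int](kind: vkFloat, floats: @[1.0, 2.0, 5.0])
doAssert fromReal[2] == 5

# data stored as 32-bit ints, but read as `float`
let fromInt = SketchVector[float](kind: vkInt, ints: @[1'i32, 2, 3])
doAssert fromInt[0] == 1.0
#+end_src
Each element access goes through a =case= on the kind plus a conversion, which is the overhead a type without the variant branch avoids.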

*** =Rctx= macro

As mentioned in the previous section, some function names are unusual and require the user to use =callEval= directly.

To make calling such functions a bit nicer, there is an =Rctx= macro, which allows for directly calling R functions with e.g. dots in their names, and also allows for assignments.

#+begin_src nim
let x = @[5, 10, 15]
let y = @[2.0, 4.0, 6.0]

var df: SEXP
Rctx:
  df = data.frame(Col1 = x, Col2 = y)
  let df2 = data.frame(Col1 = x, Col2 = y)
  print("Hello from R")
#+end_src
where both =df= as well as =df2= will then store an equivalent data frame. The last line shows that it's also possible to use this macro to avoid the need to discard all R calls.

** Calling Nim code from R

Nim can be used to write extensions for R. This is done by compiling a Nim file as a shared library and calling it in R using the =.Call= interface.

An example can be seen from the tests:

  • https://github.com/SciNim/rnim/blob/master/tests/tNimFromR.nim the Nim file that is compiled to a shared library
  • https://github.com/SciNim/rnim/blob/master/tests/tCallNimFromR.R the corresponding R file that wraps the shared library

In the near future the latter R file will be auto generated by the Nim code at compile time.

The basic idea is as follows. Assume you want to write an extension that adds two numbers in Nim to be called from R.

You write a Nim file with the desired procedure and attach the ={.exportR.}= pragma as follows:

=myRmodule.nim=:
#+begin_src nim
import rnim

proc addNumbers*(x, y: SEXP): SEXP {.exportR.} =
  # adds two numbers. We will treat them as floats
  let xNim = x.to(float)
  let yNim = y.to(float)
  # add the converted Nim values, not the raw SEXP arguments
  result = (xNim + yNim).nimToR
#+end_src

Note the usage of =SEXP= as the input and output types. In the future the conversions (and possibly non copy access) will be automated. For now we have to convert manually to and from Nim types.

This file is compiled as follows:
#+begin_src sh
nim c (-d:danger) --app:lib (--gc:arc) myRmodule.nim
#+end_src
where the parenthesized =danger= and =ARC= options are of course optional (but ARC/ORC is recommended).

This will generate a =libmyRmodule.so=. The resulting shared library in principle needs to be manually loaded via =dyn.load= in R and each procedure in it needs to be called using the =.Call= interface.

Fortunately, this can be automated easily. Therefore, when compiling such a shared library, we automatically emit an R wrapper, that has the same name as the input Nim file. So the following file is generated:

=myRmodule.R=:
#+begin_src R
dyn.load("libmyRmodule.so")

addNumbers <- function(a, b) {
  return(.Call("addNumbers", a, b))
}
#+end_src

This file can now be sourced from the R interpreter (using the =source= function) or in an R script and then =addNumbers= is usable and will execute the compiled Nim code!

Note that the autogeneration logic assumes the shared library and the generated R script will live in the same directory. If you wish to move one, you might have to adjust the paths that perform the ~dyn.load~ command!

** Trying it out

To try out the functionality of calling R from Nim, you need to meet a few prerequisites.

*** Setup on Linux

  • a working R installation with a =libR.so= shared library
  • the shell environment variable =R_HOME= needs to be defined and has to point to the directory which contains the full R directory structure. That is /not/ the path where the R binary lies!
  • finally, the =libR.so= has to be findable for dynamic loading. On my machine its path isn't added to =ld= via =/etc/ld.so.conf.d= by default (for the time being I just define =LD_LIBRARY_PATH=).

Setup on my machine:
#+begin_src sh
which R
echo $R_HOME
echo $LD_LIBRARY_PATH
#+end_src
#+begin_src sh
/usr/bin/R
/usr/lib/R
/usr/lib/R/lib
#+end_src

An easy way to find the right value for =R_HOME= is to ask R itself. Running
#+begin_src sh
R RHOME
#+end_src
returns the correct path. We can use that to set the =R_HOME= variable:
#+begin_src sh
export R_HOME=`R RHOME`
export LD_LIBRARY_PATH=$R_HOME/lib # maybe not required on your system
#+end_src

*** Setup on Windows

  • a working R installation with a =R.dll= shared library
  • the shell environment variable =R_HOME= needs to be defined and has to point to the directory which contains the full R directory structure. That is /not/ the path where the R binary lies!

Example setup:
#+begin_src sh
where R.dll
set R_HOME
#+end_src
#+begin_src sh
C:\Program Files\R\R-4.0.4\bin\x64\R.dll
R_HOME=C:\Program Files\R\R-4.0.4
#+end_src

*** Test your setup

Run the test file:
#+begin_src sh
nim c -r tests/tRfromNim.nim
#+end_src
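
If the test cannot find R, it may help to check the environment from Nim first. The following is a small sketch using only the Nim standard library. The library locations it probes are an assumption taken from the example setups above (=lib/libR.so= on Linux, =bin\x64\R.dll= on Windows) and may differ on your system.
#+begin_src nim
import std/os

# `R_HOME` is the variable described in the setup sections above
let rHome = getEnv("R_HOME")

if rHome.len == 0:
  echo "R_HOME is not set"
elif not dirExists(rHome):
  echo "R_HOME points to a missing directory: ", rHome
else:
  echo "R_HOME = ", rHome
  # Assumption: the shared library sits where the example setups show it
  let libPath =
    when defined(windows): rHome / "bin" / "x64" / "R.dll"
    else: rHome / "lib" / "libR.so"
  echo libPath, (if fileExists(libPath): " found" else: " not found")
#+end_src
A "not found" result does not necessarily mean the setup is broken, only that the library is not at the assumed location.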