Polars Expression plugins for R

Open eitsupi opened this issue 10 months ago • 5 comments

We need:

  1. A mechanism for registering subnamespaces from outside the package, something like https://docs.pola.rs/py-polars/html/reference/api.html
  2. A Rust crate, something like https://github.com/pola-rs/pyo3-polars

eitsupi, Apr 11 '24 16:04

Note: The serialization and deserialization of R objects that this may require are already defined here (I don't know whether this is sufficient):

https://github.com/pola-rs/r-polars/blob/1ea820b7791a1747fe477cb4d7d7172dcf619137/src/rust/src/rbackground.rs#L77-L131

eitsupi, Apr 13 '24 08:04

  1. Mechanism for registering subnamespaces from outside the package something like docs.pola.rs/py-polars/html/reference/api.html

I was able to make this work in an implementation that I am rewriting from scratch using py-polars as a reference. https://github.com/eitsupi/neo-r-polars/blob/afac2ae8020e4dbe3d02f7515653a574283b577a/man/polars_api_register_series_namespace.Rd#L20-L44

# s: polars series
math_shortcuts <- function(s) {
  # Create a new environment to store the methods
  self <- new.env(parent = emptyenv())

  # Store the series
  self$`_s` <- s

  # Add methods
  self$square <- function() self$`_s` * self$`_s`
  self$cube <- function() self$`_s` * self$`_s` * self$`_s`

  # Set the class
  class(self) <- "polars_namespace_series"

  # Return the environment
  self
}

polars_api_register_series_namespace("math", math_shortcuts)

s <- as_polars_series(c(1.5, 31, 42, 64.5))
s$math$square()$rename("s^2")

s <- as_polars_series(1:5)
s$math$cube()$rename("s^3")

My current concern is performance degradation from frequent for loops (they run on essentially every call to a single method). I believe the current implementation of r-polars registers all active bindings and methods when the package is installed, whereas this new implementation registers methods each time an R class instance is built, which would degrade performance (of course, if the cost is acceptable, there is no problem). https://github.com/eitsupi/neo-r-polars/blob/afac2ae8020e4dbe3d02f7515653a574283b577a/R/series-series.R#L7-L31

eitsupi, Jun 16 '24 05:06

I have looked into how Polars handles expression plugins, and it appears that it does so by loading a dynamic library via the libloading crate. https://docs.rs/libloading/latest/libloading/ https://github.com/pola-rs/polars/blob/5cad69e5d4af47e75ae0abbf88dc2bafbc8f66d2/crates/polars-plan/src/dsl/function_expr/plugin.rs#L5

In the case of R packages, it is the static libraries, not the dynamic libraries, that are built by rustc. Dynamic libraries are built by R.

We need to find a way to generate the proper expected C ABI on the plugin side, but this is obviously beyond my knowledge.
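
To make "the proper expected C ABI" concrete, here is a minimal std-only Rust sketch of what a plugin-side symbol and its host-side call could look like. Everything in it is an assumption for illustration: the names `plugin_square`, `PluginFn`, and `call_plugin` are hypothetical, and the plain `f64` buffer signature is a deliberate simplification, not the signature Polars actually expects from a plugin library. The symbol lookup that libloading would perform is replaced by passing the function directly.

```rust
/// Hypothetical plugin entry point using the C calling convention.
/// In a real shared library this function would also need the
/// `no_mangle` attribute so that the symbol keeps this exact name;
/// it is left out here because the sketch never looks the symbol up.
///
/// Returns 0 on success and 1 if either pointer is null.
pub unsafe extern "C" fn plugin_square(
    input: *const f64,
    len: usize,
    output: *mut f64,
) -> i32 {
    if input.is_null() || output.is_null() {
        return 1;
    }
    // The caller promises that both buffers hold `len` elements.
    let (src, dst) = unsafe {
        (
            std::slice::from_raw_parts(input, len),
            std::slice::from_raw_parts_mut(output, len),
        )
    };
    for (o, i) in dst.iter_mut().zip(src) {
        *o = i * i;
    }
    0
}

/// The function-pointer type a host would cast a looked-up symbol to.
pub type PluginFn = unsafe extern "C" fn(*const f64, usize, *mut f64) -> i32;

/// Host-side helper: calls a plugin function through its C ABI pointer
/// and returns the result, or `None` if the plugin reports an error.
pub fn call_plugin(f: PluginFn, values: &[f64]) -> Option<Vec<f64>> {
    let mut out = vec![0.0; values.len()];
    let status = unsafe { f(values.as_ptr(), values.len(), out.as_mut_ptr()) };
    if status == 0 {
        Some(out)
    } else {
        None
    }
}
```

The point of the sketch is only that the host and the plugin agree on a function-pointer type. Whether the dynamic library that R links from the rustc-built static library still exports such a symbol is exactly the part that remains unresolved here.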

eitsupi, Jun 23 '24 12:06

In the case of R packages, it is the static libraries, not the dynamic libraries, that are built by rustc. Dynamic libraries are built by R.

We need to find a way to generate the proper expected C ABI on the plugin side, but this is obviously beyond my knowledge.

The recent libr might be of use here: https://github.com/posit-dev/ark/tree/main/crates#readme

etiennebacher, Jun 29 '24 10:06

My understanding is that the dynamic libraries are built by R, so it doesn't matter which Rust crate is chosen to build the static library. The open problem is that I don't know how to make a proper C ABI for the dynamic library created by R.

eitsupi, Jun 29 '24 12:06