r-polars
r-polars copied to clipboard
Building a rust binding R package depends on this r-polars
First of all, thank you very much for this great R Polars project.
I would like to implement a db connection rust binding R package and convert data to polars directly and hence r-polars, but I don't know how to access r-polars robj. e.g. py-polars has https://github.com/pola-rs/pyo3-polars, which allows me to access python polars object. I was struggling to access r-polars series and dataframe from rust code, keep getting "invalid permission" or "memory not mapped" error.
Any suggestion is appreciated.
I defined a same RPolarsDataFrame struct, but apparently, this doesn't work.
if any.inherits("RPolarsDataFrame") {
let df: ExternalPtr<RPolarsDataFrame> = any.try_into().unwrap();
Ok(K::DataFrame(df.0.clone()))
}
Here is https://github.com/rpolars/extendr_polars written by @sorhawell, but I don't know if it currently works as it is not tested and unmaintained.
See also #732 and https://github.com/pola-rs/r-polars/pull/776#discussion_r1479727362
Ideally, I think polars::DataFrame (Rust) should implement the Arrow C Stream interface and use it for each language's binding. (pola-rs/polars#14208)
Thanks for the answers @eitsupi . I will try the repo above
Contributions to https://github.com/rpolars/extendr_polars or this repository are welcome!
When #1078 is merged, input/output through the Arrow C stream interface of Series and DataFrame will be supported, so it should be possible to read Series and DataFrame with nanoarrow_array_stream
as an intermediate format.
See also https://github.com/JosiahParry/arrow-extendr.
In the near term there is also a very easy way to make this library publishable on crates.io
. All you would really needed to do to make this possible is to stop using git
imports in your Cargo.toml
. That way you could just publish the rust part of the code as a crate and therefor eliminate the need for https://github.com/rpolars/extendr_polars. Which is just a very hacky solution to a problem that should just not exist. The main blocker here is not polars but the used the extendr crate. There have been quite a few features added to main but not published as a version.
Oh and allowing the library to also still be built as rust library instead of staticlib
only should also help.
Thanks for your comment. However, I don't think there is a need to publish it to Crates.io. For example, look at something like https://github.com/apache/datafusion-ballista/blob/04766d5fd7cbb15d583f736979c11de06c720928/python/Cargo.toml#L40.
I don't have the bandwidth to update extendr_polars, so I'd appreciate it if someone could work on updating it.
Well if you do not have the time to update extendr_polars
then that is one good reason to publish r-polars
instead. That would make extendr_polars
obsolete.
But there are more reasons to do so! Let's say that somebody wants to write a rust library and export it via ffi to R. If this library has functions that expect or return a polars data frame then the function signatures need to use RPolarsDataFrame
which comes from this crate. The best way to achieve this is to use r-polars
directly. Currently this requires a git import and patching, because it is not published and only staticlibrary builds are enabled. Effectively this makes the rust part of this code R only. That is just not very useful when building on top of something and there really is no reason I can see that this is beneficial or required.
Allowing rust git import of this crate would be an easy first step to enable broader usage. The only change required for this is to update the supported crate-type
s:
[lib]
crate-type = ['staticlib', 'rlib']
This still taints any code on top to not be publishable on crates.io
because of the git import. But it is better than the current state. To actually make this publishable all git imports need to be removed. Using the latest stable release of r-polars (0.16.4)
removing the git import of upstream polars is a drop in replacement of the git import to polars = { version = "0.39.2", … }
. For the main branch this is not working because 0.40.0
has some breaking changes, but they will need to be addressed in the future anyway. And I don't really think that using git versions of polars will make the transition easier in the long run. The only real blocker to this is extendr
which has not had a stable release in a while, but that is just a question of time.
I hope you reconsider your stance on not publishing this crate, or at least enable building other R libraries on top of r-polars
using rust.
Hi @ju6ge, it seems you have a clear view of what needs to be done in this repo to make the Rust crate more usable by other packages. This is not my case since I'm not fluent in Rust. Given that @eitsupi's time is limited, could you maybe open a PR so that we can move forward on this issue?
@etiennebacher sure thing. I would propose the following way of moving forward.
Merge Request crate-type
- Update supported crate types, this one is trivial as described above. This update would be sufficient to allow other rust libraries to export R bindings with polars types in functions, via git import of the code.
Merge Request polars-0.40.0
2. Updating the dependencies to stable polars versions is more involved. I assume it would not be desirable to crate a new release of 0.16.4
with a stable polars version 0.39.2
. I think it would be better to create a MR for the next release that updates the polars version to 0.40.0
. All subsequent polars version updates should then always use stable polars versions. I am not sure if I am the right person for this job because i am not very deep into polars internals or r-polars
. I am trying to build on top, but I am a rust developer so I can give it a shot ;)
Tracking Issue stable extendr
3. Given that the main blocker to publishing this lib as a rust library is that a new version of extendr
needs to be released I would create an issue to keep track of it. As soon as a new version of extendr
is released, the Issue can be closed with a MR updating the dependency.
Once 3 is completed publishing the crate to crates.io
only requires a cargo publish
in src/rust
. Therefor enabling any downsteam libray using r-polars to also be published.
I have tested the procedure using my own private crate registry, so I am very confident that this will work.
One additional point: this repository already depends on commits prior to polars-0.40.0. 0.40.0 had a fatal bug that prevented it from being an R package unless the prior commits were included polars is mainly developed for the Python package, and the Python packages always depend on the Polars crate on GitHub, not Polars on crates.io, so we are convinced that it is very difficult to depend on Polars on crates.io. So I don't think it is worth it to publish to crates.io with such disadvantages.
I don't see the benefit of publishing to crates.io, as it is not at all a problem to rely on Rust crates on GitHub when using Rust source code in an R package. (If submitted to CRAN, downloads from crates.io are not allowed as like downloads from GitHub, and all crates must be vendored)
Thanks for this detailed plan! This looks good to me, feel free to open a PR for 1) as it's the easiest one.
Regarding 2), this might take some time (which is not a big deal since we have to wait for extendr
to address 3) anyway). We updated to rust-polars 0.40.0 a couple of weeks ago but we decided to go forward and to include a more bleeding-edge version of rust-polars since there were many important fixes added right after 0.40.0. Therefore, we'll see when rust-polars 0.40.1 or 0.41.0 is released, which could take a month or more.
Edit: given @eitsupi message above (published at the same time), maybe wait before proceeding with 1)
polars is mainly developed for the Python package, and the Python packages always depend on the Polars crate on GitHub, not Polars on crates.io, so we are convinced that it is very difficult to depend on Polars on crates.io.
Not sure I understand @eitsupi. Most of the time, we use the Github commit of rust-polars which is associated to the Github release (and therefore to the new version on crates.io). There were a few occasions (like the current state of the repo) where we use a commit that is not associated with a github release but that is rare. So in most occasions, what we do is equivalent to using the crates.io versions, no?
So in most occasions, what we do is equivalent to using the crates.io versions, no?
Yes, of course it is. But I think the patch is actually applied when it is released to crates.io. I have tried in the past to reference crates.io instead of the GitHub commit, but for some reason I made it into the build. In short, I don't think polars on crates.io are guaranteed to be available.
@eitsupi I do not understand the sentiment that polars would be unavailible on crates.io
. I have a code that depends on polars for at least two years now and is has not been a problem. If you always want the newest commit then of course they are not availible on crates.io
. But developers version their releases for a reason and stable versions are always available on crates.io
for downstream users of a library it make a lot of sense to use the stable version. If a new version breaks something critical nobody forces you to upgrade until it is fixed.
@ju6ge Most of Polars tests are written on the Python side, and I feel that Rust Polars is not stable at all.
I think this is evident from the fact that Python Polars will soon be 1.0.0, whereas most developers do not pay attention to the version of Rust Polars (Breaking changes unmarked breaking changes are frequently introduced).
Also, extendr and libR-sys are rarely released on crates.io, so I think it would be very hard for developers to stick to the version on crates.io.
Again, I don't understand the need to register r-polars on crates.io even if all dependencies can be downloaded from crates.io. What's wrong with downloading from GitHub?
@eitsupi It is about the taint
that is introduced because of the git
import. Any library that is written using a git
import can not be published. So if I write a library that has an optional feature flag to enable r-bindings
and want to export RPolarsDataFrames
. If I import r-polars
via git I am prohibited from publishing it even if all other code works just with stable versions just fine. So this is a transitive requirement that gets in the way. Crates.io
is the way all rust dependencies are distributed officially.
I agree that it might be annoying, if rust polars abi is unstable it does not matter if you use the git version or the crates.io version. The amount of work to stay in sync will be the same in the long run.
Crates.io
is the way all rust dependencies are distributed officially.
In the Rust world, yes, but not in the CRAN-centered R world. https://cran.r-project.org/web/packages/using_rust.html
In the R world, I don't see the value in releasing to crates.io at this time.
I'm sure I've argued the same thing many times, but I welcome contributions to extendr-polars. Why not just update extendr-polars with a dependency on polars on crates.io like pyo3-polars?
I agree that it might be annoying, if rust polars abi is unstable it does not matter if you use the git version or the crates.io version. The amount of work to stay in sync will be the same in the long run.
It is a really painful task to fix a large number of breaking changes in Rust Polars in one PR, which is released only once a month. So I definitely do not want to commit to that.
I'm sure I've argued the same thing many times, but I welcome contributions to extendr-polars. Why not just update extendr-polars with a dependency on polars on crates.io like pyo3-polars?
So you are arguing it is easier to update two seperate code bases instead of a single codebase?
All that is needed to export to R from rust is already here. extendr_polars
even depends on r-polars
by requiring a working R session with r-polars
installed.
It is a really painful task to fix a large number of breaking changes in Rust Polars in one PR, which is released only once a month. So I definitely do not want to commit to that.
I get that, and I get your time is limited. I do also want to express my respect for the work that has gone into this project already.
So you are arguing it is easier to update two seperate code bases instead of a single codebase?
Good point. Ideally, I would prefer to move the extendr hack in r-polars to extendr-polars and have r-polars depend on extendr-polars. However, it takes a lot of work to do that, so for now we feel it is easier to put a copy in extendr-polars.
In long-term, I expect the codebase to be simpler with savvy than with extendr. (#1126)
Currently there is code on the Rust and R sides to use the Result type on extendr, and these are needed for extendr-polars (or something like that).
I just created #1135. Which addresses enabling downstream projects to import r-polars
as a rust dependency.