r-polars icon indicating copy to clipboard operation
r-polars copied to clipboard

Building a rust binding R package depends on this r-polars

Open jshinonome opened this issue 1 year ago • 24 comments

First of all, thank you very much for this great R Polars project.

I would like to implement a db connection rust binding R package and convert data to polars directly and hence r-polars, but I don't know how to access r-polars robj. e.g. py-polars has https://github.com/pola-rs/pyo3-polars, which allows me to access python polars object. I was struggling to access r-polars series and dataframe from rust code, keep getting "invalid permission" or "memory not mapped" error.

Any suggestion is appreciated.

I defined a same RPolarsDataFrame struct, but apparently, this doesn't work.

if any.inherits("RPolarsDataFrame") {
        let df: ExternalPtr<RPolarsDataFrame> = any.try_into().unwrap();
        Ok(K::DataFrame(df.0.clone()))
    }

jshinonome avatar Feb 12 '24 14:02 jshinonome

Here is https://github.com/rpolars/extendr_polars written by @sorhawell, but I don't know if it currently works as it is not tested and unmaintained.

See also #732 and https://github.com/pola-rs/r-polars/pull/776#discussion_r1479727362

Ideally, I think polars::DataFrame (Rust) should implement the Arrow C Stream interface and use it for each language's binding. (pola-rs/polars#14208)

eitsupi avatar Feb 13 '24 14:02 eitsupi

Thanks for the answers @eitsupi . I will try the repo above

jshinonome avatar Feb 13 '24 14:02 jshinonome

Contributions to https://github.com/rpolars/extendr_polars or this repository are welcome!

eitsupi avatar Feb 14 '24 14:02 eitsupi

When #1078 is merged, input/output through the Arrow C stream interface of Series and DataFrame will be supported, so it should be possible to read Series and DataFrame with nanoarrow_array_stream as an intermediate format. See also https://github.com/JosiahParry/arrow-extendr.

eitsupi avatar May 06 '24 09:05 eitsupi

In the near term there is also a very easy way to make this library publishable on crates.io. All you would really needed to do to make this possible is to stop using git imports in your Cargo.toml. That way you could just publish the rust part of the code as a crate and therefor eliminate the need for https://github.com/rpolars/extendr_polars. Which is just a very hacky solution to a problem that should just not exist. The main blocker here is not polars but the used the extendr crate. There have been quite a few features added to main but not published as a version.

ju6ge avatar May 31 '24 09:05 ju6ge

Oh and allowing the library to also still be built as rust library instead of staticlib only should also help.

ju6ge avatar May 31 '24 09:05 ju6ge

Thanks for your comment. However, I don't think there is a need to publish it to Crates.io. For example, look at something like https://github.com/apache/datafusion-ballista/blob/04766d5fd7cbb15d583f736979c11de06c720928/python/Cargo.toml#L40.

I don't have the bandwidth to update extendr_polars, so I'd appreciate it if someone could work on updating it.

eitsupi avatar Jun 02 '24 14:06 eitsupi

Well if you do not have the time to update extendr_polars then that is one good reason to publish r-polars instead. That would make extendr_polars obsolete.

But there are more reasons to do so! Let's say that somebody wants to write a rust library and export it via ffi to R. If this library has functions that expect or return a polars data frame then the function signatures need to use RPolarsDataFrame which comes from this crate. The best way to achieve this is to use r-polars directly. Currently this requires a git import and patching, because it is not published and only staticlibrary builds are enabled. Effectively this makes the rust part of this code R only. That is just not very useful when building on top of something and there really is no reason I can see that this is beneficial or required.

Allowing rust git import of this crate would be an easy first step to enable broader usage. The only change required for this is to update the supported crate-types:

[lib]
crate-type = ['staticlib', 'rlib']

This still taints any code on top to not be publishable on crates.io because of the git import. But it is better than the current state. To actually make this publishable all git imports need to be removed. Using the latest stable release of r-polars (0.16.4) removing the git import of upstream polars is a drop in replacement of the git import to polars = { version = "0.39.2", … }. For the main branch this is not working because 0.40.0 has some breaking changes, but they will need to be addressed in the future anyway. And I don't really think that using git versions of polars will make the transition easier in the long run. The only real blocker to this is extendr which has not had a stable release in a while, but that is just a question of time.

I hope you reconsider your stance on not publishing this crate, or at least enable building other R libraries on top of r-polars using rust.

ju6ge avatar Jun 03 '24 07:06 ju6ge

Hi @ju6ge, it seems you have a clear view of what needs to be done in this repo to make the Rust crate more usable by other packages. This is not my case since I'm not fluent in Rust. Given that @eitsupi's time is limited, could you maybe open a PR so that we can move forward on this issue?

etiennebacher avatar Jun 03 '24 11:06 etiennebacher

@etiennebacher sure thing. I would propose the following way of moving forward.

Merge Request crate-type

  1. Update supported crate types, this one is trivial as described above. This update would be sufficient to allow other rust libraries to export R bindings with polars types in functions, via git import of the code.

Merge Request polars-0.40.0 2. Updating the dependencies to stable polars versions is more involved. I assume it would not be desirable to crate a new release of 0.16.4 with a stable polars version 0.39.2. I think it would be better to create a MR for the next release that updates the polars version to 0.40.0. All subsequent polars version updates should then always use stable polars versions. I am not sure if I am the right person for this job because i am not very deep into polars internals or r-polars. I am trying to build on top, but I am a rust developer so I can give it a shot ;)

Tracking Issue stable extendr 3. Given that the main blocker to publishing this lib as a rust library is that a new version of extendr needs to be released I would create an issue to keep track of it. As soon as a new version of extendr is released, the Issue can be closed with a MR updating the dependency.

Once 3 is completed publishing the crate to crates.io only requires a cargo publish in src/rust. Therefor enabling any downsteam libray using r-polars to also be published.

I have tested the procedure using my own private crate registry, so I am very confident that this will work.

ju6ge avatar Jun 03 '24 12:06 ju6ge

One additional point: this repository already depends on commits prior to polars-0.40.0. 0.40.0 had a fatal bug that prevented it from being an R package unless the prior commits were included polars is mainly developed for the Python package, and the Python packages always depend on the Polars crate on GitHub, not Polars on crates.io, so we are convinced that it is very difficult to depend on Polars on crates.io. So I don't think it is worth it to publish to crates.io with such disadvantages.

I don't see the benefit of publishing to crates.io, as it is not at all a problem to rely on Rust crates on GitHub when using Rust source code in an R package. (If submitted to CRAN, downloads from crates.io are not allowed as like downloads from GitHub, and all crates must be vendored)

eitsupi avatar Jun 03 '24 12:06 eitsupi

Thanks for this detailed plan! This looks good to me, feel free to open a PR for 1) as it's the easiest one.

Regarding 2), this might take some time (which is not a big deal since we have to wait for extendr to address 3) anyway). We updated to rust-polars 0.40.0 a couple of weeks ago but we decided to go forward and to include a more bleeding-edge version of rust-polars since there were many important fixes added right after 0.40.0. Therefore, we'll see when rust-polars 0.40.1 or 0.41.0 is released, which could take a month or more.


Edit: given @eitsupi message above (published at the same time), maybe wait before proceeding with 1)

etiennebacher avatar Jun 03 '24 12:06 etiennebacher

polars is mainly developed for the Python package, and the Python packages always depend on the Polars crate on GitHub, not Polars on crates.io, so we are convinced that it is very difficult to depend on Polars on crates.io.

Not sure I understand @eitsupi. Most of the time, we use the Github commit of rust-polars which is associated to the Github release (and therefore to the new version on crates.io). There were a few occasions (like the current state of the repo) where we use a commit that is not associated with a github release but that is rare. So in most occasions, what we do is equivalent to using the crates.io versions, no?

etiennebacher avatar Jun 03 '24 12:06 etiennebacher

So in most occasions, what we do is equivalent to using the crates.io versions, no?

Yes, of course it is. But I think the patch is actually applied when it is released to crates.io. I have tried in the past to reference crates.io instead of the GitHub commit, but for some reason I made it into the build. In short, I don't think polars on crates.io are guaranteed to be available.

eitsupi avatar Jun 03 '24 12:06 eitsupi

@eitsupi I do not understand the sentiment that polars would be unavailible on crates.io. I have a code that depends on polars for at least two years now and is has not been a problem. If you always want the newest commit then of course they are not availible on crates.io. But developers version their releases for a reason and stable versions are always available on crates.io for downstream users of a library it make a lot of sense to use the stable version. If a new version breaks something critical nobody forces you to upgrade until it is fixed.

ju6ge avatar Jun 03 '24 12:06 ju6ge

@ju6ge Most of Polars tests are written on the Python side, and I feel that Rust Polars is not stable at all.

I think this is evident from the fact that Python Polars will soon be 1.0.0, whereas most developers do not pay attention to the version of Rust Polars (Breaking changes unmarked breaking changes are frequently introduced).

Also, extendr and libR-sys are rarely released on crates.io, so I think it would be very hard for developers to stick to the version on crates.io.

eitsupi avatar Jun 03 '24 13:06 eitsupi

Again, I don't understand the need to register r-polars on crates.io even if all dependencies can be downloaded from crates.io. What's wrong with downloading from GitHub?

eitsupi avatar Jun 03 '24 13:06 eitsupi

@eitsupi It is about the taint that is introduced because of the git import. Any library that is written using a git import can not be published. So if I write a library that has an optional feature flag to enable r-bindings and want to export RPolarsDataFrames. If I import r-polars via git I am prohibited from publishing it even if all other code works just with stable versions just fine. So this is a transitive requirement that gets in the way. Crates.io is the way all rust dependencies are distributed officially.

I agree that it might be annoying, if rust polars abi is unstable it does not matter if you use the git version or the crates.io version. The amount of work to stay in sync will be the same in the long run.

ju6ge avatar Jun 03 '24 13:06 ju6ge

Crates.io is the way all rust dependencies are distributed officially.

In the Rust world, yes, but not in the CRAN-centered R world. https://cran.r-project.org/web/packages/using_rust.html

In the R world, I don't see the value in releasing to crates.io at this time.

eitsupi avatar Jun 03 '24 13:06 eitsupi

I'm sure I've argued the same thing many times, but I welcome contributions to extendr-polars. Why not just update extendr-polars with a dependency on polars on crates.io like pyo3-polars?

eitsupi avatar Jun 03 '24 13:06 eitsupi

I agree that it might be annoying, if rust polars abi is unstable it does not matter if you use the git version or the crates.io version. The amount of work to stay in sync will be the same in the long run.

It is a really painful task to fix a large number of breaking changes in Rust Polars in one PR, which is released only once a month. So I definitely do not want to commit to that.

eitsupi avatar Jun 03 '24 13:06 eitsupi

I'm sure I've argued the same thing many times, but I welcome contributions to extendr-polars. Why not just update extendr-polars with a dependency on polars on crates.io like pyo3-polars?

So you are arguing it is easier to update two seperate code bases instead of a single codebase?

All that is needed to export to R from rust is already here. extendr_polars even depends on r-polars by requiring a working R session with r-polars installed.

It is a really painful task to fix a large number of breaking changes in Rust Polars in one PR, which is released only once a month. So I definitely do not want to commit to that.

I get that, and I get your time is limited. I do also want to express my respect for the work that has gone into this project already.

ju6ge avatar Jun 03 '24 13:06 ju6ge

So you are arguing it is easier to update two seperate code bases instead of a single codebase?

Good point. Ideally, I would prefer to move the extendr hack in r-polars to extendr-polars and have r-polars depend on extendr-polars. However, it takes a lot of work to do that, so for now we feel it is easier to put a copy in extendr-polars.

In long-term, I expect the codebase to be simpler with savvy than with extendr. (#1126)

Currently there is code on the Rust and R sides to use the Result type on extendr, and these are needed for extendr-polars (or something like that).

eitsupi avatar Jun 03 '24 14:06 eitsupi

I just created #1135. Which addresses enabling downstream projects to import r-polars as a rust dependency.

ju6ge avatar Jun 06 '24 12:06 ju6ge