polars
Julia interface for polars
Describe your feature request
Has there been any consideration towards a Julia package for polars? I see that you have python and js in there.
I would be happy to contribute too
Hi. Anything that integrates easily with Rust could be used as a front-end, and I'd happily help with modifying the Rust native parts so that they fit better.
However, from the maintenance side, I don't have time to support any more front-ends. For the JS one, I actually hope to get some help from people more experienced with JS.
So, it's possible, but I would need help. :smile:
Alright, I've looked into this a little bit and I've found some resources that can help
This is an example of Rust code being called from Julia https://github.com/felipenoris/JuliaPackageWithRustDep.jl It looks like Rust's FFI allows Julia to call it via C.
Where is the native part of polars? Does it have c bindings like the example in the repo?
Quick update: looks like c bindings can be generated for any rust crate: https://github.com/felipenoris/JuliaPackageWithRustDep.jl/blob/master/deps/RustDylib/build/build.rs
Also I'm working to implement the C Data interface in Arrow.jl, which would be a prerequisite to get this working
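To make the "Julia calls Rust via C" idea above concrete, here is a minimal sketch of a Rust function exported with the C ABI. The name `demo_sum_f64` is hypothetical and not part of polars; it only illustrates how a caller-owned buffer can cross the boundary as a pointer plus a length.

```rust
use std::slice;

/// Sum `len` doubles starting at `ptr`.
/// Hypothetical example function; polars does not export this.
///
/// # Safety
/// `ptr` must point to `len` valid, initialized `f64` values, or be null.
#[no_mangle]
pub unsafe extern "C" fn demo_sum_f64(ptr: *const f64, len: usize) -> f64 {
    if ptr.is_null() || len == 0 {
        return 0.0;
    }
    // View the caller's buffer as a slice without copying or taking ownership.
    let values = unsafe { slice::from_raw_parts(ptr, len) };
    values.iter().sum()
}
```

Assuming this is built as a `cdylib`, the Julia side would presumably look something like `ccall((:demo_sum_f64, libpath), Float64, (Ptr{Float64}, Csize_t), xs, length(xs))`, with `libpath` pointing at the compiled library.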
That sounds promising. Let me know if you need anything from me!
I'm building a proof of concept in this repo: https://github.com/sa-/Polars.jl
This is a rough to-do list:
- [ ] get the C data interface working in Arrow.jl
- [ ] implement `jl_to_rs` and `rs_to_jl`
- [ ] Allow Polars.jl to accept Arrow.Table as an argument to any function, and call join/groupby etc from there
Cool, I will follow this with interest
Allow Polars.jl to accept Arrow.Table as an argument to any function, and call join/groupby etc from there
Do you plan on making DataFrame/ Series struct/objects similar to the api in rust / python?
FYI: https://docs.rs/jlrs/0.10.0/jlrs/ : provides a reasonably safe interface to the Julia C API that lets you call code written in Julia from Rust and vice versa. Currently this crate is only tested on Linux in combination with Julia 1.6 and is not compatible with earlier versions of Julia.
Nice, I think I will add a help wanted banner for this one. I can help with integration, but maintaining 2 APIs has turned out to be my max capacity. :sweat_smile:
I'm a long time Julia veteran and I might be interested in working on this. I think there would be very significant benefits to having a polars package in Julia especially since it fills a use case which is arguably not a design goal of any current Julia package. For example, while I personally think DataFrames.jl is fantastic and may be more often useful to Julia users than a Polars.jl, it will never implement query optimization, nor does it prioritize performance to the degree that polars does as there is significant hedging in favor of convenience (i.e. it is not type stable).
More importantly to me personally, the deserialization options available in Julia have always been a bit lackluster, and it is frankly not the sort of thing that Julia users tend to want to maintain. If we can get a polars wrapper going without any big sacrifices to efficiency via the wrapper it would ensure that we have all of the deserialization options of polars permanently available in Julia.
It should be easy for me to figure out how to compile polars into a C *.so. Most of the work in getting started is likely to be figuring out what C calls are available in a particular library just from looking at the Rust documentation. So far it looks like I might have to do a lot of digging through the polars source code to figure that out.
I'll do some digging around and see if this is something I want to take on.
Just to add a further point to @ExpandingMan's comment.
Another big advantage I see in having Polars.jl is to be able to use the same dataframes library in Python and Julia.
I work in a team that makes heavy usage of Python and Julia. Having access to the same tools across languages is a huge advantage for the team.
I was feeling very enthusiastic about this initially, but after a lot of struggling I can reach no other conclusion than that creating an FFI for Rust sucks. It necessarily involves breaking all of Rust's memory safety guarantees, not to mention that now you have both Rust's deinitialization mechanisms and the Julia GC trying to free everything (everywhere, all the time), so it's a bit of a double-free fest.
I suspect this would be drastically easier for someone who is a lot more familiar with rust. I don't have a lot of experience with it and I found this attempt much more frustrating than writing "normal" rust code.
It might be worth taking more cues from the Python wrapper; however, that seems to make extensive use of pyo3. The fact that pyo3 seems like a major undertaking in itself really does not bode well for how much effort it would be to create a Julia wrapper for polars.
This seems very doable. The hard part will be manually writing out every single FFI function, but after that it should be straightforward.
For example, DataFrame is defined here. An easy implementation is here. The FFI for this would look like:
```rust
use std::ffi::c_void;
use polars::prelude::DataFrame;

#[no_mangle]
pub extern "C" fn polars_dataframe_new() -> *mut c_void {
    // Create the Rust struct on the heap
    let d = Box::new(DataFrame::default());
    // Box::into_raw gives up ownership, so RAII doesn't deallocate it,
    // and returns the pointer
    Box::into_raw(d) as *mut c_void
}

#[no_mangle]
pub unsafe extern "C" fn polars_dataframe_estimated_size(pointer: *mut c_void) -> usize {
    // Borrow the struct behind the pointer without taking ownership,
    // so nothing is dropped when this function returns
    let d = &*(pointer as *const DataFrame);
    // Run the impl function and return the value
    // (usize rather than c_int, so large sizes are not truncated)
    d.estimated_size()
}

#[no_mangle]
pub unsafe extern "C" fn polars_dataframe_free(pointer: *mut c_void) {
    // Safety checks to make sure the object is valid can be done here
    if pointer.is_null() {
        return;
    }
    // Retake ownership; the Box deallocates at the end of the scope
    drop(Box::from_raw(pointer as *mut DataFrame));
}
```
Please note that I have not done type checking on this code, as it was written off the top of my head. This is generally what it will look like. Where it gets complicated is traits, but that can be abstracted through FFI like this with a bit of work.
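One way the trait case might be handled, as a rough sketch: a `Box<dyn Trait>` is a fat pointer and cannot cross the C ABI directly, so it can be wrapped in a small struct and handed out as a thin opaque pointer. Everything here (`Shape`, `ShapeHandle`, the `demo_shape_*` functions) is hypothetical and only stands in for whatever polars traits would need exposing.

```rust
/// Hypothetical trait standing in for a trait that needs to be exposed.
pub trait Shape {
    fn area(&self) -> f64;
}

struct Circle {
    r: f64,
}

struct Rect {
    w: f64,
    h: f64,
}

impl Shape for Circle {
    fn area(&self) -> f64 {
        std::f64::consts::PI * self.r * self.r
    }
}

impl Shape for Rect {
    fn area(&self) -> f64 {
        self.w * self.h
    }
}

/// Opaque handle: a thin pointer to this struct hides the fat `dyn` pointer.
pub struct ShapeHandle(Box<dyn Shape>);

#[no_mangle]
pub extern "C" fn demo_shape_circle(r: f64) -> *mut ShapeHandle {
    Box::into_raw(Box::new(ShapeHandle(Box::new(Circle { r }))))
}

#[no_mangle]
pub extern "C" fn demo_shape_rect(w: f64, h: f64) -> *mut ShapeHandle {
    Box::into_raw(Box::new(ShapeHandle(Box::new(Rect { w, h }))))
}

/// # Safety
/// `handle` must come from a `demo_shape_*` constructor and not be freed yet.
#[no_mangle]
pub unsafe extern "C" fn demo_shape_area(handle: *const ShapeHandle) -> f64 {
    // Borrow the handle; dynamic dispatch picks the right `area`.
    let shape = unsafe { &*handle };
    shape.0.area()
}

/// # Safety
/// `handle` must be null or an unfreed pointer from a constructor above.
#[no_mangle]
pub unsafe extern "C" fn demo_shape_free(handle: *mut ShapeHandle) {
    if !handle.is_null() {
        // Retake ownership so both boxes are dropped.
        drop(unsafe { Box::from_raw(handle) });
    }
}
```

The foreign caller only ever sees one pointer type and one `area` function, regardless of which concrete type sits behind the handle.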
A pointer can be stored in a Julia struct or type, and it can be given a finalizer that calls polars_dataframe_free. That way Julia's GC can clean up Rust objects while still maintaining Rust's safety and speed.
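As a sketch of the ownership lifecycle this relies on, the block below uses a hypothetical stand-in type `Frame` (not the real `DataFrame`) with a drop counter, so it is easy to see that the Rust destructor runs only when the free function is called. As a variation on the free function above, `demo_frame_free` takes a pointer to the caller's pointer slot and nulls it. This is an assumption about how a wrapper could guard against the double free mentioned earlier in the thread, not something polars provides.

```rust
use std::ffi::c_void;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Counts dropped `Frame` values (for demonstration only).
pub static DROPS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for `DataFrame`.
pub struct Frame {
    rows: usize,
}

impl Drop for Frame {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

#[no_mangle]
pub extern "C" fn demo_frame_new(rows: usize) -> *mut c_void {
    // Ownership moves to the caller; Rust will not drop the value on its own.
    Box::into_raw(Box::new(Frame { rows })) as *mut c_void
}

/// # Safety
/// `ptr` must be an unfreed pointer returned by `demo_frame_new`.
#[no_mangle]
pub unsafe extern "C" fn demo_frame_rows(ptr: *const c_void) -> usize {
    // Borrow only; the value stays alive after this call.
    unsafe { (*(ptr as *const Frame)).rows }
}

/// Frees the frame and nulls the caller's slot, so a second call is a no-op.
///
/// # Safety
/// `slot` must be null or point to a pointer obtained from `demo_frame_new`.
#[no_mangle]
pub unsafe extern "C" fn demo_frame_free(slot: *mut *mut c_void) {
    unsafe {
        if slot.is_null() || (*slot).is_null() {
            return;
        }
        // Retake ownership; `Frame::drop` runs here, exactly once.
        drop(Box::from_raw(*slot as *mut Frame));
        *slot = std::ptr::null_mut();
    }
}
```

A wrapper type on the Julia side would then hold the pointer in a mutable field and register a finalizer that calls the free function on that field, so the GC triggers the Rust destructor at most once.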
This will also be beneficial for other languages such as C, Go and Swift as they can all use extern "C" functions natively.
I will be gone for the next 2 years, but after that I hope to look into this situation when I get home. I hope something can be done with this, as Polars would be useful for any of the LLVM cousins.
I will be gone for the next 2 years, but after that I hope to look into this situation when I get home
That's some long-term planning! I barely know what I'm doing two weeks into the future ;)
I wonder, couldn’t a similar approach to what RustFFT.jl has done be possible for making polars available to Julia?
The problem is that polars does not already have a fully fleshed out general foreign function interface. This is understandable since creating that is a complicated job. The python package for example uses a lot of very python-specific code, it was much more difficult to use as a reference than I was hoping.
I started a Polars.jl frontend (undergoing registration). It is still quite primitive but I think we should be able to get a pretty good level of interoperability with Julia. In the process, a C-API is defined in the c-polars folder for which we provide built dynamic libraries and a header file if some C/C++ users are potentially interested (though its main goal is bridging with Julia).
as discussed today, this issue can be closed, as there aren't plans to implement a Julia interface in the Polars repo itself, and other repos like https://github.com/Pangoraw/Polars.jl have already been started, so discussions can move to those. thanks all for comments!