polars
feat(rust): Implement Try/TryFrom for ndarray conversion
Motivation
Polars types (DataFrame etc.) can be converted to ndarray types, and indeed this has already been implemented. What isn't implemented is a nice trait that allows us to write functions that polymorphically accept a DataFrame in the same place as an Array. The obvious candidate is the From/TryFrom family of traits, since they do just this.
Summary
- Implemented
  - From<ChunkedArray> for ArrayView1<>
  - TryFrom<DataFrame> for Array2<> (these can return errors so I couldn't just use From)
  - TryFrom<ListChunked> for Array2<>
- Moved the ndarray module into polars_core instead of polars_core::chunked_array because that made no sense as the module involves several types such as DataFrame
- Added a test for DataFrame → Array conversion. I don't believe any existed before this.
- Closes #1019
Concerns
It turned out that I couldn't just write a generic implementation of TryFrom<DataFrame> for Array2<T>, and had to do some pretty ugly macro hackery instead. The reason is that the signature below doesn't constrain N: when converting to an array of f64, for example, there could be any number of types N that have N::Native = f64. I'm of course open to an elegant solution if anyone can think of one, but I'm totally stumped.
impl<N> From<DataFrame> for Array2<N::Native>
where
N: PolarsNumericType,
N::Native: num::Zero + Copy
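To make the constraint problem concrete, here is a minimal sketch using stand-in types rather than the real polars or ndarray API. `Frame`, `Matrix`, `NumericMarker`, `Float64Marker`, `Int32Marker`, and `to_matrix` are all hypothetical names invented for this illustration. The generic impl is rejected because `N` appears only through its associated type, so the sketch falls back to one impl per marker type generated by a macro, which is the approach described above.

```rust
use std::convert::TryFrom;

/// Hypothetical stand-in for a numeric marker type such as `Float64Type`.
pub trait NumericMarker {
    type Native: Copy;
    /// Convert one stored value (f64 in this sketch) to the native type.
    fn from_f64(v: f64) -> Option<Self::Native>;
}

pub struct Float64Marker;
pub struct Int32Marker;

impl NumericMarker for Float64Marker {
    type Native = f64;
    fn from_f64(v: f64) -> Option<f64> {
        Some(v)
    }
}

impl NumericMarker for Int32Marker {
    type Native = i32;
    fn from_f64(v: f64) -> Option<i32> {
        // Only accept whole numbers that fit in an i32.
        if v.fract() == 0.0 && v >= i32::MIN as f64 && v <= i32::MAX as f64 {
            Some(v as i32)
        } else {
            None
        }
    }
}

/// Hypothetical stand-in for `DataFrame`: a list of f64 columns.
pub struct Frame {
    pub columns: Vec<Vec<f64>>,
}

/// Hypothetical stand-in for `Array2<T>`, stored in row-major order.
#[derive(Debug, PartialEq)]
pub struct Matrix<T> {
    pub rows: usize,
    pub cols: usize,
    pub data: Vec<T>,
}

impl Frame {
    /// Plays the role of `to_ndarray::<N>()`: the marker type picks the output type.
    pub fn to_matrix<N: NumericMarker>(&self) -> Result<Matrix<N::Native>, String> {
        let cols = self.columns.len();
        let rows = self.columns.first().map_or(0, |c| c.len());
        if self.columns.iter().any(|c| c.len() != rows) {
            return Err("columns have different lengths".to_string());
        }
        let mut data = Vec::with_capacity(rows * cols);
        for r in 0..rows {
            for c in &self.columns {
                let v = N::from_f64(c[r]).ok_or_else(|| format!("cannot convert {}", c[r]))?;
                data.push(v);
            }
        }
        Ok(Matrix { rows, cols, data })
    }
}

// The generic version does not compile (error E0207): `N` is not constrained
// by the impl header, because several markers could share one `Native` type.
//
// impl<N: NumericMarker> TryFrom<Frame> for Matrix<N::Native> { /* ... */ }

// Workaround: generate one concrete impl per marker type.
macro_rules! impl_try_from_frame {
    ($($marker:ty),* $(,)?) => {
        $(
            impl TryFrom<Frame> for Matrix<<$marker as NumericMarker>::Native> {
                type Error = String;
                fn try_from(frame: Frame) -> Result<Self, Self::Error> {
                    frame.to_matrix::<$marker>()
                }
            }
        )*
    };
}

impl_try_from_frame!(Float64Marker, Int32Marker);
```

With these impls, a caller picks the element type through the target annotation, for example `let m: Matrix<f64> = Matrix::try_from(frame)?;`. The cost is that every supported type must be listed in the macro invocation.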
Hey @multimeric, Thanks for the PR.
Regarding the trait implementations. You don't have to be bound on PolarsNumeric traits here.
Something like this would work:
impl<T> From<DataFrame> for Array2<T>
where
T: num::NumCast
The same goes for ListChunked.
I want this to be generic because it doesn't increase polars compilation time until actually implemented.
Hmm but the to_ndarray() relies on series.cast::<N>()?, where N: PolarsNumericType.
Can you briefly show me how I can convert a Series (or DataFrame) to any kind of primitive without first going via a PolarsNumericType?
We could use the iterators for that, but that would be unnecessary code bloat. I think it would be better to have a conversion trait from primitives like f64 and i32 to PolarsNumericType.
Are you suggesting that we implement From<T> for PolarsNumericType for primitives T, so that I can implement impl<T> From<DataFrame> for Array2<T> where T: Into<PolarsNumericType> and I use that first trait in the implementation? The problem is that I need access to the type (ie Float64Type) so that I can use it in to_ndarray, and not an instance of Float64Type. I'm not sure if this would work. Also regarding conversion from primitives, there's all this AnyValue<T> stuff in the codebase that seems to do a lot of this, but I have no idea how to use it.
I will come back to you with a small example.
That would be much appreciated.
Sorry @multimeric , I will come back to this, a lot to do.
@multimeric #1192 allows you to get a PolarsDataType at compile time. That can be used to cast Series.
Great, thanks! I'll take a look.
Isn't your PR just implementing the same exhaustive list of conversions using a macro like I did?
Also I still don't see how to use this. For the conversion to work, the compiler needs to understand the link between polars types and primitive types at compile time, ie using an associated type or trait. But #1192 doesn't do that, instead it lets you obtain a polars type at runtime.
At its heart, the conversion to ndarray relies on series.cast::<N>(). It's not possible to use T::to_polars_type() for this purpose, because its result is a runtime value and cannot be used as a type parameter. For example:
impl<D, T> TryFrom<DataFrame> for ArrayBase<D, Ix2>
where
D: Data<Elem=T>,
T: ToPolarsType
{
type Error = PolarsError;
fn try_from(d: DataFrame) -> Result<Self> {
let polars_type = T::to_polars_type();
d.to_ndarray::<polars_type>() // <----- Not possible
}
}
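For contrast, here is a sketch of the kind of compile-time link the comment above asks for, again with stand-in types and not the polars API. The trait `HasMarker` is hypothetical: it gives each primitive an associated marker type, so a single generic impl can name `T::Marker` in type position. Whether polars exposes or plans such a mapping is not established in this thread.

```rust
use std::convert::TryFrom;

/// Hypothetical stand-in for a numeric marker trait such as `PolarsNumericType`.
pub trait NumericMarker {
    type Native: Copy;
    fn cast(v: f64) -> Self::Native;
}

pub struct Float64Marker;
pub struct Int32Marker;

impl NumericMarker for Float64Marker {
    type Native = f64;
    fn cast(v: f64) -> f64 {
        v
    }
}

impl NumericMarker for Int32Marker {
    type Native = i32;
    fn cast(v: f64) -> i32 {
        v as i32 // truncates toward zero and saturates at the i32 bounds
    }
}

/// Hypothetical reverse mapping from a primitive to its marker type.
/// The associated type is resolved at compile time, unlike a runtime value.
pub trait HasMarker: Copy {
    type Marker: NumericMarker<Native = Self>;
}

impl HasMarker for f64 {
    type Marker = Float64Marker;
}

impl HasMarker for i32 {
    type Marker = Int32Marker;
}

/// Hypothetical stand-in for `DataFrame`: a list of f64 columns.
pub struct Frame {
    pub columns: Vec<Vec<f64>>,
}

/// Hypothetical stand-in for `Array2<T>`, stored in row-major order.
#[derive(Debug, PartialEq)]
pub struct Matrix<T> {
    pub rows: usize,
    pub cols: usize,
    pub data: Vec<T>,
}

impl Frame {
    /// Plays the role of `to_ndarray::<N>()`.
    pub fn to_matrix<N: NumericMarker>(&self) -> Result<Matrix<N::Native>, String> {
        let cols = self.columns.len();
        let rows = self.columns.first().map_or(0, |c| c.len());
        if self.columns.iter().any(|c| c.len() != rows) {
            return Err("columns have different lengths".to_string());
        }
        let mut data = Vec::with_capacity(rows * cols);
        for r in 0..rows {
            for c in &self.columns {
                data.push(N::cast(c[r]));
            }
        }
        Ok(Matrix { rows, cols, data })
    }
}

// One generic impl: `T` is constrained because it appears in the self type,
// and `T::Marker` supplies the marker type needed by `to_matrix`.
impl<T: HasMarker> TryFrom<Frame> for Matrix<T> {
    type Error = String;
    fn try_from(frame: Frame) -> Result<Self, Self::Error> {
        frame.to_matrix::<T::Marker>()
    }
}
```

The list of supported primitives still has to be written out once, in the `HasMarker` impls, but the conversion itself is a single generic impl rather than one macro expansion per type.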
I am planning on rewriting the trait system that will better fit this problem as well.
In the meantime, if you need Try/TryFrom for an API you are building, you can write custom traits with the same behavior. (I thought I read on SO that you wanted to do this :) )
And sorry for the delay. I just want to reduce compile bloat, as it is already very strained.
Thanks. Yes my current workaround is writing a custom conversion trait and implementing it for Array and for DataFrame, but it's a slightly sub-optimal solution because I would like to support any type that knows how to convert itself into an Array, not just those that I have personally implemented.
Perhaps you could implement your trait for any T that implements that conversion to ndarray? That's for instance how itertools creates custom iterator adapters, by creating their own trait and implementing it for any T: Iterator
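The blanket-impl suggestion above can be sketched as follows. This is a minimal illustration with stand-in types, not the polars or ndarray API: `Matrix`, `Rows`, `IntoMatrix`, and `total` are hypothetical names. The custom trait is implemented once for every `T` that already knows how to convert itself via `TryInto<Matrix>`.

```rust
use std::convert::{TryFrom, TryInto};
use std::fmt::Display;

/// Hypothetical stand-in for an ndarray `Array2<f64>`, stored in row-major order.
#[derive(Debug, PartialEq)]
pub struct Matrix {
    pub rows: usize,
    pub cols: usize,
    pub data: Vec<f64>,
}

/// Hypothetical custom conversion trait owned by the downstream crate.
pub trait IntoMatrix {
    fn into_matrix(self) -> Result<Matrix, String>;
}

// Blanket impl: any type with a `TryInto<Matrix>` conversion gets `IntoMatrix`
// for free, in the same way itertools extends every `T: Iterator`.
impl<T> IntoMatrix for T
where
    T: TryInto<Matrix>,
    T::Error: Display,
{
    fn into_matrix(self) -> Result<Matrix, String> {
        self.try_into().map_err(|e| e.to_string())
    }
}

/// Hypothetical source type that knows how to convert itself.
pub struct Rows(pub Vec<Vec<f64>>);

impl TryFrom<Rows> for Matrix {
    type Error = String;
    fn try_from(src: Rows) -> Result<Self, Self::Error> {
        let cols = src.0.first().map_or(0, |r| r.len());
        if src.0.iter().any(|r| r.len() != cols) {
            return Err("rows have different lengths".to_string());
        }
        let rows = src.0.len();
        let data = src.0.into_iter().flatten().collect();
        Ok(Matrix { rows, cols, data })
    }
}

/// A function that accepts anything convertible to a `Matrix`.
pub fn total<M: IntoMatrix>(input: M) -> Result<f64, String> {
    let m = input.into_matrix()?;
    Ok(m.data.iter().sum())
}
```

Here `total` accepts both a `Rows` value and a `Matrix` itself, since every type is trivially `TryInto` of itself. A type that lacks the `TryInto<Matrix>` conversion is not covered by the blanket impl, which is the limitation raised in the replies.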
True, so I have a workaround in the meantime, but ideally I would like to not make an exception for DataFrame, and ideally not need my own trait for this at all.
Have you made any relevant changes to the type system that might affect this, or should I rebase my macro approach so that it can be merged?
d.to_ndarray::<polars_type>() // <----- Not possible
Hi @multimeric, I actually needed this, so I implemented it myself. It may not be optimal, but it works with the traits and I think you could integrate it with your TryFrom.
pub fn ndarray_to_df<T, D: Dimension>(
arr: &ArrayBase<OwnedRepr<T>, D>,
col_names: Vec<&str>,
) -> PolarsResult<DataFrame>
where
T: NumericNative + FromPrimitive + ToPrimitive,
{
let mut lanes: Vec<Series> = vec![];
This is the link to the snippet https://github.com/mithril-security/bastionlab/blob/linfa-integration/server/bastionlab_linfa/src/operations.rs#L26
I'm going to close this as it has been 2 years since the PR was first opened, and code conflicts have piled up.
Feel free to rebase and reopen the PR if you want to continue this work.