feat(rust): Implement Try/TryFrom for ndarray conversion

Open multimeric opened this issue 4 years ago • 16 comments

Motivation

Polars types (DataFrame etc.) can be converted to ndarray types, and this has already been implemented. What isn't implemented is a trait that lets us write functions that polymorphically accept a DataFrame in the same place as an Array. The obvious candidate is the From/TryFrom family of traits, since they do exactly this.

Summary

  • Implemented
    • From<ChunkedArray> for ArrayView1<>
    • TryFrom<DataFrame> for Array2<> (these can return errors so I couldn't just use From)
    • TryFrom<ListChunked> for Array2<>
  • Moved the ndarray module into polars_core instead of polars_core::chunked_array, because the module involves several types such as DataFrame and its old location made no sense
  • Added a test for the DataFrame to Array conversion. I don't believe any existed before this.
  • Closes #1019

Concerns

It turned out that I couldn't write a generic implementation of TryFrom<DataFrame> for Array2<T>, and had to resort to some pretty ugly macro hackery instead. The reason is that the signature below doesn't constrain N: when converting to an array of f64, for example, any number of N could have N::Native = f64. I'm of course open to an elegant solution if anyone can think of one, but I'm totally stumped.

impl<N> From<DataFrame> for Array2<N::Native>
    where
        N: PolarsNumericType,
        N::Native: num::Zero + Copy
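
A minimal, std-only sketch of why a signature of this shape is rejected and what the macro workaround amounts to. `Frame`, `Grid`, `NumericType`, `Float64Type`, and `Int32Type` are hypothetical stand-ins for `DataFrame`, `Array2`, `PolarsNumericType`, and the polars type markers; none of this is the real polars API.

```rust
use std::convert::TryFrom;

/// Hypothetical stand-in for a polars type marker trait.
trait NumericType {
    type Native: Copy;
    fn cast(v: f64) -> Self::Native;
}

struct Float64Type;
struct Int32Type;

impl NumericType for Float64Type {
    type Native = f64;
    fn cast(v: f64) -> f64 { v }
}
impl NumericType for Int32Type {
    type Native = i32;
    fn cast(v: f64) -> i32 { v as i32 }
}

/// Stand-in for `DataFrame`: a list of f64 columns.
struct Frame { columns: Vec<Vec<f64>> }

/// Stand-in for `Array2<T>`: row-major data plus its shape.
#[derive(Debug, PartialEq)]
struct Grid<T> { rows: usize, cols: usize, data: Vec<T> }

impl Frame {
    /// Plays the role of `to_ndarray::<N>()`; fails on ragged columns.
    fn to_grid<N: NumericType>(&self) -> Result<Grid<N::Native>, String> {
        let cols = self.columns.len();
        let rows = self.columns.first().map_or(0, |c| c.len());
        if self.columns.iter().any(|c| c.len() != rows) {
            return Err("columns have different lengths".to_string());
        }
        let mut data = Vec::with_capacity(rows * cols);
        for r in 0..rows {
            for c in &self.columns {
                data.push(N::cast(c[r]));
            }
        }
        Ok(Grid { rows, cols, data })
    }
}

// Rejected with E0207 ("the type parameter `N` is not constrained ..."),
// because `N` only appears inside the projection `N::Native`:
// impl<N: NumericType> TryFrom<Frame> for Grid<N::Native> { ... }

// The macro workaround: one concrete impl per (marker, primitive) pair.
macro_rules! impl_try_from_frame {
    ($($marker:ty => $native:ty),* $(,)?) => {$(
        impl TryFrom<Frame> for Grid<$native> {
            type Error = String;
            fn try_from(f: Frame) -> Result<Self, Self::Error> {
                f.to_grid::<$marker>()
            }
        }
    )*};
}
impl_try_from_frame!(Float64Type => f64, Int32Type => i32);

fn main() {
    let frame = Frame { columns: vec![vec![1.5, 2.5], vec![3.0, 4.0]] };
    let grid = Grid::<i32>::try_from(frame).unwrap();
    assert_eq!(grid, Grid { rows: 2, cols: 2, data: vec![1, 3, 2, 4] });
}
```

The cost of this approach is the one raised in the discussion: every listed pair is compiled whether or not a caller uses it.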

multimeric avatar Aug 05 '21 14:08 multimeric

Hey @multimeric, Thanks for the PR.

Regarding the trait implementations. You don't have to be bound on PolarsNumeric traits here.

Something like this would work:

impl<T> From<DataFrame> for Array2<T>
    where
        T: num::NumCast

The same goes for ListChunked.

I want this to be generic because it doesn't increase polars compilation time until actually implemented.

ritchie46 avatar Aug 05 '21 14:08 ritchie46

Hmm but the to_ndarray() relies on series.cast::<N>()?, where N: PolarsNumericType.

Can you briefly show me how I can convert a Series (or DataFrame) to any kind of primitive without first going via a PolarsNumericType?

multimeric avatar Aug 05 '21 15:08 multimeric

We could use the iterators for that, but that would be unnecessary code bloat. I think it would be better to have a conversion trait from primitives like f64 and i32 to PolarsNumericType.

ritchie46 avatar Aug 09 '21 05:08 ritchie46

Are you suggesting that we implement From<T> for PolarsNumericType for primitives T, so that I can write impl<T> From<DataFrame> for Array2<T> where T: Into<PolarsNumericType> and use that first trait in the implementation? The problem is that I need access to the type itself (i.e. Float64Type) so that I can use it in to_ndarray, not an instance of Float64Type. I'm not sure this would work. Also, regarding conversion from primitives, there's all this AnyValue<T> stuff in the codebase that seems to do a lot of this, but I have no idea how to use it.

multimeric avatar Aug 09 '21 06:08 multimeric

I will come back to you with a small example.

ritchie46 avatar Aug 09 '21 08:08 ritchie46

That would be much appreciated.

multimeric avatar Aug 09 '21 08:08 multimeric

Sorry @multimeric, I will come back to this. I have a lot to do at the moment.

ritchie46 avatar Aug 16 '21 10:08 ritchie46

@multimeric #1192 allows you to get a PolarsDataType at compile time. That can be used to cast Series.

ritchie46 avatar Aug 22 '21 06:08 ritchie46

Great, thanks! I'll take a look.

multimeric avatar Aug 23 '21 04:08 multimeric

Isn't your PR just implementing the same exhaustive list of conversions using a macro like I did?

multimeric avatar Aug 31 '21 11:08 multimeric

Also, I still don't see how to use this. For the conversion to work, the compiler needs to understand the link between polars types and primitive types at compile time, i.e. through an associated type or trait. But #1192 doesn't do that; instead it lets you obtain a polars type at runtime.

At its heart, the conversion to ndarray relies on series.cast::<N>(). It's not possible to use T::to_polars_type() for this purpose. For example:

impl<D, T> TryFrom<DataFrame> for ArrayBase<D, Ix2>
    where
        D: Data<Elem=T>,
        T: ToPolarsType
{
    type Error = PolarsError;
    fn try_from(d: DataFrame) -> Result<Self> {
        let polars_type = T::to_polars_type();
        d.to_ndarray::<polars_type>() // <----- Not possible
    }
}
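
The marked line cannot compile because a type argument has to be known at compile time, whereas `polars_type` is a runtime value. One way the link could be expressed at the type level is an associated type that goes from the primitive to its marker. The sketch below uses only the standard library, and `NativeType`, `Marker`, `NumericType`, `Frame`, and `Grid` are hypothetical stand-ins for illustration, not polars items.

```rust
use std::convert::TryFrom;

/// Hypothetical marker trait, playing the role of `PolarsNumericType`.
trait NumericType {
    type Native: Copy;
    fn cast(v: f64) -> Self::Native;
}

/// Hypothetical reverse mapping: each primitive names exactly one marker.
trait NativeType: Copy + Sized {
    type Marker: NumericType<Native = Self>;
}

struct Float64Type;
struct Int64Type;

impl NumericType for Float64Type {
    type Native = f64;
    fn cast(v: f64) -> f64 { v }
}
impl NumericType for Int64Type {
    type Native = i64;
    fn cast(v: f64) -> i64 { v as i64 }
}

impl NativeType for f64 { type Marker = Float64Type; }
impl NativeType for i64 { type Marker = Int64Type; }

/// Stand-in for `DataFrame`: a list of f64 columns.
struct Frame { columns: Vec<Vec<f64>> }

/// Stand-in for `Array2<T>`: row-major data plus (rows, cols).
#[derive(Debug, PartialEq)]
struct Grid<T> { shape: (usize, usize), data: Vec<T> }

impl Frame {
    /// Plays the role of `to_ndarray::<N>()`; fails on ragged columns.
    fn to_grid<N: NumericType>(&self) -> Result<Grid<N::Native>, String> {
        let cols = self.columns.len();
        let rows = self.columns.first().map_or(0, |c| c.len());
        if self.columns.iter().any(|c| c.len() != rows) {
            return Err("columns have different lengths".to_string());
        }
        let mut data = Vec::with_capacity(rows * cols);
        for r in 0..rows {
            for c in &self.columns {
                data.push(N::cast(c[r]));
            }
        }
        Ok(Grid { shape: (rows, cols), data })
    }
}

// `T` appears in the self type, so it is constrained, and the marker is
// resolved through `T::Marker` at compile time rather than at runtime.
impl<T: NativeType> TryFrom<Frame> for Grid<T> {
    type Error = String;
    fn try_from(f: Frame) -> Result<Self, Self::Error> {
        f.to_grid::<T::Marker>()
    }
}

fn main() {
    let frame = Frame { columns: vec![vec![1.9, 2.0], vec![3.0, 4.0]] };
    let grid: Grid<i64> = Grid::try_from(frame).unwrap();
    assert_eq!(grid, Grid { shape: (2, 2), data: vec![1, 3, 2, 4] });
}
```

Because the impl is generic over `T`, it is only instantiated for the primitives a caller actually requests.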

multimeric avatar Aug 31 '21 11:08 multimeric

I am planning on rewriting the trait system that will better fit this problem as well.

In the meantime, if you need From/TryFrom for an API you are building, you can define custom traits with the same behavior. (I thought I read on SO that you wanted to do this :) )

And sorry for the delay. I just want to reduce compile bloat, as compilation is already very strained.

ritchie46 avatar Sep 28 '21 05:09 ritchie46

Thanks. Yes my current workaround is writing a custom conversion trait and implementing it for Array and for DataFrame, but it's a slightly sub-optimal solution because I would like to support any type that knows how to convert itself into an Array, not just those that I have personally implemented.

multimeric avatar Sep 28 '21 05:09 multimeric

Perhaps you could implement your trait for any T that implements that conversion to ndarray? That's for instance how itertools creates custom iterator adapters, by creating their own trait and implementing it for any T: Iterator
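
A rough, std-only sketch of that pattern. `ToGrid`, `Grid`, `Frame`, and `total` are hypothetical names chosen for illustration, and writing the blanket impl over `TryInto<Grid>` is an assumption about how the underlying conversion would be exposed.

```rust
use std::convert::{TryFrom, TryInto};

/// Stand-in for an ndarray array: row-major f64 data plus its shape.
#[derive(Debug, PartialEq, Clone)]
struct Grid { rows: usize, cols: usize, data: Vec<f64> }

/// Stand-in for `DataFrame`: a list of f64 columns.
struct Frame { columns: Vec<Vec<f64>> }

impl TryFrom<Frame> for Grid {
    type Error = String;
    fn try_from(f: Frame) -> Result<Grid, String> {
        let cols = f.columns.len();
        let rows = f.columns.first().map_or(0, |c| c.len());
        if f.columns.iter().any(|c| c.len() != rows) {
            return Err("columns have different lengths".to_string());
        }
        let mut data = Vec::with_capacity(rows * cols);
        for r in 0..rows {
            for c in &f.columns {
                data.push(c[r]);
            }
        }
        Ok(Grid { rows, cols, data })
    }
}

/// Custom extension trait, in the style of itertools' adapters.
trait ToGrid {
    fn to_grid(self) -> Result<Grid, String>;
}

// Blanket impl: anything that already knows how to become a `Grid`
// gets `to_grid` for free, including `Grid` itself.
impl<T> ToGrid for T
where
    T: TryInto<Grid>,
    <T as TryInto<Grid>>::Error: ToString,
{
    fn to_grid(self) -> Result<Grid, String> {
        self.try_into().map_err(|e| e.to_string())
    }
}

/// A function that accepts a `Frame` in the same place as a `Grid`.
fn total<A: ToGrid>(input: A) -> Result<f64, String> {
    Ok(input.to_grid()?.data.iter().sum())
}

fn main() {
    let frame = Frame { columns: vec![vec![1.0, 2.0], vec![3.0, 4.0]] };
    let grid = Grid { rows: 1, cols: 2, data: vec![0.5, 0.5] };
    assert_eq!(total(frame), Ok(10.0));
    assert_eq!(total(grid), Ok(1.0));
}
```

The blanket impl still depends on the source type providing a standard conversion, so it does not remove the need for a `TryFrom` impl on the DataFrame side.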

ritchie46 avatar Sep 28 '21 19:09 ritchie46

True, so I have a workaround in the meantime, but ideally I would not have to make an exception for DataFrame, or need my own trait for this at all.

Have you made any relevant changes to the type system that might affect this, or should I rebase my macro approach so that it can be merged?

multimeric avatar Jan 15 '22 06:01 multimeric

Hi @multimeric, I actually needed this, so I implemented it myself. It may not be optimal, but it works with the traits and I think you could integrate it with your TryFrom.

pub fn ndarray_to_df<T, D: Dimension>(
    arr: &ArrayBase<OwnedRepr<T>, D>,
    col_names: Vec<&str>,
) -> PolarsResult<DataFrame>
where
    T: NumericNative + FromPrimitive + ToPrimitive,
{
    let mut lanes: Vec<Series> = vec![];

This is the link to the snippet: https://github.com/mithril-security/bastionlab/blob/linfa-integration/server/bastionlab_linfa/src/operations.rs#L26

kbamponsem avatar Jan 16 '23 11:01 kbamponsem

I'm going to close this as it has been 2 years since the PR was first opened, and code conflicts have piled up.

Feel free to rebase and reopen the PR if you want to continue this work.

stinodego avatar Aug 09 '23 17:08 stinodego