
Improve DataFrame parameter to allow compatible instance types outside pandas (e.g. polars)

Open phi6ias opened this issue 2 years ago • 1 comments

Request

Please improve the DataFrame parameter so that other DataFrame types are supported, in particular polars.

Motivation

Polars is intentionally similar to pandas but aims to improve on performance. That performance, together with better Arrow support, is the main reason for using it. See https://www.pola.rs/ and the Polars API Reference.

Discussion

At the moment the DataFrame parameter is siloed into pandas, even though polars should be a drop-in replacement that does not introduce bugs. Is there no better way of checking that a DataFrame is supplied? Would checking default.__name__ not be sufficient?

Code Proposal

A small change like this may be enough?

https://github.com/holoviz/param/blob/ddcd5f36bcff1712f2c7fb984268a0ad6dc9649f/param/__init__.py#L209

            if 'polars' in sys.modules:
                from polars import (
                    DataFrame as plDFrame, Series as plSeries
                )
                if isinstance(v, plDFrame):
                    params[k] = DataFrame(**kws)
                    continue
                elif isinstance(v, plSeries):
                    params[k] = Series(**kws)
                    continue
            params[k] = Parameter(**kws)
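For illustration, the `sys.modules` check used above can be isolated into a small helper. This is only a sketch: `loaded_dataframe_types` is a hypothetical name, not part of Param, and it assumes each library exposes its frame class as a top-level `DataFrame` attribute (as pandas and polars do). It never imports a library itself, so an optional dependency that is not installed is simply skipped.

```python
import sys


def loaded_dataframe_types(libraries=("pandas", "polars")):
    """Map library name -> DataFrame class for libraries already imported.

    Hypothetical helper for illustration only. It inspects sys.modules
    instead of importing, so it cannot raise ImportError when an optional
    library is missing.
    """
    found = {}
    for name in libraries:
        module = sys.modules.get(name)
        # Skip libraries the user has not imported in this process.
        if module is not None and hasattr(module, "DataFrame"):
            found[name] = module.DataFrame
    return found
```

A value can only be an instance of a library's DataFrame if that library has already been imported, so checking `isinstance` against `tuple(loaded_dataframe_types().values())` is enough when that tuple is non-empty.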

https://github.com/holoviz/param/blob/ddcd5f36bcff1712f2c7fb984268a0ad6dc9649f/param/__init__.py#L1518

    def __init__(self, default=None, rows=None, columns=None, ordered=None, **params):
        from pandas import DataFrame as pdDFrame
        from polars import DataFrame as plDFrame
        if isinstance(default, pdDFrame):
            dfClass = pdDFrame
        elif isinstance(default, plDFrame):
            dfClass = plDFrame
        else:
            raise ValueError("Value supplied for DataFrame parameter is not a valid type of DataFrame.")
        self.rows = rows
        self.columns = columns
        self.ordered = ordered
        super(DataFrame,self).__init__(dfClass, default=default, **params)
        self._validate(self.default)

I'm not advanced enough to wing the deserialize function. Is it as simple as checking the type of the variable cls before deciding whether to return a pandas or a polars DataFrame?

https://github.com/holoviz/param/blob/ddcd5f36bcff1712f2c7fb984268a0ad6dc9649f/param/__init__.py#L1576

    @classmethod
    def deserialize(cls, value):
        if value == 'null':
            return None
        if cls.__module__.split('.')[0] == 'pandas':
            from pandas import DataFrame as dFrame
        elif cls.__module__.split('.')[0] == 'polars':
            from polars import DataFrame as dFrame
        else:
            raise ValueError(
                "Cannot Deserialise:  Value supplied for DataFrame parameter is not a valid type of DataFrame."
            )
        return dFrame(value)

P.S.:

The same applies for Series...

phi6ias avatar Nov 10 '22 21:11 phi6ias

That's a tricky one. I don't think a Polars object should be accepted by default, because people who have written Param code that expects a Pandas dataframe would be surprised that Param allowed such a parameter to be set to a non-Pandas dataframe, even if it has roughly the same API, unless it's absolutely guaranteed to have the same API. But I could imagine the Parameter having a list of supported DataFrame types, defaulting to just pandas but able to be declared to include multiple types. Happy to see a PR with roughly the code above but able to enable or disable accepting Polars!

jbednar avatar Nov 13 '22 00:11 jbednar
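One way the opt-in list of supported DataFrame types suggested above might look is sketched here. Everything in it is hypothetical: `DataFrameTypeCheck`, `dataframe_library`, and the `libraries` argument are illustrative names, not Param's API. The check matches on the defining package name rather than using `isinstance`, which avoids importing optional libraries but is a weaker guarantee, so treat it as a starting point for discussion rather than a finished design.

```python
def dataframe_library(value):
    """Return the top-level package that defines value's DataFrame class.

    Walks the class hierarchy so subclasses of a DataFrame also match.
    Returns None if no class named "DataFrame" is found.
    """
    for klass in type(value).__mro__:
        if klass.__name__ == "DataFrame":
            return klass.__module__.split(".")[0]
    return None


class DataFrameTypeCheck:
    """Hypothetical validator: accepts pandas only unless told otherwise."""

    def __init__(self, libraries=("pandas",)):
        # Existing code that expects pandas keeps its current behaviour;
        # other libraries must be enabled explicitly.
        self.libraries = tuple(libraries)

    def validate(self, value):
        if value is None:
            return
        library = dataframe_library(value)
        if library not in self.libraries:
            raise ValueError(
                "Expected a DataFrame from one of %s, got %s."
                % (list(self.libraries), type(value).__name__)
            )
```

With the default, `DataFrameTypeCheck().validate(polars_frame)` would raise `ValueError`, while `DataFrameTypeCheck(libraries=("pandas", "polars"))` would accept frames from either library.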