param
Improve DataFrame parameter to allow compatible instance types outside pandas (e.g. polars)
Request
Please improve the DataFrame parameter so that other DataFrame types are supported, in particular polars.
Motivation
Polars is intentionally similar to pandas but aims to improve performance; that, together with better Apache Arrow support, is the main reason for using it. https://www.pola.rs/ Polars API Reference
Discussion
At the moment the DataFrame parameter is siloed into pandas, even though polars should be a drop-in replacement without introducing bugs. Is there no better way of checking that a DataFrame is supplied (is checking type(default).__name__ not sufficient)?
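One way to make that check without importing either library is to inspect the value's class hierarchy: the class name and the top-level package it is defined in. The sketch below is a hypothetical helper (is_supported_dataframe is not part of param), assuming that pandas and polars both name their frame class DataFrame inside their own package:

```python
def is_supported_dataframe(value, libraries=("pandas", "polars")):
    """Hypothetical check (not part of param): is `value` a DataFrame from `libraries`?

    Looks at class names and modules instead of importing pandas or polars,
    so it works when only one of them is installed.
    """
    for klass in type(value).__mro__:
        # e.g. "pandas.core.frame" -> "pandas"
        top_level_package = klass.__module__.split(".")[0]
        if klass.__name__ == "DataFrame" and top_level_package in libraries:
            return True
    return False
```

Walking the MRO means subclasses of an accepted DataFrame pass too, as they would with isinstance. The trade-off is that this is a name-based heuristic: any class called DataFrame inside a package named pandas or polars is accepted.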
Code Proposal
Might a small change like this be enough?
https://github.com/holoviz/param/blob/ddcd5f36bcff1712f2c7fb984268a0ad6dc9649f/param/init.py#L209
# Proposed addition inside the loop at the linked line
# (k, v, kws and params come from the surrounding code)
if 'polars' in sys.modules:
    from polars import (
        DataFrame as plDFrame, Series as plSeries
    )
    if isinstance(v, plDFrame):
        params[k] = DataFrame(**kws)
        continue
    elif isinstance(v, plSeries):
        params[k] = Series(**kws)
        continue
params[k] = Parameter(**kws)
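The 'polars' in sys.modules guard in the proposal means polars is only consulted if the user has already imported it, so guessing a parameter type never triggers (or fails on) an import. A standalone sketch of that pattern; guess_parameter_kind is a hypothetical function that returns the name of the parameter type to create instead of constructing param objects:

```python
import sys


def guess_parameter_kind(value):
    """Hypothetical helper: name the Parameter type to create for `value`.

    Each library is looked up in sys.modules instead of being imported,
    so nothing is imported just to guess a type.
    """
    for library in ("pandas", "polars"):
        module = sys.modules.get(library)
        if module is None:
            continue  # skip libraries the user has not imported
        if isinstance(value, module.DataFrame):
            return "DataFrame"
        if isinstance(value, module.Series):
            return "Series"
    return "Parameter"  # generic fallback, as in the proposal
```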
https://github.com/holoviz/param/blob/ddcd5f36bcff1712f2c7fb984268a0ad6dc9649f/param/init.py#L1518
def __init__(self, default=None, rows=None, columns=None, ordered=None, **params):
    from pandas import DataFrame as pdDFrame
    dfClass = pdDFrame  # pandas stays the default, including when default is None
    if default is not None and not isinstance(default, pdDFrame):
        try:
            from polars import DataFrame as plDFrame  # polars is optional
        except ImportError:
            plDFrame = None
        if plDFrame is not None and isinstance(default, plDFrame):
            dfClass = plDFrame
        else:
            raise ValueError("Value supplied for DataFrame parameter is not a valid type of DataFrame.")
    self.rows = rows
    self.columns = columns
    self.ordered = ordered
    super(DataFrame, self).__init__(dfClass, default=default, **params)
    self._validate(self.default)
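An alternative to picking a single dfClass from the default is to collect the DataFrame class of every installed library into a tuple, since isinstance accepts a tuple of classes. A minimal sketch under that assumption; available_dataframe_types and check_dataframe are hypothetical helpers, not param functions:

```python
import importlib


def available_dataframe_types(libraries=("pandas", "polars")):
    """Hypothetical helper: DataFrame classes of the installed libraries.

    Libraries that are not installed are skipped, so polars stays optional.
    """
    classes = []
    for name in libraries:
        try:
            module = importlib.import_module(name)
        except ImportError:
            continue  # optional dependency not installed
        classes.append(module.DataFrame)
    return tuple(classes)


def check_dataframe(value, classes):
    """Hypothetical validator: None is allowed as an empty default."""
    if value is not None and not isinstance(value, classes):
        raise ValueError(
            "Value supplied for DataFrame parameter is not a valid type of DataFrame."
        )
    return value
```

With a tuple, a parameter whose default is a pandas frame could later be set to a polars frame (and vice versa), which is exactly the behaviour the maintainer's reply below argues should be opt-in.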
I'm not advanced enough to wing the deserialize function. Is it as simple as checking the type of the variable cls before deciding whether to return a pandas or a polars DataFrame?
https://github.com/holoviz/param/blob/ddcd5f36bcff1712f2c7fb984268a0ad6dc9649f/param/init.py#L1576
@classmethod
def deserialize(cls, value):
    if value == 'null':
        return None
    if cls.__module__.split('.')[0] == 'pandas':
        from pandas import DataFrame as dFrame
    elif cls.__module__.split('.')[0] == 'polars':
        from polars import DataFrame as dFrame
    else:
        raise ValueError(
            "Cannot Deserialise: Value supplied for DataFrame parameter is not a valid type of DataFrame."
        )
    return dFrame(value)
P.S.: The same applies to Series.
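On the deserialize question: in a Python classmethod, cls is the class the method is called on (or the class of the instance it is called through), not the class of the value being deserialized. So cls.__module__ reports where the Parameter class is defined rather than whether the data came from pandas or polars. The sketch below illustrates this with a hypothetical stand-in class, and shows one alternative that assumes the target library is recorded explicitly somewhere (deserialize_frame and its library argument are hypothetical):

```python
import importlib


class FakeDataFrameParameter:
    """Hypothetical stand-in for a Parameter class with a classmethod deserializer."""

    @classmethod
    def deserialize(cls, value):
        # cls is this Parameter-like class (or a subclass), not type(value),
        # so its module says where the Parameter is defined, not the data's library
        return cls.__module__


def deserialize_frame(value, library="pandas"):
    """Hypothetical alternative: the target library is passed in explicitly."""
    if value == "null":
        return None
    if library not in ("pandas", "polars"):
        raise ValueError(f"Cannot deserialize: unsupported DataFrame library {library!r}")
    module = importlib.import_module(library)  # imported only when needed
    return module.DataFrame(value)
```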
That's a tricky one. I don't think a Polars object should be accepted by default, because people who have written Param code that expects a Pandas dataframe would be surprised that Param allowed such a parameter to be set to a non-Pandas dataframe, even if it has roughly the same API, unless it's absolutely guaranteed to have the same API. But I could imagine the Parameter having a list of supported DataFrame types, defaulting to just pandas but able to be declared to include multiple types. Happy to see a PR with roughly the code above but able to enable or disable accepting Polars!
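The suggestion of a list of supported DataFrame types that defaults to pandas only could look roughly like this. It is a plain-Python descriptor sketch, not param's API: DataFrameSlot, its libraries argument and the Report class are hypothetical, and libraries are matched by the top-level package of the value's class so that neither pandas nor polars has to be imported up front.

```python
class DataFrameSlot:
    """Hypothetical descriptor accepting DataFrames from an opt-in set of libraries."""

    def __init__(self, default=None, libraries=("pandas",)):
        self.default = default
        self.libraries = tuple(libraries)  # pandas only unless the author opts in

    def __set_name__(self, owner, name):
        self.attr = "_" + name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return getattr(obj, self.attr, self.default)

    def __set__(self, obj, value):
        if value is not None and not self._accepts(value):
            raise ValueError(
                f"DataFrame parameter only accepts DataFrames from {self.libraries}, "
                f"got {type(value).__module__}.{type(value).__name__}"
            )
        setattr(obj, self.attr, value)

    def _accepts(self, value):
        # Walk the MRO so subclasses of an accepted DataFrame class also pass.
        return any(
            klass.__name__ == "DataFrame"
            and klass.__module__.split(".")[0] in self.libraries
            for klass in type(value).__mro__
        )


class Report:
    pandas_only = DataFrameSlot()  # existing behaviour: pandas frames only
    either = DataFrameSlot(libraries=("pandas", "polars"))  # explicit opt-in
```

Keeping pandas as the only default library preserves what existing Param code expects, while accepting polars becomes an explicit, per-parameter choice.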