
InterpolationJoiner - polars

Open zbenmo opened this issue 1 year ago • 1 comments

Describe the bug

Tried a simple join as follows:

joiner = InterpolationJoiner(
    data_store["depth_0"][0],
    key=["case_id"],
    suffix="_depth_0",
).fit(data_store["df_base"])
join = joiner.transform(data_store["df_base"])
join.head()

--

data_store["depth_0"][0] - polars DataFrame
data_store["df_base"] - polars DataFrame

--

Steps/Code to Reproduce

joiner = InterpolationJoiner(
    data_store["depth_0"][0],
    key=["case_id"],
    suffix="_depth_0",
).fit(data_store["df_base"])
join = joiner.transform(data_store["df_base"])
join.head()

Expected Results

Expected to see the joined table, as in the example at: https://skrub-data.org/stable/auto_examples/09_interpolation_join.html

Actual Results


KeyError                                  Traceback (most recent call last)
File /opt/conda/lib/python3.10/site-packages/polars/_utils/deprecation.py:95, in deprecate_parameter_as_positional..decorate..wrapper(*args, **kwargs)
     94 try:
---> 95     param_args = kwargs.pop(old_name)
     96 except KeyError:

KeyError: 'columns'

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
Cell In[8], line 5
      1 joiner = InterpolationJoiner(
      2     data_store["depth_0"][0],
      3     key=["case_id"],
      4     suffix="_depth_0",
----> 5 ).fit(data_store["df_base"])
      6 join = joiner.transform(data_store["df_base"])
      7 join.head()

File /opt/conda/lib/python3.10/site-packages/skrub/_interpolation_joiner.py:225, in InterpolationJoiner.fit(failed resolving arguments)
    223 _join_utils.check_missing_columns(X, self.main_key, "'X' (the main table)")
    224 key_values = self.vectorizer.fit_transform(self.aux_table[self._aux_key])
--> 225 estimators = self._get_estimator_assignments()
    226 fit_results = joblib.Parallel(self.n_jobs)(
    227     joblib.delayed(_fit)(
    228         key_values,
   (...)
    233     for assignment in estimators
    234 )
    235 fit_results = self._check_fit_results(fit_results)

File /opt/conda/lib/python3.10/site-packages/skrub/_interpolation_joiner.py:356, in InterpolationJoiner._get_estimator_assignments(self)
    339 def _get_estimator_assignments(self):
    340     """Identify column groups to be predicted together and assign them an estimator.
    341
    342     In many cases, a single estimator cannot handle all the target columns.
   (...)
    354     separately to each column.
    355     """
--> 356     aux_table = self.aux_table.drop(self._aux_key, axis=1)
    357     assignments = []
    358     regression_table = aux_table.select_dtypes("number")

File /opt/conda/lib/python3.10/site-packages/polars/_utils/deprecation.py:97, in deprecate_parameter_as_positional..decorate..wrapper(*args, **kwargs)
     95     param_args = kwargs.pop(old_name)
     96 except KeyError:
---> 97     return function(*args, **kwargs)
     99 issue_deprecation_warning(
    100     f"named {old_name} param is deprecated; use positional *args instead.",
    101     version=version,
    102 )
    103 if not isinstance(param_args, Sequence) or isinstance(param_args, str):

TypeError: DataFrame.drop() got an unexpected keyword argument 'axis'

Versions

'0.1.0'

zbenmo, Apr 01 '24 16:04

Thanks for reporting this bug. Indeed, InterpolationJoiner does not yet support polars, although support should be added soon. In the meantime, the limitation should be documented and the joiner should raise a clearer error message.

  • [ ] document the fact that InterpolationJoiner is missing polars support at the moment
  • [ ] add polars support
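
As a rough illustration of the "better error message" idea, an input check could reject polars objects up front instead of letting the failure surface deep inside `DataFrame.drop()`. The sketch below is not skrub's code: `check_not_polars` is a hypothetical helper, and it assumes polars objects can be recognized by the top-level module name of their type, without importing polars.

```python
def check_not_polars(table, name):
    """Return `table` unchanged, or raise a readable TypeError for polars input.

    Hypothetical helper for illustration only; not part of skrub.
    """
    # The class of a polars DataFrame lives in a module under the "polars" package,
    # so the first component of the module path identifies it.
    top_level_module = type(table).__module__.split(".")[0]
    if top_level_module == "polars":
        raise TypeError(
            f"{name} is a polars object ({type(table).__name__}), which is not "
            "supported yet; convert it first, e.g. with its to_pandas() method."
        )
    return table
```

Calling such a check on both the auxiliary table and the main table at the start of `fit` would turn the traceback above into a single explicit message.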

jeromedockes, Apr 22 '24 13:04
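
Until polars support lands, one possible workaround is to convert the polars tables to pandas before handing them to InterpolationJoiner. The sketch below makes a few assumptions: `to_pandas_if_possible` is a hypothetical helper name, polars `DataFrame.to_pandas()` needs pandas and pyarrow installed, and the joiner is assumed to accept pandas input in this skrub version (as the linked example suggests).

```python
def to_pandas_if_possible(table):
    """Return table.to_pandas() when that method exists, else the table unchanged.

    Hypothetical helper for illustration only; not part of skrub or polars.
    """
    convert = getattr(table, "to_pandas", None)
    return convert() if callable(convert) else table


# Assumed usage with the objects from the report (not run here):
# aux = to_pandas_if_possible(data_store["depth_0"][0])
# main = to_pandas_if_possible(data_store["df_base"])
# joiner = InterpolationJoiner(aux, key=["case_id"], suffix="_depth_0").fit(main)
# join = joiner.transform(main)
```

The result is then a pandas DataFrame; if the rest of the pipeline expects polars, it would have to be converted back afterwards.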