
[ENH] Enhancing polars support by introducing `set_output`

Open julian-fong opened this issue 8 months ago • 16 comments

Introduces the file set_output inside skpro.utils and a new test file test_set_output inside the tests folder. This is part of https://github.com/sktime/enhancement-proposals/pull/34 and the notes written in my mentorship programme.

In this PR:

  • I have introduced basic functions to convert pandas DataFrames with multi-index columns into pandas DataFrames with single-level columns and vice versa; these are stored in the polars adapter file skpro.datatypes._adapter.polars. In that file, convert_polars_to_pandas_with_index now checks whether there are melted multi-index columns (denoted via the "foo__bar" convention), and convert_pandas_to_polars_with_index now checks whether the pandas DataFrame has multi-index columns (as in the output of predict_interval and predict_quantile). If so, these multi-index columns are melted down into single-level columns before converting to a polars DataFrame.
  • I have created skpro.utils.set_output.check_output_config to ensure that the output transform set by the user matches one of the available skpro output data containers.
  • I have created skpro.utils.set_output.transform_output to convert the resulting DataFrame into the user-specified or default data container. transform_output acts as a wrapper around the ordinary convert function, but it also decides whether to convert based on the user-specified mtype or to fall back to the original mtype seen in fit.
  • I have introduced _config inside BaseProbaRegressor and a new method set_output, which mirrors sklearn's set_output for familiarity.
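
For reviewers who want a quick mental model of the points above, here is a minimal pandas-only sketch. It is not the code in this PR: `flatten_columns`, `unflatten_columns`, `VALID_TRANSFORMS` and `ToyRegressor` are hypothetical names made up for illustration, and the only things taken from the description are the "foo__bar" separator convention and the idea of a `set_output` method that stores the choice in `_config` and returns the estimator, as sklearn's does.

```python
import pandas as pd

SEP = "__"  # separator from the "foo__bar" convention described above


def flatten_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Join MultiIndex column levels into single-level "a__b__c" names."""
    if not isinstance(df.columns, pd.MultiIndex):
        return df  # nothing to flatten
    out = df.copy()
    out.columns = [SEP.join(str(level) for level in col) for col in df.columns]
    return out


def unflatten_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Split "a__b__c" names back into MultiIndex columns.

    Assumes every column has the same number of levels; levels come back
    as strings, so numeric levels would need an extra cast.
    """
    if not any(SEP in str(col) for col in df.columns):
        return df  # no flattened names present
    out = df.copy()
    out.columns = pd.MultiIndex.from_tuples(
        [tuple(str(col).split(SEP)) for col in df.columns]
    )
    return out


# Hypothetical set of accepted output containers, for illustration only.
VALID_TRANSFORMS = {"default", "pandas", "polars"}


class ToyRegressor:
    """Toy stand-in showing the set_output pattern, not BaseProbaRegressor."""

    def __init__(self):
        self._config = {"transform": "default"}

    def set_output(self, transform=None):
        # As in sklearn, passing None leaves the configuration unchanged.
        if transform is None:
            return self
        if transform not in VALID_TRANSFORMS:
            raise ValueError(f"unsupported output container: {transform!r}")
        self._config["transform"] = transform
        return self  # returning self allows method chaining


# Usage: round-trip a frame shaped like an interval prediction.
cols = pd.MultiIndex.from_tuples([("y", "0.9", "lower"), ("y", "0.9", "upper")])
pred = pd.DataFrame([[1.0, 2.0], [3.0, 4.0]], columns=cols)
flat = flatten_columns(pred)          # single-level, polars-friendly names
restored = unflatten_columns(flat)    # MultiIndex columns again
est = ToyRegressor().set_output(transform="polars")
```

The flattened frame has plain string column names, which is what makes it safe to hand to a container without multi-level column support; the string-only round trip is a limitation of this sketch, not a statement about the adapter in the PR.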

Relates to #342 #449

julian-fong • Jun 21 '24 18:06