
[WIP] Interpolationjoiner dataframe api

Open jeromedockes opened this issue 2 years ago • 7 comments

this changes the InterpolationJoiner to rely on the dataframe api (and some utilities added to skrub._dataframe) so that it works with polars

the tests are now parametrized with a fixture px that becomes pandas and polars
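
For readers unfamiliar with the pattern, a parametrized fixture of this kind could be sketched roughly as follows. Only the fixture name `px` comes from the description above. The `AVAILABLE` list, the `find_spec` guard and the test body are assumptions made for illustration, not skrub's actual test code.

```python
import importlib
import importlib.util

import pytest

# Assumption: only parametrize over the libraries that are actually installed,
# so the test module can still be collected when one of them is missing.
AVAILABLE = [
    name for name in ("pandas", "polars") if importlib.util.find_spec(name) is not None
]


@pytest.fixture(params=AVAILABLE)
def px(request):
    """Return the dataframe module (pandas or polars) for the current run."""
    return importlib.import_module(request.param)


def test_frame_shape(px):
    # Both libraries expose a DataFrame constructor accepting a dict of columns
    # and a .shape attribute of the form (n_rows, n_columns).
    df = px.DataFrame({"a": [1, 2, 3]})
    assert df.shape == (3, 1)
```

With this setup pytest runs `test_frame_shape` once per installed library, so the same test body exercises both backends.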

jeromedockes, Nov 15 '23 14:11

Among the utilities we added to skrub._dataframe, I believe make_dataframe, make_series and join could use the dataframe api instead

jeromedockes, Nov 15 '23 15:11

and it seems our oldest supported pandas version does not support the dataframe api?

jeromedockes, Nov 15 '23 15:11

I bumped the pandas version just to see if the CI runs, but having the dataframe api requires pandas 2.1.0 (see its release notes), which dates from August 2023
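
One way to cope with older versions would be to detect support at runtime instead of pinning a version number. A minimal sketch, assuming the entry point is the `__dataframe_consortium_standard__` method described by the dataframe API standard; `supports_dataframe_api` and the two stand-in classes are hypothetical and not part of skrub or pandas.

```python
def supports_dataframe_api(obj) -> bool:
    """Return True if obj exposes the dataframe API standard entry point."""
    return hasattr(obj, "__dataframe_consortium_standard__")


# Stand-ins for illustration only, so the sketch runs without pandas or polars.
class FrameWithStandard:
    def __dataframe_consortium_standard__(self, *, api_version=None):
        # A real library would return a standard-compliant wrapper here.
        return self


class FrameWithoutStandard:
    pass


print(supports_dataframe_api(FrameWithStandard()))
print(supports_dataframe_api(FrameWithoutStandard()))
```

A check like this could select a fallback code path for dataframes that lack the entry point.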

jeromedockes, Nov 16 '23 15:11

@MarcoGorelli in case you have the time, I'm sure you would have advice on how to make better use of the dataframe API in this one!

jeromedockes, Nov 17 '23 10:11

ooh, seeing you try this out has made my day! got some things I need to finish now but I'll take a careful look and see what we need to change upstream (I'm sure something will come up 😄 )

MarcoGorelli, Nov 17 '23 14:11

> ooh, seeing you try this out has made my day!

I'm sooo happy about this PR, Marco! I love the way the support for polars is building in skrub

GaelVaroquaux, Nov 17 '23 14:11

If the dataframe-api-compat is not mature enough yet, it might be wiser to wait rather than to have to change a lot of logic three months later

FWIW I'm aiming to tag the first non-beta version by February (https://github.com/data-apis/dataframe-api/issues/319). Until then, I'm extremely happy if people experiment with it, but I would caution against putting a lot of work into using it

MarcoGorelli, Nov 17 '23 17:11

in the end we won't be using the dataframe API for this, so it will be easier to just start a new branch

jeromedockes, May 28 '24 15:05