Jérôme Dockès
> - i. I'm +1 for sampling random rows and computing a sufficient statistic to get the reference distance. This would represent the "global distance" between the left row and...
I think this has been addressed in #821
Let's make a decision on the `TableVectorizer` parameters before the next release.
JupyterLite seems to be working for me now on both the stable and dev branches. I have a question for @glemaitre: we have in the doc configuration the URL of...
> I checked, and it does work now. cool :) I think that if @lesteve 's suspicion above was right, it probably started working back in December when we did...
I wonder if instead of creating separate tests to compare polars to pandas, we should parametrize the existing tests to run them once on pandas dataframes and once on polars...
as is done in [this test](https://github.com/skrub-data/skrub/blob/f332ca698adb5ea5312c5b26b0b19c14b39e6eed/skrub/tests/test_agg_joiner.py#L30) for the agg joiner for example
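To make the suggestion concrete, here is a minimal sketch of a test parametrized over dataframe libraries. It is not the code from the linked test: `count_rows`, `DATAFRAME_MODULES`, and the `px` argument name are hypothetical and only illustrate the pattern. Polars and pandas are both treated as optional, so the test runs only on whichever is installed.

```python
import importlib

import pytest


def _available_dataframe_modules():
    """Return the dataframe libraries that can be imported here."""
    modules = []
    for name in ("pandas", "polars"):
        try:
            modules.append(importlib.import_module(name))
        except ImportError:
            # Library not installed: the test is simply not run for it.
            pass
    return modules


# Hypothetical module-level list; a real test suite might use a fixture.
DATAFRAME_MODULES = _available_dataframe_modules()


def count_rows(df):
    """Hypothetical function under test; `.shape` exists in both libraries."""
    return df.shape[0]


@pytest.mark.parametrize("px", DATAFRAME_MODULES, ids=lambda m: m.__name__)
def test_count_rows(px):
    # The same test body runs once per dataframe library.
    df = px.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})
    assert count_rows(df) == 3
```

With this layout a single test body covers both libraries, so a behavior difference shows up as one failing parameter rather than as a separate comparison test.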
In the utilities we added to `skrub._dataframe`, I believe `make_dataframe`, `make_series`, and `join` could use the dataframe API instead,
but it seems our oldest supported pandas version does not support the dataframe API?
I bumped the pandas version just to see if the CI runs, but having the dataframe API requires pandas 2.1.0 ([release notes](https://pandas.pydata.org/pandas-docs/stable/whatsnew/v2.1.0.html#other-enhancements)), which dates from August 2023.
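For illustration, a version gate along these lines could decide whether to take the dataframe API code path. This is only a sketch using the standard library: `parse_release` and `has_dataframe_api` are hypothetical helpers, not part of skrub or pandas, and the 2.1.0 threshold is the one mentioned above.

```python
import re

# Minimum pandas release assumed to provide the dataframe API (see above).
MIN_PANDAS_FOR_DATAFRAME_API = (2, 1, 0)


def parse_release(version):
    """Parse 'X.Y' or 'X.Y.Z' (ignoring any suffix) into a 3-tuple of ints."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    # A missing patch component is treated as 0.
    return tuple(int(part) if part is not None else 0 for part in match.groups())


def has_dataframe_api(pandas_version):
    """Return True if the given pandas version is at least 2.1.0.

    Pre-release suffixes such as 'rc0' are ignored by this simple parser.
    """
    return parse_release(pandas_version) >= MIN_PANDAS_FOR_DATAFRAME_API
```

In practice the argument would be `pandas.__version__`, and a fallback to the existing pandas-specific utilities would be needed for older versions.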