Results 396 comments of Jérôme Dockès

> - i. I'm +1 for sampling random rows and computing a sufficient statistic to get the reference distance. This would represent the "global distance" between the left row and...

I think this has been addressed in #821

let's make a decision on the TableVectorizer parameters before the next release

jupyterlite seems to be working for me now on the stable and dev branches. I have a question for @glemaitre: we have in the doc configuration the URL of...

> I checked, and it does work now.

cool :) I think that if @lesteve's suspicion above was right, it probably started working back in December when we did...

I wonder if instead of creating separate tests to compare polars to pandas, we should parametrize the existing tests to run them once on pandas dataframes and once on polars...

as is done in [this test](https://github.com/skrub-data/skrub/blob/f332ca698adb5ea5312c5b26b0b19c14b39e6eed/skrub/tests/test_agg_joiner.py#L30) for the agg joiner for example
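
For illustration, a minimal sketch of that kind of parametrization. This is not the actual skrub test: `test_column_sum` and the toy data are made up, and `pytest.importorskip` is only there so a case is skipped when the library is not installed.

```python
import pytest


# Run the same test body once per dataframe library.
@pytest.mark.parametrize("module_name", ["pandas", "polars"])
def test_column_sum(module_name):
    # Import the library under test, or skip this case if it is missing.
    px = pytest.importorskip(module_name)

    # Both libraries accept a dict of column name -> list of values.
    df = px.DataFrame({"a": [1, 2, 3], "b": [10.0, 20.0, 30.0]})

    # Only use operations that behave the same way in both libraries.
    assert df.shape == (3, 2)
    assert list(df.columns) == ["a", "b"]
    assert df["a"].sum() == 6
```

With this pattern a behavior difference between the two libraries shows up as a failure of one parametrized case, rather than needing a separate comparison test.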

In the utilities we added to `skrub._dataframe`, I believe `make_dataframe`, `make_series` and `join` could use the dataframe api instead

and it seems our oldest supported pandas version does not support the dataframe api?

I bumped the pandas version just to see if the CI runs, but having the dataframe api requires pandas 2.1.0 ([release notes](https://pandas.pydata.org/pandas-docs/stable/whatsnew/v2.1.0.html#other-enhancements)), which dates from August 2023
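
To make the version constraint concrete, here is a rough sketch of a version gate. The helper names `parse_release` and `meets_minimum` are hypothetical (not part of skrub or pandas), and the only thing taken from the discussion is the 2.1.0 minimum.

```python
import re

# Minimum pandas release mentioned above for the dataframe api.
MIN_PANDAS_FOR_DATAFRAME_API = (2, 1, 0)


def parse_release(version):
    """Return the leading numeric part of a version string as a tuple of ints.

    Pre-release or local suffixes such as "rc1" or "+local" are ignored.
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"cannot parse version: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(version, minimum=MIN_PANDAS_FOR_DATAFRAME_API):
    """Return True if `version` is at least `minimum` (hypothetical helper)."""
    release = parse_release(version)
    # Pad with zeros so that "2.1" compares equal to (2, 1, 0).
    padded = release + (0,) * (len(minimum) - len(release))
    return padded >= minimum


print(meets_minimum("2.0.3"))  # False
print(meets_minimum("2.1.0"))  # True
```

In practice one would pass `pandas.__version__` to `meets_minimum`, or simply raise the minimum pandas version in the package requirements.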