Riccardo Cappuzzo

Results 71 issues of Riccardo Cappuzzo

As reported in #1424, a lot of docstrings have extremely long preambles that make it hard to understand what the function/object is doing. This PR addresses that issue by shortening...

no changelog needed

At the moment,
- MinHashEncoder
- GapEncoder
- StringEncoder
- TextEncoder
- DatetimeEncoder

all have the note on single column transformers at the very top of the docstring. ![Image](https://github.com/user-attachments/assets/16728c9c-04b2-401e-a27a-48cd6e31ec92) I...

documentation
no changelog needed

Currently, `skrub.selectors.Filter`, `skrub.selectors.NameFilter`, and `skrub.selectors.Selector` are public and appear in the documentation with an empty docstring: ![Image](https://github.com/user-attachments/assets/6d38a1da-e06f-4532-9ec4-d1a44fb263e9) This was likely unintended, so they should be hidden from the docs.

### Describe the issue linked to the documentation

I am working on an example with some of the datasets provided in skrub.datasets, and for each of them I need to...

documentation
no changelog needed

First version; needs editing, cleanup, and double-checking of the section on lagged features.

documentation

At the moment, each dataset has its own version of the documentation. Some have info about the dataset, some have a description of the Bunch object. In general, it's messy...

Right now, the `ToDatetime` transformer tries to convert strings to datetimes either by guessing the format using pandas' timeseries parsing library, or by using a format provided by the user....
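
The two parsing modes described above (guess a format, or use the one the user supplies) can be illustrated with a minimal standard-library sketch. This is not how `ToDatetime` or pandas is implemented: the `parse_dates` helper and the `CANDIDATE_FORMATS` list are hypothetical names introduced only for this illustration.

```python
from datetime import datetime

# Hypothetical candidate formats for the "guessing" mode;
# this is NOT the list that skrub or pandas actually uses.
CANDIDATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%Y-%m-%d %H:%M:%S")


def parse_dates(values, fmt=None):
    """Parse strings with `fmt` if given, else with the first
    candidate format that fits every value."""
    formats = (fmt,) if fmt is not None else CANDIDATE_FORMATS
    for candidate in formats:
        try:
            return [datetime.strptime(v, candidate) for v in values]
        except ValueError:
            # This format does not fit at least one value; try the next.
            continue
    raise ValueError("no format matched every value")


# Guessing mode: no format is provided.
guessed = parse_dates(["2024-01-31", "2024-02-29"])

# Explicit mode: the user states that the strings are day-first.
explicit = parse_dates(["01/02/2024", "03/04/2024"], fmt="%d/%m/%Y")
```

Passing an explicit format matters most for ambiguous strings such as `01/02/2024`, which a guesser cannot reliably tell apart as day-first or month-first.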

enhancement

Tests in `test_table_vectorizer.py` all use pandas-only example dataframes, rather than using `df_module` and testing pandas, pandas nullable types, and polars. We should update all the tests to address...

I think the skrub.datasets utilities for loading and returning datasets could be improved a bit for a better user experience:
- skrub supports both pandas and polars, but datasets...