Riccardo Cappuzzo issues

Results 71 issues of Riccardo Cappuzzo

Shortening docstring preamble and moving text to the notes

As reported in #1424, a lot of docstrings have extremely long preambles that make it hard to understand what the function/object is doing. This PR addresses that issue by shortening...

no changelog needed

Expand section on datetime encoding in user guide

Fixes #858

no changelog needed

Move the note on single-column transformers to the Notes section of the docstring

At the moment, MinHashEncoder, GapEncoder, StringEncoder, TextEncoder, and DatetimeEncoder all have the note on single-column transformers at the very top of the docstring. ![Image](https://github.com/user-attachments/assets/16728c9c-04b2-401e-a27a-48cd6e31ec92) I...

documentation

no changelog needed

Modify `Filter`, `NameFilter`, `Selector` so that they are private

Currently, `skrub.selectors.Filter`, `skrub.selectors.NameFilter`, and `skrub.selectors.Selector` are public, and the documentation shows them with an empty docstring: ![Image](https://github.com/user-attachments/assets/6d38a1da-e06f-4532-9ec4-d1a44fb263e9) This was likely unintended, so they should be hidden from the docs.

Add some stats on the size of the datasets provided in skrub.datasets

### Describe the issue linked to the documentation

I am working on an example with some of the datasets provided in skrub.datasets, and for each of them I need to...

documentation

no changelog needed

Adding an example of how to perform timeseries forecasting with lagged features using expressions

First version; needs editing, cleanup, and a double check of the section on lagged features.

documentation
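
For readers unfamiliar with the term, lagged features pair each observation with earlier values of the same series. The snippet below is a minimal plain-Python sketch of that idea only; it is not taken from the PR and does not use skrub expressions. The function name `add_lags` and the `y_lag_k` column names are hypothetical.

```python
def add_lags(series, lags):
    """Build one row per time step with the current value and its lagged values.

    Hypothetical helper for illustration; positions without enough history get None.
    """
    rows = []
    for t, value in enumerate(series):
        row = {"y": value}
        for lag in lags:
            # The value observed `lag` steps earlier, if it exists.
            row[f"y_lag_{lag}"] = series[t - lag] if t >= lag else None
        rows.append(row)
    return rows


rows = add_lags([10, 20, 30, 40], lags=[1, 2])
```

The first rows contain `None` for lags that reach before the start of the series; a forecasting model would typically drop or impute those rows.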

Improve the documentation of the skrub.datasets fetch functions

At the moment, each dataset has its own version of the documentation. Some have info about the dataset, some have a description of the Bunch object. In general, it's messy...

Extend the `ToDatetime` transformer so that it can take a list of datetime formats

Right now, the `ToDatetime` transformer converts strings to datetimes either by guessing the format using pandas' time series parsing library or by using a format provided by the user....

enhancement
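
The general idea of accepting a list of formats can be sketched as trying each candidate in order and keeping the first one that parses every value. This is a standard-library illustration under that assumption, not the `ToDatetime` implementation or its proposed API; the function name `parse_with_formats` is hypothetical.

```python
from datetime import datetime


def parse_with_formats(values, formats):
    """Return (parsed values, format) for the first format matching all values.

    Hypothetical helper for illustration; returns (None, None) if no format fits.
    """
    for fmt in formats:
        try:
            # strptime raises ValueError as soon as one value does not match.
            return [datetime.strptime(v, fmt) for v in values], fmt
        except ValueError:
            continue
    return None, None


parsed, fmt = parse_with_formats(
    ["2024-01-31", "2024-02-29"],
    ["%d/%m/%Y", "%Y-%m-%d"],
)
```

Requiring one format to match the whole column keeps the result consistent, at the cost of rejecting columns that mix formats.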

TableVectorizer tests only check pandas dataframes

Tests in `test_table_vectorizer.py` all use pandas-only example dataframes, rather than using `df_module` to test pandas, pandas nullable types, and polars. We should update all the tests to address...

Various features and improvements for the skrub.datasets utilities

I think the skrub.datasets utilities for loading and returning datasets could be improved a bit for a better user experience: - skrub supports both pandas and polars, but datasets...