Jérôme Dockès comments

Results: 396 comments of Jérôme Dockès

[WIP] Add `skrub.Report`

I think we can merge this PR. If at some point we are happy enough with the online visualizer to add it to the docs we can do another PR....

[WIP] Add `skrub.Report`

> Failing test (a quick look suggests that it is related to the PR).

yes I'm on it :) I made a small change on how labels are rotated to...

[WIP] Add `skrub.Report`

> Awesome. I definitely see myself using this functionality

any suggestion on a short description of those columns for the drop-down? "Columns with high similarity"?

Shorthand for getting only the preprocessing part of the TableVectorizer

some examples of the kind of cleaning the tablevectorizer does:

```python
>>> import pandas as pd
>>> from skrub import TableVectorizer
>>> skrubber = TableVectorizer(
...     high_cardinality_transformer="passthrough",
...     low_cardinality_transformer="passthrough",
......
```

Improve rendering of matplotlib image in dark mode

sounds good, why do we need the height: unset?

Limit Cramer's V analysis to target column

as skrub has a focus on supervised learning having an optional 'target' parameter for the tablereport that causes it to show slightly different information might be a good idea, and...
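
To make the idea concrete, here is a minimal sketch of computing Cramér's V only between each column and a target column, rather than for every pair of columns. It is not skrub's implementation: the helper names `cramers_v` and `associations_with_target` are hypothetical, the table is a plain dict of lists, and all columns are assumed to be categorical already.

```python
from collections import Counter
from math import sqrt


def cramers_v(x, y):
    """Cramér's V between two equally long sequences of categorical values."""
    n = len(x)
    counts_x = Counter(x)
    counts_y = Counter(y)
    joint = Counter(zip(x, y))
    k = min(len(counts_x), len(counts_y))
    if n == 0 or k < 2:
        # A constant (or empty) column carries no association.
        return 0.0
    chi2 = 0.0
    for a, n_a in counts_x.items():
        for b, n_b in counts_y.items():
            expected = n_a * n_b / n
            observed = joint.get((a, b), 0)
            chi2 += (observed - expected) ** 2 / expected
    return sqrt(chi2 / (n * (k - 1)))


def associations_with_target(table, target):
    """Cramér's V of every non-target column with the target column only."""
    return {
        name: cramers_v(values, table[target])
        for name, values in table.items()
        if name != target
    }


# Toy table (made-up values): "a" determines the target, "b" is independent of it.
table = {
    "a": ["x", "x", "y", "y"],
    "b": ["u", "v", "u", "v"],
    "target": [0, 0, 1, 1],
}
scores = associations_with_target(table, "target")
```

With a target column the number of associations to compute grows linearly with the number of columns instead of quadratically.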

Extend the `ToDatetime` transformer so that it can take a list of datetime formats

The case where I have a list of formats (not just one) but the default list used by pandas is not adequate sounds a bit niche to warrant the added...
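
For reference, the behaviour under discussion can be sketched in a few lines with the standard library alone. This is not the `ToDatetime` API and does not use pandas; the function name `parse_with_formats` is hypothetical, and the assumption is that a single format must parse every value in the column.

```python
from datetime import datetime


def parse_with_formats(values, formats):
    """Return (format, parsed values) for the first format that parses every value."""
    for fmt in formats:
        try:
            return fmt, [datetime.strptime(value, fmt) for value in values]
        except ValueError:
            # At least one value does not match this format; try the next one.
            continue
    raise ValueError("none of the candidate formats parses every value")


# The first two candidates fail on "31/12/2023", so the third is selected.
fmt, parsed = parse_with_formats(
    ["31/12/2023", "01/02/2024"],
    ["%Y-%m-%d", "%m/%d/%Y", "%d/%m/%Y"],
)
```

The order of the list matters: an ambiguous column such as `["01/02/2024"]` is parsed with whichever matching format comes first.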

Extend the `ToDatetime` transformer so that it can take a list of datetime formats

> The reason I thought of this was to address the case in which datetimes are using locale specific formats, e.g. French day/month names, which I don't think are parsed...

Implemented adaptive squashing

> @GaelVaroquaux was suggesting to implement SquashingScaler directly instead of with the indirection through SingleColumnSquashingScaler. I'll need to check how easy that is...

that's an option too, I'll let you...

Implemented adaptive squashing

> Your point Jerome is about memory overhead?

yes because we go from columnar format, to one contiguous array, then back to columnar (dataframe format). that is the case even...