Results 396 comments of Jérôme Dockès

After discussion with @ymzayek during the drop-in hours, we decided to remove `.tar.gz` from the `.gitignore`.

It seems the top-level keys are class names in the classification case, with the grids of parameters selected for each class/task as the values, and in the regression case where...

> a parameter that lets the user choose whether to turn on the automatic casting

IIRC that is already what the `numeric_dtype` parameter does. Despite its name, in practice it...

I think the behavior around handling numbers and numeric strings could be sanitized a bit like this:

- Columns that are already numeric are always left alone. They are numbers...

I agree, being able to inspect more easily what the `TableVectorizer` did would be very useful, and it is something that participants in last year's skrub workshop asked for. A lot of that...

I also quite like having everything in the top-level module. I think having some things in a separate module is nice when they are a bit less frequently used, _and_...

Regarding tab completion, WDYT of defining a `skrub.__dir__()` in which we list the most frequently used names? At the moment, there are already too many for tab completion to be very useful...
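
To make the `__dir__()` idea concrete, here is a minimal sketch of how a module-level `__dir__` (PEP 562, Python 3.7+) narrows what `dir()` and tab completion show. It uses a hypothetical stand-in module called `demo_pkg` rather than skrub itself; the attribute names are borrowed from this discussion for illustration only, and the `_FREQUENT_NAMES` list is an assumption, not a proposal for the actual contents.

```python
import types

# Hypothetical stand-in for a package; this is NOT skrub's real layout.
demo = types.ModuleType("demo_pkg")
demo.TableVectorizer = object()
demo.Drop = object()
demo.DataOp = object()            # documented, but rarely used directly
demo._internal_helper = object()  # private helper

# Assumed short list of the names we want tab completion to suggest.
_FREQUENT_NAMES = ["TableVectorizer", "Drop"]


def _module_dir():
    """Return only the frequently used names (module-level __dir__, PEP 562)."""
    return list(_FREQUENT_NAMES)


# In a real package this would simply be `def __dir__():` in __init__.py.
demo.__dir__ = _module_dir

listed = dir(demo)                          # dir() sorts what __dir__ returns
still_accessible = hasattr(demo, "DataOp")  # hidden from dir(), still importable
print(listed, still_accessible)
```

Names left out of `__dir__()` stay fully accessible through attribute access and imports; only the listing used by `dir()` and by completion tools that rely on it is affected.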

For me, the difficulty is that "useful" is subjective and continuous, so it might be hard for users to guess (and for us to decide) what goes in skrub and...

> For some reason, Drop (not DropCols) is a public class, when it clearly shouldn't be. This should be fixed.

They are both public, and both are useful. Drop is...

For names that are documented but are never used directly by the user like DataOp, I still think it is reasonable to exclude them from `__dir__()`. According to the documentation...