Move the note on single-column transformers to the Notes section of the docstring

Open rcap107 opened this issue 6 months ago • 6 comments

At the moment,

  • MinHashEncoder
  • GapEncoder
  • StringEncoder
  • TextEncoder
  • DatetimeEncoder

all have the note on single-column transformers at the very top of the docstring.

I think it should be moved to the Notes section of the docstring, as it takes a lot of space and it does not bring any information after the first time the user has read it.

rcap107 avatar Jun 02 '25 14:06 rcap107

it does not bring any information after the first time the user has read it.

I think the point is to ensure new users reading the doc see this disclaimer, as it's a pretty important paradigm shift from scikit-learn. I'm hesitant on this issue, because we might have an expert bias.

Vincent-Maladiere avatar Jun 18 '25 15:06 Vincent-Maladiere

Discussing with @glemaitre, we could split the note so that at the top there is only a short mention like "such and such is a column transformer, so it works differently from typical scikit-learn transformers, click here for more detail" and have the rest of the paragraph in the note section of the docstring.

rcap107 avatar Jun 19 '25 15:06 rcap107

Discussing with @glemaitre, we could split the note so that at the top there is only a short mention like "such and such is a column transformer, so it works differently from typical scikit-learn transformers, click here for more detail" and have the rest of the paragraph in the note section of the docstring.

OK but not as a "note".

Also, this is a bit long: it's going to show up on all the docstrings that appear when people open a parenthesis in an IDE, diminishing their usefulness.

GaelVaroquaux avatar Jun 19 '25 16:06 GaelVaroquaux

Not sure if I follow here. What I mean is that I want to replace the current paragraph with the sentence from above, and move the rest of the paragraph to the Notes section of the docstring. The current note does not appear in Code, so the new version of it shouldn't appear either.

rcap107 avatar Jun 19 '25 17:06 rcap107

From IRL discussion:

Good to have, but should not be a blocker for 0.6.0. We can release a fix in 0.6.1.

rcap107 avatar Jul 15 '25 09:07 rcap107

The text that is used in the note is defined in _apply_to_cols.py in _SINGLE_COL_LINE. This string should be modified and summarized so that the start of the docstring only contains the summary. The rest should be in the Notes part of the docstring and the summary should refer to that.

rcap107 avatar Oct 27 '25 14:10 rcap107
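
For illustration, the split described above could look roughly like the sketch below. It is not skrub's implementation: the two strings and the helper `split_single_col_note` are hypothetical names, the summary wording is only adapted from the suggestion earlier in this thread, and the actual content of `_SINGLE_COL_LINE` is not reproduced. The sketch assumes numpydoc-style docstrings with an underlined "Notes" header.

```python
import inspect

# Hypothetical strings: the real text is defined in _SINGLE_COL_LINE
# in _apply_to_cols.py and is not reproduced here.
SINGLE_COL_SUMMARY = (
    "``{name}`` is a single-column transformer: it works differently "
    "from typical scikit-learn transformers. See the Notes section."
)
SINGLE_COL_DETAILS = "Placeholder for the rest of the current paragraph."


def split_single_col_note(doc, name):
    """Put a short summary near the top of ``doc`` and the details under Notes."""
    lines = inspect.cleandoc(doc).splitlines()
    summary = SINGLE_COL_SUMMARY.format(name=name)
    # Keep the one-line description first, then add the short summary.
    lines[1:1] = ["", summary]
    # Look for an existing numpydoc "Notes" header (title plus dashed underline).
    for i in range(len(lines) - 1):
        if lines[i].strip() == "Notes" and set(lines[i + 1].strip()) == {"-"}:
            # Insert the details as the first paragraph of the Notes section.
            lines[i + 2 : i + 2] = [SINGLE_COL_DETAILS, ""]
            break
    else:
        # No Notes section yet: append one at the end of the docstring.
        lines += ["", "Notes", "-----", SINGLE_COL_DETAILS]
    return "\n".join(lines)
```

With this layout only one extra sentence sits near the top of the docstring, and the long explanation is read once in the Notes section. One limitation of the sketch: when no Notes section exists it appends one at the very end, which may not match numpydoc's recommended section order if the docstring already has an Examples section.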