
Simplify code using skrub TableReport and TableVectorizer

Open ArturoAmorQ opened this issue 5 months ago • 0 comments

  • Add a notebook + video to show how all the pandas code in the Visual inspection of data subsection can be simplified using skrub.TableReport
  • Replace ColumnTransformer with skrub.TableVectorizer starting from the Using numerical and categorical variables together notebook
    • In the same notebook, section Fitting a more powerful model, replace OrdinalEncoder by skrub.ToCategorical.
    • Explicitly mention that TableVectorizer selects columns automatically based on their dtype
    • Introduce the concept of "low/high cardinality" and demonstrate the effect of cardinality_threshold on the "native-country" column in the Adult Census dataset.
    • Update the visualizing scikit-learn pipelines video to use TableVectorizer (with scikit-learn version >= 1.8)
    • Modify the wrap-up quizzes that use the Ames Housing dataset (i.e. M1, M4 and M5) to select a subset of numerical columns with pandas
  • Redo the datasets description using TableReport
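
To make the dtype-based selection and the "low/high cardinality" idea from the list above concrete, here is a minimal pandas-only sketch. It is not skrub's implementation: the toy data frame, the threshold value of 4, and the strict `<` comparison are assumptions chosen for illustration, and only the column names echo the Adult Census dataset.

```python
import pandas as pd

# Hypothetical toy data: column names echo the Adult Census dataset,
# but the values are made up for this illustration.
df = pd.DataFrame(
    {
        "age": [25, 38, 28, 44, 18, 34],
        "hours-per-week": [40, 50, 40, 40, 30, 30],
        "sex": ["Male", "Female", "Male", "Male", "Female", "Male"],
        "native-country": [
            "United-States", "Mexico", "France", "India", "Cuba", "Peru",
        ],
    }
)

# Select columns by dtype, which is the manual step that an automatic
# vectorizer would take care of.
numerical = df.select_dtypes(include="number").columns.tolist()
categorical = df.select_dtypes(exclude="number").columns.tolist()

# Split categorical columns by their number of unique values.
# The threshold of 4 is an illustrative value, not a library default.
cardinality_threshold = 4
n_unique = df[categorical].nunique()
low_cardinality = n_unique[n_unique < cardinality_threshold].index.tolist()
high_cardinality = n_unique[n_unique >= cardinality_threshold].index.tolist()

print("numerical:", numerical)
print("low cardinality:", low_cardinality)
print("high cardinality:", high_cardinality)
```

With this toy data, "sex" has 2 distinct values and "native-country" has 6, so raising or lowering the threshold moves "native-country" between the two groups. That is the effect the notebook would demonstrate on the real dataset. The same `select_dtypes(include="number")` call is one possible way to select numerical columns in the Ames Housing quizzes.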

ArturoAmorQ, Oct 29 '25 14:10