Better _repr_ for the Bunch object used to hold the datasets
Problem Description
Currently, when the Bunch object is displayed in a notebook cell, it uses the standard dict repr, which makes it hard to see the content, including the names of the keys (sometimes they are X, y and data; in other cases the names are specific to the dataset).
Feature Description
It would be nice to have a clearer repr.
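As a rough illustration of what a clearer repr could look like, here is a minimal sketch. It is not an existing skrub or scikit-learn API: `DatasetBunch` is a hypothetical name, and the small `Bunch` class below is only a stand-in for `sklearn.utils.Bunch` (a dict subclass with attribute access) so the snippet runs without extra dependencies. The summary format, one line per key, is an assumption too.

```python
class Bunch(dict):
    """Stand-in for sklearn.utils.Bunch: a dict with attribute access."""

    def __getattr__(self, key):
        try:
            return self[key]
        except KeyError:
            raise AttributeError(key) from None


class DatasetBunch(Bunch):
    """Hypothetical subclass whose repr lists each key with a short summary."""

    def __repr__(self):
        lines = [f"{type(self).__name__} with {len(self)} keys:"]
        for key, value in self.items():
            # Show the type, plus the shape for array-like or dataframe values.
            summary = type(value).__name__
            shape = getattr(value, "shape", None)
            if shape is not None:
                summary += f", shape={shape}"
            lines.append(f"  {key}: {summary}")
        return "\n".join(lines)


# Toy values; a real dataset would hold dataframes and file paths.
bunch = DatasetBunch(X=[[1, 2], [3, 4]], y=[0, 1], path="data.csv")
print(bunch)
```

With something like this, the keys are visible at a glance regardless of how large the values are, and the stored data itself is unchanged.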
Alternative Solutions
No response
Additional Context
No response
Hello @rcap107!
Shouldn't this be something defined in the scikit-learn class itself?
Hey @MarieSacksick!
In skrub, the Bunch class is only used for fetching the datasets, so I think it would be simpler to extend it here than to update the scikit-learn class.
On this topic, I feel that exposing filenames and changing some examples to load from filename is a higher priority than changing the repr
agreed, this issue is lower priority and mostly here to keep track of it
You mean using the expressions or directly via {pd, pl}.read_{csv, parquet}?
Directly via the readers.
But it opens the door to having other examples that demonstrate I/O patterns with expressions