yellowbrick
Dataset loader requires y variable, unsuited for unsupervised datasets
The `Dataset.to_numpy` and `Dataset.to_pandas` methods both return X and y data for use in machine learning. Currently, all of our datasets are supervised (i.e. they have y data); however, if we would like to include unsupervised data, these methods will have to be updated to return `None` for y. This will not work out of the box for the following reasons:
- the `to_numpy` method expects `y` to exist in the .npz file; we'll either have to store `None` in the .npz file or perform a check in the meta.json to see if a target exists.
- the `to_pandas` method uses the meta.json and will need to gracefully handle the case where `target` is `null`.
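As a rough illustration of the second option for `to_numpy` (checking the metadata rather than storing `None` in the archive), here is a minimal sketch. The `load_arrays` helper and the `"X"` array key are assumptions made for illustration and are not taken from the yellowbrick code; only the `"y"` key and the `target` metadata entry come from the description above.

```python
import io

import numpy as np


def load_arrays(npz_file, meta):
    """Return (X, y); y is None when the metadata declares no target.

    Hypothetical helper, not part of yellowbrick. Assumes the archive
    stores the features under "X" and, if present, the target under "y".
    """
    data = np.load(npz_file, allow_pickle=False)
    X = data["X"]
    # Only read "y" when the metadata names a target and the array exists.
    has_target = meta.get("target") is not None and "y" in data.files
    y = data["y"] if has_target else None
    return X, y


# Usage: build an in-memory archive without a "y" array.
buffer = io.BytesIO()
np.savez(buffer, X=np.arange(6).reshape(3, 2))
buffer.seek(0)

X, y = load_arrays(buffer, {"target": None})
```

With this approach the .npz file stays free of pickled objects, at the cost of making the metadata the source of truth for whether a target exists.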
Note that for dataset validation, `"target" in Dataset.meta` must be True, so the meta.json must have `"target": null` rather than omit the `"target"` key.
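The difference between a null target and a missing key can be sketched with the standard `json` module. The `validate_meta` function and the `REQUIRED_KEYS` tuple below are hypothetical stand-ins for the real validation, which may check different keys.

```python
import json

# Hypothetical list of required keys; the real validation may differ.
REQUIRED_KEYS = ("features", "target")


def validate_meta(meta):
    """Raise ValueError if a required key is absent; a null value is fine."""
    missing = [key for key in REQUIRED_KEYS if key not in meta]
    if missing:
        raise ValueError("meta.json is missing keys: " + ", ".join(missing))
    return meta


# JSON null is parsed as Python None, and the key is still present.
unsupervised = json.loads('{"features": ["a", "b"], "target": null}')
validate_meta(unsupervised)

# Omitting the key entirely fails the membership check.
incomplete = json.loads('{"features": ["a", "b"]}')
try:
    validate_meta(incomplete)
    omitted_key_rejected = False
except ValueError:
    omitted_key_rejected = True
```

A loader can then treat `meta["target"] is None` as the signal that the dataset is unsupervised.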
Related to #535 (this bug was introduced with our knowledge in this PR)