Dataset loader requires y variable, unsuited for unsupervised datasets

Open • bbengfort opened this issue 6 years ago • 1 comment

The Dataset.to_numpy and Dataset.to_pandas methods both return X and y data for use in machine learning. Currently, all of our datasets are supervised (i.e. they have y data). However, if we would like to include unsupervised data, these methods will have to be updated to return None for y. This will not work out of the box for the following reasons:

  • the to_numpy method expects y to exist in the .npz file; we'll either have to store None in the .npz file or perform a check in the meta.json to see if a target exists.
  • the to_pandas method uses the meta.json and will need to gracefully handle the case where "target" is null.
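The meta.json check described above could look roughly like the following. This is a minimal sketch, not the yellowbrick implementation: the helper name `split_features_target` is hypothetical, the `"features"` and `"target"` keys are assumed to be how meta.json describes the columns, and plain lists of dicts stand in for the real numpy/pandas data.

```python
import json


def split_features_target(rows, meta):
    """Hypothetical helper: split records into X and y.

    Assumes meta has a "features" list and a "target" entry that is
    either a column name or None (JSON null) for unsupervised data.
    """
    features = meta["features"]
    target = meta["target"]  # the key must exist; its value may be None

    X = [[row[name] for name in features] for row in rows]
    # Unsupervised dataset: there is no target column, so y is None.
    y = None if target is None else [row[target] for row in rows]
    return X, y


# Illustrative metadata and records (not a real yellowbrick dataset).
unsupervised_meta = json.loads('{"features": ["a", "b"], "target": null}')
supervised_meta = json.loads('{"features": ["a", "b"], "target": "label"}')
rows = [{"a": 1, "b": 2, "label": 0}, {"a": 3, "b": 4, "label": 1}]

X_u, y_u = split_features_target(rows, unsupervised_meta)
X_s, y_s = split_features_target(rows, supervised_meta)
```

Callers would then have to accept `y is None` wherever they currently unpack `X, y`.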

Note that for dataset validation, "target" in Dataset.meta must be True, so the meta.json must have "target": null rather than omit the "target" key.
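To illustrate why the key has to be present, here is a small sketch using only the standard library. The two JSON snippets are made-up examples rather than real yellowbrick metadata; the point is that a JSON null loads as Python `None` while keeping the key, so a membership test still passes.

```python
import json

# "target" present but null: the key survives loading, the value is None.
explicit = json.loads('{"features": ["a"], "target": null}')

# "target" omitted entirely: the membership test fails.
omitted = json.loads('{"features": ["a"]}')

explicit_ok = "target" in explicit
omitted_ok = "target" in omitted
```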

bbengfort • Jan 02 '19 02:01

Related to #535 (this bug was introduced with our knowledge in this PR)

bbengfort • Jan 02 '19 02:01