Replace Boston dataset in examples

Open jklaise opened this issue 3 years ago • 5 comments

The Boston dataset which we use in some examples has an ethical problem and should be replaced. Read more here: https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_boston.html#sklearn.datasets.load_boston

Impacted examples:

  • cfproto_housing.ipynb
  • ale_regression_boston.ipynb

The above link suggests some similar housing-related alternatives.

jklaise avatar Sep 28 '21 12:09 jklaise

Hey, I would like to work on this issue. Could you assign it to me? There are two alternative datasets (California and Ames); is there a preference for either of them? Also, would I need to change the documentation in each of these files? Thanks :)

Pranjalmishra30 avatar Sep 28 '21 13:09 Pranjalmishra30

Thanks for your interest! You're right that this PR would need to rewrite parts of the examples to talk about the new dataset.

The key thing for both examples is that they require a dataset with only numerical features (no categorical ones), which both datasets seem to satisfy. I would start with the California one as it has fewer features.
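
A minimal sketch of how the "only numerical features" requirement could be checked before porting a notebook. The helper names `has_only_numerical_features` and `load_california` are hypothetical and not part of alibi; `fetch_california_housing` is the scikit-learn loader and downloads the data the first time it is called.

```python
import numbers


def has_only_numerical_features(rows):
    """Return True if every value in every row is a real number.

    Booleans are rejected explicitly because Python treats bool as an int.
    `rows` can be a list of lists or a 2-D NumPy array.
    """
    return all(
        isinstance(value, numbers.Real) and not isinstance(value, bool)
        for row in rows
        for value in row
    )


def load_california():
    """Hypothetical helper: load the California housing data via scikit-learn."""
    # Imported lazily so the check above works without scikit-learn installed.
    from sklearn.datasets import fetch_california_housing

    data = fetch_california_housing()  # downloads and caches on first call
    return data.data, data.target, data.feature_names
```

In a notebook this might be used as `X, y, names = load_california()` followed by `assert has_only_numerical_features(X)`, so that a dataset with categorical columns fails early rather than inside the explainer.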

jklaise avatar Sep 28 '21 13:09 jklaise

OK, understood. I will get on this ASAP. Also, out of curiosity, are you planning to accept contributions for Hacktoberfest?

Pranjalmishra30 avatar Sep 28 '21 13:09 Pranjalmishra30

I'm not familiar with how Hacktoberfest works. Are there any particular requirements other than what we already do?

jklaise avatar Sep 28 '21 13:09 jklaise

You just need to add the topic Hacktoberfest to the repository. You can refer to the following links for more details:

  • https://hacktoberfest.digitalocean.com
  • https://hacktoberfest.digitalocean.com/resources/maintainers

Pranjalmishra30 avatar Sep 28 '21 14:09 Pranjalmishra30