alibi
Replace Boston dataset in examples
The Boston dataset, which we use in some examples, has an ethical problem and should be replaced. Read more here: https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_boston.html#sklearn.datasets.load_boston
Impacted examples:
- cfproto_housing.ipynb
- ale_regression_boston.ipynb
The above link suggests some similar housing-related alternatives.
Hey, I would like to work on this issue. Could you assign it to me? There are two alternative datasets (California and Ames); is there a preference for either of them? Also, would I need to change the documentation in each of these files? Thanks :)
Thanks for your interest! You're right that this PR would need re-writing parts of the examples to talk about the new dataset.
The key thing for both examples is that they require a dataset with only numerical features (no categorical ones), which both datasets seem to satisfy. I would start with the California one as it has fewer features.
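For anyone picking this up, here is a minimal sketch of how the "numerical features only" requirement could be checked before porting a notebook. The helper names `non_numeric_columns` and `load_california_housing` are hypothetical and not part of alibi; the only library call used is scikit-learn's `fetch_california_housing`, which downloads the data on first use.

```python
from numbers import Real


def non_numeric_columns(rows, feature_names):
    """Return the names of columns containing any non-numeric value.

    Hypothetical helper for illustration; `rows` is a list of row lists.
    """
    bad = []
    for j, name in enumerate(feature_names):
        # bool is a subclass of int, so exclude it explicitly.
        if not all(
            isinstance(row[j], Real) and not isinstance(row[j], bool)
            for row in rows
        ):
            bad.append(name)
    return bad


def load_california_housing():
    """Fetch the California housing data via scikit-learn.

    Hypothetical wrapper; scikit-learn is imported lazily so the helper
    above can be used without it installed.
    """
    from sklearn.datasets import fetch_california_housing

    data = fetch_california_housing()
    return data.data, data.target, data.feature_names
```

Assuming scikit-learn is installed and the download succeeds, `X, y, names = load_california_housing()` followed by `non_numeric_columns(X[:100].tolist(), names)` should return an empty list if every feature is numerical.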
OK, understood. Will get on this ASAP. Also, out of curiosity, are you planning to accept contributions for Hacktoberfest?
I'm not familiar with how Hacktoberfest works. Are there any particular requirements other than what we already do?
You just need to add the topic Hacktoberfest to the repository. You can refer to the following links for more details:
- https://hacktoberfest.digitalocean.com
- https://hacktoberfest.digitalocean.com/resources/maintainers