
Add support for KDD Census-Income dataset

Open anupamamurthi opened this issue 2 years ago • 4 comments

Original dataset location: https://archive.ics.uci.edu/ml/datasets/Census-Income%2B(KDD)
This dataset is similar to the Adult income dataset.

Potential Tasks:

  • [ ] Ensure the license permits open source use
  • [ ] Verify that this dataset is appropriate for fairness tasks and subset it accordingly (removing unnecessary columns, etc.)
  • [ ] Ensure we have instance-level records with protected attributes and outcomes
  • [ ] First create an sklearn-compatible dataset (dataframe), then an appropriate "classic" dataset (second priority)
  • [ ] Create a simple notebook where the dataset is consumed and at least simple fairness measures are computed.
  • [ ] https://microdata.worldbank.org/index.php/catalog/2102/data-dictionary/F2?file_name=NLD2001-P-H
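
The subsetting and instance-level checks in the task list could be sketched roughly as follows. This is only an illustration using the standard library: the column names, the `?` missing-value marker, the label strings, and the sample rows are all assumptions made up for the example, not the dataset's confirmed schema.

```python
import csv
import io

# Hypothetical column choices; the real file's schema may differ.
KEEP_COLUMNS = ["age", "sex", "race", "income"]
PROTECTED = ["sex", "race"]
LABEL = "income"

# Made-up sample rows standing in for the downloaded data file.
SAMPLE_CSV = """age,sex,race,education,income
38,Male,White,Bachelors,50000+
45,Female,Black,HS-grad,-50000
29,Female,?,Masters,50000+
51,Male,White,HS-grad,
"""


def subset_and_validate(text):
    """Keep only the chosen columns and drop rows that lack a
    protected attribute or an outcome label."""
    kept, dropped = [], 0
    required = PROTECTED + [LABEL]
    for row in csv.DictReader(io.StringIO(text)):
        record = {col: row[col].strip() for col in KEEP_COLUMNS}
        # Treat empty strings and "?" as missing (an assumption).
        if any(record[col] in ("", "?") for col in required):
            dropped += 1
            continue
        kept.append(record)
    return kept, dropped


records, dropped = subset_and_validate(SAMPLE_CSV)
print(len(records), "usable records;", dropped, "dropped")
```

Whether incomplete rows should be dropped or imputed is a design decision for the actual dataset class; dropping is used here only because it is the simplest option to show.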

anupamamurthi • Aug 27 '22 03:08

hello, I'm interested in the issue. Could you please assign it to me?

Jyc323 • Sep 22 '23 19:09

Hello @anupamamurthi, the original link to the dataset is no longer valid. Can I use this link instead? https://archive.ics.uci.edu/dataset/117/census+income+kdd

Jyc323 • Sep 22 '23 19:09

Sure, go for it @Jyc323

anupamamurthi • Sep 22 '23 19:09

Hello @anupamamurthi, I'd like to confirm the progress. I've created the dataset file "income_dataset.py," similar to "adult_dataset.py," and customized it to match the dataset attributes. As per the instructions, I need to develop a basic .ipynb file. Could you advise on the appropriate folder to place it in? In this .ipynb file, I plan to utilize "BinaryLabelDatasetMetric" to process the dataset. Is this approach sufficient? Thank you!
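
For reference, two of the group measures that "BinaryLabelDatasetMetric" reports are statistical parity difference and disparate impact. The sketch below computes both in plain Python so the arithmetic is visible; it does not call AIF360, and the records, group names, and helper functions are hypothetical and only meant to show what the notebook's numbers would mean.

```python
# Hypothetical records: (protected attribute value, binary label).
# 1 = favorable outcome (e.g. income above the threshold), 0 = unfavorable.
records = [
    ("Male", 1), ("Male", 1), ("Male", 0), ("Male", 1),
    ("Female", 1), ("Female", 0), ("Female", 0), ("Female", 0),
]


def favorable_rate(rows, group):
    """Fraction of rows in `group` that received the favorable label."""
    labels = [label for attr, label in rows if attr == group]
    return sum(labels) / len(labels)


def statistical_parity_difference(rows, unprivileged, privileged):
    """P(favorable | unprivileged) - P(favorable | privileged)."""
    return favorable_rate(rows, unprivileged) - favorable_rate(rows, privileged)


def disparate_impact(rows, unprivileged, privileged):
    """P(favorable | unprivileged) / P(favorable | privileged)."""
    return favorable_rate(rows, unprivileged) / favorable_rate(rows, privileged)


spd = statistical_parity_difference(records, "Female", "Male")
di = disparate_impact(records, "Female", "Male")
print(f"statistical parity difference: {spd:.3f}")
print(f"disparate impact: {di:.3f}")
```

A difference of 0 and a ratio of 1 would indicate equal favorable-outcome rates between the two groups in this toy sample.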

Jyc323 • Sep 23 '23 00:09