EthicML

More options with dataset creation

Open olliethomas opened this issue 5 years ago • 0 comments

This paper, https://arxiv.org/pdf/1905.12728.pdf, landed on arXiv this week. Its main point is that unfair behaviour can be introduced during data cleaning, in particular by how we deal with missing values. The authors explicitly say that none of the fairness frameworks out there (they don't mention EthicML) take this into account... but we kind of can.

This user story is to put a bit more oomph behind that. Maybe we should offer the strategies they suggest as an option on the dataset class, i.e. `Adult(missing_data='drop_row')`, `Adult(missing_data='drop_column')` or `Adult(missing_data='something_else')`, with `drop_row` as the default (as that's what happens already, I think).
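To make the idea concrete, here is a rough sketch of what the two named strategies could do. This is not EthicML's actual API: `handle_missing`, the `MissingData` enum and the toy rows are all hypothetical, invented for illustration, and `None` is assumed to mark a missing value.

```python
from enum import Enum
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]


class MissingData(Enum):
    """Hypothetical strategy names, mirroring the strings suggested in this issue."""

    DROP_ROW = "drop_row"
    DROP_COLUMN = "drop_column"


def handle_missing(rows: List[Row], missing_data: str = "drop_row") -> List[Row]:
    """Apply a missing-value strategy to rows in which None marks a missing value."""
    strategy = MissingData(missing_data)  # raises ValueError for an unknown strategy
    if strategy is MissingData.DROP_ROW:
        # Keep only the rows that have no missing values at all.
        return [row for row in rows if all(v is not None for v in row.values())]
    # DROP_COLUMN: remove every column that has at least one missing value.
    bad_cols = {col for row in rows for col, v in row.items() if v is None}
    return [{c: v for c, v in row.items() if c not in bad_cols} for row in rows]


# Toy rows, made up for illustration (not taken from the real Adult data).
rows = [
    {"age": "39", "workclass": "State-gov", "sex": "Male"},
    {"age": "50", "workclass": None, "sex": "Female"},
    {"age": "38", "workclass": "Private", "sex": "Male"},
]

kept_rows = handle_missing(rows, "drop_row")
kept_cols = handle_missing(rows, "drop_column")
print(len(kept_rows), sorted(kept_cols[0]))
```

In this toy example `drop_row` happens to discard the only `Female` row, while `drop_column` keeps all three rows but loses `workclass`. That is the kind of trade-off the choice of strategy would expose.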

olliethomas · May 05 '20 08:05