EthicML
More options with dataset creation
This paper, https://arxiv.org/pdf/1905.12728.pdf, landed on arXiv this week. Its main point is that unfair behaviour can be introduced during data cleaning, in particular in how we deal with missing values. The authors explicitly say that no existing fairness framework takes this into account (they don't mention EthicML)... but we kind of can.
This user story is to put a bit more oomph around that. Maybe we should make the strategies they suggest part of the dataset class, e.g.
Adult(missing_data='drop_row'), Adult(missing_data='drop_column') or Adult(missing_data='something_else'), with drop_row as the default (as I think that's what already happens).
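To make the proposal a bit more concrete, here is a rough sketch of what the strategy switch could do, independent of any existing EthicML code. Everything in it is an assumption for illustration: `handle_missing` is a hypothetical helper (not an EthicML function), `missing_data` only mirrors the proposed `Adult(missing_data=...)` argument, `"impute_mode"` stands in for the unspecified `'something_else'` (it is not taken from the paper), and the table is a toy list of dicts with `None` marking a missing value.

```python
from collections import Counter
from typing import Any, Dict, List, Optional

Row = Dict[str, Optional[Any]]


def handle_missing(rows: List[Row], missing_data: str = "drop_row") -> List[Row]:
    """Hypothetical helper applying a missing-value strategy to a toy table.

    `missing_data` mirrors the proposed Adult(missing_data=...) argument;
    it is not an existing EthicML parameter.
    """
    if not rows:
        return []
    columns = list(rows[0].keys())

    if missing_data == "drop_row":
        # Keep only rows that have no missing (None) entries.
        return [row for row in rows if all(row[c] is not None for c in columns)]

    if missing_data == "drop_column":
        # Keep only columns that have no missing entries in any row.
        complete = [c for c in columns if all(row[c] is not None for row in rows)]
        return [{c: row[c] for c in complete} for row in rows]

    if missing_data == "impute_mode":
        # Illustrative stand-in for 'something_else': fill each missing entry
        # with the most common observed value in its column.
        modes: Dict[str, Any] = {}
        for c in columns:
            observed = [row[c] for row in rows if row[c] is not None]
            modes[c] = Counter(observed).most_common(1)[0][0] if observed else None
        return [
            {c: (modes[c] if row[c] is None else row[c]) for c in columns}
            for row in rows
        ]

    raise ValueError(f"unknown missing_data strategy: {missing_data!r}")


# Toy data loosely shaped like Adult; values are made up for illustration.
table = [
    {"age": 39, "workclass": "Private", "sex": "Male"},
    {"age": 50, "workclass": None, "sex": "Female"},
    {"age": 38, "workclass": "Private", "sex": "Female"},
]

print(len(handle_missing(table)))  # 2
print(handle_missing(table, "drop_column")[0])  # {'age': 39, 'sex': 'Male'}
print(handle_missing(table, "impute_mode")[1]["workclass"])  # Private
```

Even in this toy table, the default drop_row removes one of the two Female rows and none of the Male rows. That kind of group-dependent side effect of cleaning is what the paper is worried about, and exposing the choice as a parameter would make it visible and comparable across strategies.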