themis-ml
Implement "Reweighting" fairness-aware preprocessing
Reweighting takes a dataset D and assigns a weight to each observation using conditional probabilities based on target labels and protected class membership.
s1 - disadvantaged group
s0 - advantaged group
+ - positive label
- - negative label
- large weights are assigned to X_s1_y+ and X_s0_y-:
  - weights for s1 | +: (p(s1) * p(+)) / p(s1 and +)
  - weights for s0 | -: (p(s0) * p(-)) / p(s0 and -)
- small weights are assigned to X_s1_y- and X_s0_y+:
  - weights for s1 | -: (p(s1) * p(-)) / p(s1 and -)
  - weights for s0 | +: (p(s0) * p(+)) / p(s0 and +)
- the weights are then used as input to model types that support weighted observations
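
The weight formulas in the list above can be sketched in a few lines of plain Python. This is only an illustration, not the themis-ml implementation: the function name `reweighting_weights` and its arguments are hypothetical, and it assumes `s` holds the protected class membership and `y` the target label for each observation.

```python
from collections import Counter


def reweighting_weights(s, y):
    """Hypothetical helper: one weight per observation, p(s) * p(y) / p(s and y).

    s: protected class membership per observation (e.g. 1 = s1, 0 = s0)
    y: target label per observation (e.g. 1 = +, 0 = -)
    """
    n = len(s)
    s_counts = Counter(s)               # counts per group, for p(s)
    y_counts = Counter(y)               # counts per label, for p(y)
    joint_counts = Counter(zip(s, y))   # counts per (group, label) cell

    # Expected probability under independence divided by observed probability.
    cell_weight = {
        (s_val, y_val): (s_counts[s_val] / n) * (y_counts[y_val] / n) / (count / n)
        for (s_val, y_val), count in joint_counts.items()
    }
    return [cell_weight[pair] for pair in zip(s, y)]


# Toy data invented for illustration: s1 has 1 positive and 3 negatives,
# s0 has 3 positives and 1 negative.
s = [1, 1, 1, 1, 0, 0, 0, 0]
y = [1, 0, 0, 0, 1, 1, 1, 0]
weights = reweighting_weights(s, y)
```

The resulting list could then be passed to any model that accepts per-observation weights, for example through the `sample_weight` argument that many scikit-learn estimators take in `fit`.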
NOTE: The above weighting scheme works because the numerator, e.g. p(s1) * p(+), is the expected probability of an observation being disadvantaged and positively labelled if the two variables were independent, while the denominator p(s1 and +) is the actual probability. In a dataset that discriminates against s1, the term (p(s1) * p(+)) / p(s1 and +) therefore evaluates to > 1, since the actual probability of being s1 and + is less than the expected probability under the independence assumption.
Conversely, (p(s1) * p(-)) / p(s1 and -) will evaluate to < 1 since the actual probability of being s1 and - is greater than the expected probability under the independence assumption.
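
To make the note concrete, here is a small worked example. The counts are invented purely for illustration: 10 observations, 5 in s1 (1 positive, 4 negative) and 5 in s0 (4 positive, 1 negative), so s1 is under-represented among the positive labels.

```python
# Hypothetical toy counts, not taken from any real dataset.
n = 10
p_s1 = 5 / n           # probability of being in the disadvantaged group
p_pos = 5 / n          # probability of a positive label
p_neg = 5 / n          # probability of a negative label
p_s1_and_pos = 1 / n   # actual probability of s1 and +
p_s1_and_neg = 4 / n   # actual probability of s1 and -

# Expected probabilities if group and label were independent.
expected_s1_pos = p_s1 * p_pos
expected_s1_neg = p_s1 * p_neg

# Weights: expected probability divided by actual probability.
w_s1_pos = expected_s1_pos / p_s1_and_pos   # about 2.5, i.e. > 1
w_s1_neg = expected_s1_neg / p_s1_and_neg   # about 0.625, i.e. < 1

print(w_s1_pos, w_s1_neg)
```

Here the actual probability of s1 and + (0.1) is below the expected 0.25, so those observations are up-weighted, while s1 and - (0.4 actual against 0.25 expected) is down-weighted.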