themis-ml

Implement "Reweighting" fairness-aware preprocessing

Open cosmicBboy opened this issue 8 years ago • 0 comments

Reweighting takes a dataset D and assigns a weight to each observation based on the marginal and joint probabilities of its target label and protected class membership.

Notation:
  • s1: disadvantaged group
  • s0: advantaged group
  • +: positive label
  • -: negative label

  • large weights (> 1 in a discriminatory dataset) are assigned to X_s1_y+ and X_s0_y-:
    • weights for s1 and +: (p(s1) * p(+)) / p(s1 and +)
    • weights for s0 and -: (p(s0) * p(-)) / p(s0 and -)
  • small weights (< 1) are assigned to X_s1_y- and X_s0_y+:
    • weights for s1 and -: (p(s1) * p(-)) / p(s1 and -)
    • weights for s0 and +: (p(s0) * p(+)) / p(s0 and +)
  • the weights are then passed to models that support weighted observations (e.g. scikit-learn estimators whose fit method accepts sample_weight)
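The per-observation weights above could be computed roughly as follows. This is a minimal sketch, not the themis-ml API: `reweighting_weights` is a hypothetical helper, and it assumes `s` and `y` are binary arrays where `s == 1` marks s1 (disadvantaged) and `y == 1` marks the positive label.

```python
import numpy as np


def reweighting_weights(s, y):
    """Return one weight per observation: p(s) * p(y) / p(s and y).

    Hypothetical helper. Assumes s and y are 0/1 arrays where s == 1 is s1
    (disadvantaged) and y == 1 is the positive label (+).
    """
    s = np.asarray(s)
    y = np.asarray(y)
    # Rows outside the assumed 0/1 encoding are left as NaN.
    weights = np.full(len(y), np.nan)
    for s_val in (0, 1):
        for y_val in (0, 1):
            cell = (s == s_val) & (y == y_val)
            p_joint = cell.mean()              # p(s and y), observed
            if p_joint == 0:
                continue                       # empty cell: no rows to weight
            p_s = (s == s_val).mean()          # p(s)
            p_y = (y == y_val).mean()          # p(y)
            weights[cell] = p_s * p_y / p_joint
    return weights


# Toy data: s1 rows are mostly negative, s0 rows mostly positive.
s = np.array([1, 1, 1, 1, 0, 0, 0, 0])
y = np.array([1, 0, 0, 0, 1, 1, 1, 0])
w = reweighting_weights(s, y)
print(w)  # s1/+ and s0/- rows get 2.0, all other rows get 2/3
```

The resulting `w` can then go wherever a model accepts per-observation weights, for instance `LogisticRegression().fit(X, y, sample_weight=w)` in scikit-learn.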

NOTE: The above weighting scheme works because, for example, the numerator p(s1) * p(+) is the probability of an observation being disadvantaged and positively labelled that we would expect if group membership and label were independent, while the denominator p(s1 and +) is the observed probability. Therefore, in a discriminatory dataset the term (p(s1) * p(+)) / p(s1 and +) evaluates to > 1, since the observed probability of being s1 and + is less than the expected probability under the independence assumption.

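Conversely, (p(s1) * p(-)) / p(s1 and -) evaluates to < 1, since the observed probability of being s1 and - is greater than the expected probability under the independence assumption. In every cell, multiplying the observed probability p(s and y) by its weight gives exactly p(s) * p(y), so in the reweighted data group membership and label are independent.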
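The > 1 / < 1 argument can be checked on a small worked example. The cell counts below are hypothetical, chosen so that s1 is under-represented among positive labels; `fractions.Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction

# Hypothetical cell counts for a discriminatory dataset (100 rows):
# s1 rows are mostly negative, s0 rows mostly positive.
counts = {("s1", "+"): 10, ("s1", "-"): 40, ("s0", "+"): 30, ("s0", "-"): 20}
n = sum(counts.values())


def p_group(group):
    """Marginal p(s) for a protected group."""
    return Fraction(sum(c for (g, _), c in counts.items() if g == group), n)


def p_label(label):
    """Marginal p(y) for a target label."""
    return Fraction(sum(c for (_, lab), c in counts.items() if lab == label), n)


weights = {}
for (group, label), count in counts.items():
    expected = p_group(group) * p_label(label)  # p(s) * p(y) if independent
    actual = Fraction(count, n)                 # observed p(s and y)
    weights[(group, label)] = expected / actual

for cell, weight in weights.items():
    print(cell, weight)
```

Here s1/+ gets weight 2 and s1/- gets 3/4, the > 1 and < 1 cases described above; the advantaged group mirrors this with 2/3 for s0/+ and 3/2 for s0/-.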

cosmicBboy, Aug 30 '17 04:08