
`direct_input_pertubation_strategy=` isn't passed down

Open · afiodorov opened this issue 7 years ago · 1 comment

Nice method.

I am examining the code and the thesis more closely as it appears to be very useful.

I don't fully understand the point of the perturbation strategy, and it isn't explained in much detail in the thesis.

I started reading the code and I spotted some bugs.

Firstly

https://github.com/adebayoj/fairml/blob/master/fairml/orthogonal_projection.py#L120

accepts the strategy as an argument but then ignores it; see:

https://github.com/adebayoj/fairml/blob/master/fairml/orthogonal_projection.py#L217
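For readers following along, here is a minimal sketch of the pattern the fix implies: the strategy the caller passes is forwarded to the perturbation step instead of being dropped. Only `direct_input_pertubation_strategy` comes from the issue title; `perturb_column`, `column_dependence`, and the strategy labels "constant-zero" and "constant-median" are hypothetical names chosen for illustration, not fairml's API.

```python
import numpy as np

def perturb_column(data, col, strategy):
    """Hypothetical helper: return a copy of `data` with column `col`
    replaced according to `strategy`."""
    out = np.array(data, dtype=float)  # np.array copies by default
    if strategy == "constant-zero":
        out[:, col] = 0.0
    elif strategy == "constant-median":
        out[:, col] = np.median(out[:, col])
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return out

def column_dependence(predict, data, col,
                      direct_input_pertubation_strategy="constant-zero"):
    """Hypothetical audit step: mean absolute change in predictions when
    column `col` is perturbed with the caller's chosen strategy."""
    base = predict(np.asarray(data, dtype=float))
    # Forward the argument instead of accepting it and hard-coding a strategy.
    ptb = perturb_column(data, col, direct_input_pertubation_strategy)
    return float(np.mean(np.abs(base - predict(ptb))))
```

With `predict = lambda X: X[:, 0]` and rows `[1, 2], [3, 4], [5, 6]`, the zero strategy and the median strategy give different dependence scores, which is exactly what gets lost when the argument is ignored.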

Also, I think that with the constant_zero and median perturbation strategies this loop is redundant:

https://github.com/adebayoj/fairml/blob/master/fairml/orthogonal_projection.py#L205

Each run ignores random_sample_selected anyway, so every run should produce the same output_difference_col and total_difference (because data_col_ptb and total_ptb_data are identical on each run).
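The redundancy can be illustrated with a small sketch, assuming a strategy table in which only a random-sampling strategy consumes the random generator. The names `STRATEGIES`, `DETERMINISTIC`, and `output_difference` and the "random-sample" label are hypothetical and do not mirror fairml's internals; the point is that deterministic strategies produce the same perturbed data on every iteration, so one run gives the same average as many.

```python
import numpy as np

# Hypothetical strategy table: each entry returns the replacement column.
# Only "random-sample" actually uses the random generator.
STRATEGIES = {
    "constant-zero": lambda col, rng: np.zeros_like(col),
    "constant-median": lambda col, rng: np.full_like(col, np.median(col)),
    "random-sample": lambda col, rng: rng.choice(col, size=col.shape[0]),
}
DETERMINISTIC = {"constant-zero", "constant-median"}

def output_difference(predict, data, col, strategy, n_runs=10, seed=0):
    """Mean absolute change in predictions when column `col` is perturbed,
    averaged over runs. Deterministic strategies yield identical perturbed
    data every iteration, so a single run is enough for them."""
    data = np.asarray(data, dtype=float)
    rng = np.random.default_rng(seed)
    runs = 1 if strategy in DETERMINISTIC else n_runs
    base = predict(data)
    diffs = []
    for _ in range(runs):
        ptb = data.copy()
        ptb[:, col] = STRATEGIES[strategy](data[:, col], rng)
        diffs.append(np.mean(np.abs(base - predict(ptb))))
    return float(np.mean(diffs)), runs
```

Short-circuiting on the strategy type keeps the loop only where it does real work, averaging over random draws.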

Finally, it would be great if you could explain the purpose of direct_input_pertubation_strategy in more detail in the documentation. Is it necessary at all to "zero out" a column? Why?

It appears to me that just by orthogonalising the other columns you already take away the effect of the subject column. It's not clear to me why zeroing out is required on top. Is it to be certain the effect of the column is not present?
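One way to see why orthogonalising alone may not be enough: orthogonalising strips the subject column's information out of the *other* columns, but the model can still read the subject column directly. The sketch below assumes a toy linear model and uses a single Gram-Schmidt step; `orthogonalize_others` is a hypothetical helper, not fairml's implementation.

```python
import numpy as np

def orthogonalize_others(X, col):
    """Hypothetical helper (one Gram-Schmidt step, not fairml's code):
    remove from every column except `col` its linear component along the
    centred column `col`. Column `col` itself is left unchanged."""
    X = np.asarray(X, dtype=float)
    v = X[:, col] - X[:, col].mean()
    out = X.copy()
    for j in range(X.shape[1]):
        if j != col:
            out[:, j] = X[:, j] - (X[:, j] @ v) / (v @ v) * v
    return out

rng = np.random.default_rng(0)
x0 = rng.normal(size=500)
x1 = x0 + 0.1 * rng.normal(size=500)    # x1 is a near-copy of x0
X = np.column_stack([x0, x1])
f = lambda A: 2.0 * A[:, 0] + A[:, 1]   # toy model that reads x0 directly

Xo = orthogonalize_others(X, 0)
corr = np.corrcoef(Xo[:, 0], Xo[:, 1])[0, 1]  # approximately zero

# x1 no longer carries x0, yet f still depends on x0 through its own input:
Xz = Xo.copy()
Xz[:, 0] = 0.0
direct = np.mean(np.abs(f(Xo) - f(Xz)))       # clearly non-zero
```

In this toy case the indirect channel (x0 leaking through x1) is removed, while the direct channel (the `2.0 * x0` term) survives until x0 itself is perturbed.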

Many thanks for the code by the way!

afiodorov · Mar 12 '17 14:03

@afiodorov thanks for taking a look at the code and for your feedback.

  1. You are right. With the constant-zero and median strategies, the loop is redundant. I plan on separating the direct perturbation code from the overall method so that the run is independent. I am testing a local branch at the moment that handles this, and will push a fix later this week.

  2. The purpose of direct perturbation. This is also a good question, and you are right: it is not explained in the thesis. We are working on documentation that fully explains the overall issue.

For now, here is the justification for including direct perturbation. Suppose you have a function f(x_1, x_2, x_3). At a high level, fairml gives you the dependence of f on each of the x_i, and that dependence is calculated as direct influence + indirect influence. For the direct influence, we transform the data using one of the direct perturbation strategies and then measure how that perturbation changes the output of the black-box function. For the indirect influence, we generate the transformed data with an orthogonal transformation instead.
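The decomposition described above can be sketched as follows, under loud assumptions: zeroing the column stands in for the direct perturbation strategy, least-squares residualization (keeping each column's mean) stands in for fairml's orthogonal transformation, and `residualize` and `influence` are hypothetical names. fairml's actual computation may combine the two parts differently; this only illustrates "direct + indirect".

```python
import numpy as np

def residualize(X, col):
    """Hypothetical stand-in for the orthogonal transformation: replace every
    column except `col` with its residual from a least-squares fit on
    [1, X[:, col]], then add back that column's mean."""
    X = np.asarray(X, dtype=float)
    design = np.column_stack([np.ones(len(X)), X[:, col]])
    others = [j for j in range(X.shape[1]) if j != col]
    beta, *_ = np.linalg.lstsq(design, X[:, others], rcond=None)
    out = X.copy()
    out[:, others] = X[:, others] - design @ beta + X[:, others].mean(axis=0)
    return out

def influence(predict, X, col):
    """Hypothetical decomposition: (direct, indirect, direct + indirect)."""
    X = np.asarray(X, dtype=float)
    base = predict(X)
    # Direct: perturb the column itself ("constant-zero" style).
    X_direct = X.copy()
    X_direct[:, col] = 0.0
    direct = np.mean(np.abs(base - predict(X_direct)))
    # Indirect: remove col's linear information from the other columns.
    indirect = np.mean(np.abs(base - predict(residualize(X, col))))
    return float(direct), float(indirect), float(direct + indirect)
```

For example, if x_2 = 2·x_1 exactly and the model only reads x_2, the direct part for x_1 is zero while the indirect part is not; a model that only reads x_1 shows the opposite pattern.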

Certainly, we could just use the orthogonal transformation on all variables, but we wanted to give people the flexibility to pick whichever perturbation function they are interested in using for this task. I hope this helps explain why the direct perturbation strategy is required.

adebayoj · Mar 13 '17 20:03