confound_prediction

Generalization to more than 1 confounding factor

Rachine opened this issue 4 years ago · 1 comment

Hello, thank you very much for tackling the issue of confounders, which recurs frequently in clinical ML problems.

I have some questions about the project/paper:

  1. I am wondering why only the test set needs to be deconfounded. Why not also build a deconfounded train set alongside the deconfounded test set (with no data leakage, of course)?
  2. I tried to generalize your methodology to k multiple confounders [image]. I still used most of your codebase, together with a pseudo-generalization of the mutual information to multiple variables. The probability of sample i being drawn, m_i, which was

[image]

is now:

[image]

The quantity [image] can still be estimated with kernel density estimation.
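Since the exact sampling expression is in an image that did not survive, here is a minimal sketch of what such a KDE-based subsampling step might look like, assuming (hypothetically) that the sampling weight is taken inversely proportional to the estimated joint density of the target and the k confounders:

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)

# Toy data: target y and k = 3 confounders z (hypothetical shapes).
n, k = 1000, 3
z = rng.normal(size=(k, n))
y = z.sum(axis=0) + rng.normal(size=n)

# Estimate the joint density p(y, z_1, ..., z_k) with a Gaussian KDE.
data = np.vstack([y, z])
kde = gaussian_kde(data)
density = kde(data)

# Assumed form: sampling weight inversely proportional to the joint
# density (the exact formula from the issue is not recoverable here).
weights = 1.0 / np.clip(density, 1e-12, None)
weights /= weights.sum()

# Draw a subsampled test set of size m without replacement.
m = 200
idx = rng.choice(n, size=m, replace=False, p=weights)
```

This only illustrates the mechanics; the actual weight formula from the paper/issue may differ.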

I ran some quick toy experiments; the approach seems to work approximately on simple additive toy examples when the number of samples is sufficient. For instance, with 1000 samples and 10 confounding factors I got:

[image]

With 100 samples and 3 confounding factors I got:

[image]
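For reference, a minimal sketch of the kind of "simple additive" toy setup described above, with the individual confound-target correlations that the toy experiments monitor (the exact generative model used in the issue is an assumption here):

```python
import numpy as np

rng = np.random.default_rng(0)

# Additive toy: each confounder z_j contributes linearly to the target y
# (a guess at the "simple additive" setup described in the comment).
n, k = 1000, 3
z = rng.normal(size=(n, k))
y = z @ np.ones(k) + rng.normal(size=n)

# Individual confound-target correlations, the quantities one would
# compare before and after the deconfounding subsampling step.
corrs = [np.corrcoef(y, z[:, j])[0, 1] for j in range(k)]
```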

It would also be interesting to study the sample size N required to guarantee, at a given confidence level, the deconfounding capability for k factors, depending on the type of link.

Do you think this is a correct approach and generalization?

Thank you

Best regards

— Rachine, Jul 09 '20 14:07

Oops, after some thinking, maybe I should look at the goodness of fit with the multiple variables jointly, and not only at individual correlations, to test:

[image]

[image]

I added the R² from an ordinary least squares fit with statsmodels, 'y ~ z0 + z1 + z2':

[image]
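A minimal sketch of that joint goodness-of-fit check with statsmodels, using a hypothetical toy frame with a target y and three confounders z0..z2:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

# Hypothetical toy data: target y depends additively on z0, z1, z2.
n = 500
df = pd.DataFrame(rng.normal(size=(n, 3)), columns=["z0", "z1", "z2"])
df["y"] = df[["z0", "z1", "z2"]].sum(axis=1) + rng.normal(size=n)

# Joint goodness of fit: R^2 of an OLS regression of y on all
# confounders at once, via the formula 'y ~ z0 + z1 + z2'.
fit = smf.ols("y ~ z0 + z1 + z2", data=df).fit()
r2 = fit.rsquared
```

After the deconfounding subsampling, one would hope this joint R² drops toward zero on the retained subset, rather than only the individual correlations.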

— Rachine, Jul 10 '20 08:07