Bayesian regularization support in DALEX

Open asheetal opened this issue 2 years ago • 3 comments

In some social science fields, large datasets do not exist, and researchers must make decisions using a small number of samples (the p >> n problem). It is good to see support for this in R (the tfprobability and brnn packages). Does the DALEX team have any thoughts or comments on this?

asheetal avatar Aug 09 '21 01:08 asheetal

@asheetal the size of the data should not matter for the implemented XAI techniques (neither local nor global), but let's try. Do you have any trained models for tests?

pbiecek avatar Jan 15 '22 23:01 pbiecek

In a recent experiment with p >> n, I did the following:

create a p x l array (p = number of predictors, l = 1000 runs)
for (i in 1:1000) {
     randomize the seed
     build a keras model
     generate the variable importance ranking with DALEX
     for each predictor, append its rank from DALEX to that predictor's list
}
sort the predictors by how many times each was ranked first, then second, and so on

It did help. The final result was a histogram of ranks for each predictor. Had I run it only once (l = 1), I would have gotten completely inaccurate results.
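
The repeated-seed procedure described here can be sketched roughly as follows. This is a minimal illustration in Python using only the standard library, not the code that was actually run: the Keras fit and the DALEX variable importance call are replaced by a hypothetical stand-in, `noisy_importance`, and the predictor names and `true_signal` values are made-up toy inputs.

```python
import random
from collections import Counter

def noisy_importance(predictors, true_signal, rng, noise=1.0):
    # Hypothetical stand-in for "fit a model, then compute variable importance".
    # A real run would train a model and ask an explainer for importances here.
    return {p: true_signal[p] + rng.gauss(0.0, noise) for p in predictors}

def rank_histograms(predictors, true_signal, n_runs=1000, base_seed=0):
    # One Counter per predictor: rank -> number of runs that produced that rank.
    hist = {p: Counter() for p in predictors}
    for i in range(n_runs):
        rng = random.Random(base_seed + i)  # a different seed for every run
        scores = noisy_importance(predictors, true_signal, rng)
        ordered = sorted(predictors, key=lambda p: scores[p], reverse=True)
        for rank, p in enumerate(ordered, start=1):
            hist[p][rank] += 1
    return hist

def final_order(hist, n_predictors):
    # Sort by the number of first places, then second places, and so on.
    return sorted(
        hist,
        key=lambda p: tuple(-hist[p][r] for r in range(1, n_predictors + 1)),
    )

# Toy inputs (assumptions for illustration only).
predictors = ["x1", "x2", "x3", "x4"]
true_signal = {"x1": 2.0, "x2": 1.0, "x3": 0.5, "x4": 0.0}

hist = rank_histograms(predictors, true_signal, n_runs=1000)
order = final_order(hist, len(predictors))
```

Setting `n_runs=1` reproduces the single-run case, where the ordering depends entirely on one noisy draw.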

asheetal avatar Jan 16 '22 12:01 asheetal

Forgot to add: the problem is not within DALEX but in the model itself. For p >> n, the model must be Bayesian (probabilistic), so DALEX must work in conjunction with tfprobability and similar models. Variable importance is then not a single rank but a probabilistic range of ranks, and the researcher can choose how to infer the final rank: median, max, min, or overlap between ranges.
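
One way to read "a probabilistic range of ranks" is sketched below, in Python with the standard library only. It assumes each predictor already has a list of ranks, one per posterior draw or model fit. The values in `rank_samples` are toy numbers chosen for illustration, and `summarize_ranks` and `ranges_overlap` are hypothetical helper names, not part of DALEX or tfprobability.

```python
from statistics import median

def summarize_ranks(rank_samples):
    # rank_samples: predictor -> list of ranks, one per draw or model fit.
    return {
        p: {"min": min(r), "median": median(r), "max": max(r)}
        for p, r in rank_samples.items()
    }

def ranges_overlap(a, b):
    # True when the [min, max] rank ranges of two predictors intersect.
    return a["min"] <= b["max"] and b["min"] <= a["max"]

# Toy ranks (assumption): five draws, each a full ranking of three predictors.
rank_samples = {
    "x1": [1, 1, 2, 1, 1],
    "x2": [2, 2, 1, 2, 2],
    "x3": [3, 3, 3, 3, 3],
}

summary = summarize_ranks(rank_samples)
x1_vs_x2 = ranges_overlap(summary["x1"], summary["x2"])
x1_vs_x3 = ranges_overlap(summary["x1"], summary["x3"])
```

Overlapping ranges, as for `x1` and `x2` here, suggest the data cannot reliably separate the two predictors. Disjoint ranges, as for `x1` and `x3`, suggest the ordering is stable across draws.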

asheetal avatar Jan 17 '22 05:01 asheetal