
model_diagnostics fails due to no rows to aggregate

Open MJimitater opened this issue 2 years ago • 9 comments

Hi, I'm trying to get modelStudio() (v3.1.0) working in RStudio. It did work once, but after some changes to the xgboost model it's difficult for me to see why it no longer works. Creating the DALEX::explain() object works fine, and explain$predict_function(trained_model, test[1, , drop = FALSE]) returns a valid score. But running modelStudio(trained_model) lately always gets stuck at the same spot:

[screenshot of the RStudio console showing the error: "no rows to aggregate"]

I tried to find out at what point the aggregation fails, but it's hard for me to really pin down. Does anyone have an idea, or has anyone run into this issue before? Sorry that I cannot provide a reproducible example; please let me know if further code snippets are needed. I'm happy to provide them and grateful for any help!

MJimitater avatar Jul 21 '22 10:07 MJimitater

Hi @MJimitater, without code and/or a reproducible example, e.g. on different data or a different model, I might not be able to help you.
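
For reference, a minimal self-contained setup along these lines would let others reproduce the behaviour. This is only a sketch of the standard xgboost-plus-DALEX workflow; the data set, features, and parameters below are illustrative, not taken from this issue:

```r
library(xgboost)
library(DALEX)
library(modelStudio)

# illustrative data: binary target on the built-in mtcars set
x <- as.matrix(mtcars[, c("mpg", "hp", "wt")])
y <- mtcars$am

model <- xgboost(data = x, label = y,
                 objective = "binary:logistic",
                 nrounds = 10, verbose = 0)

explainer <- DALEX::explain(model,
                            data = x, y = y,
                            label = "xgboost")

# if this reproduces the error, the whole snippet can be shared as-is
modelStudio(explainer, new_observation = x[1, , drop = FALSE])
```

Trimming a failing real-world script down to a skeleton like this usually also reveals which step introduces the problem.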

hbaniecki avatar Jul 21 '22 13:07 hbaniecki

Hi @hbaniecki, this is the code that I used:

library(Matrix)  # for sparse.model.matrix()

# custom predict function: build a sparse model matrix (keeping NAs) and score
predict_xgb <- function(trained_model, obs) {
  previous_na_action <- options("na.action")
  options(na.action = "na.pass")
  sparse_matrix_test <- sparse.model.matrix(PRUEFERGEBNIS ~ ., data = obs)
  options(na.action = previous_na_action$na.action)

  results_test <- predict(trained_model, sparse_matrix_test, type = "response")
  round(results_test, 3)
}

explain <- DALEX::explain(
  trained_model,
  data = train[1:train_size, ],
  y = as.numeric(train[1:train_size, ]$PRUEFERGEBNIS) - 1,
  type = "classification",
  label = "xgboost",
  predict_function = predict_xgb,
  predict_function_target_column = 0
)

new_observation <- test[ind_obs, , drop = FALSE]

xai_dashboard <- modelStudio(explain, new_observation = new_observation)

I hope this helps somewhat. If it doesn't, we can see how to provide the model and data. Thanks!

MJimitater avatar Jul 21 '22 14:07 MJimitater

Hi @MJimitater, what is the class of the object passed to data? Does the following example help you by any chance: https://modelstudio.drwhy.ai/articles/ms-r-python-examples.html#xgboost-dashboard?

Googling the error message, I'm unsure whether this issue is related to DALEX and modelStudio at all.

I might try to debug it if you provide data/code.

hbaniecki avatar Jul 24 '22 17:07 hbaniecki

Hi @hbaniecki, the data object passed to explain() has class c("data.table", "data.frame"), and train[1:train_size,] is its first train_size rows.

Yes, I'm also unsure whether this issue is related to modelStudio. Is there a way to provide data/code confidentially?
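
One cheap check, since the explainer receives a data.table: some downstream packages are written and tested against plain data.frames, and subtle indexing differences can surface as aggregation errors. Passing a converted copy would rule that out. This is a hypothesis only, not something confirmed in this thread:

```r
library(data.table)

# illustrative data.table standing in for the real training data
train_dt <- data.table(x = 1:5, y = c(0, 1, 0, 1, 1))

# convert to a plain data.frame before passing to DALEX::explain()
train_df <- as.data.frame(train_dt)
class(train_df)  # "data.frame" only, data.table behaviour dropped
```

If modelStudio() runs cleanly on the converted copy, the data.table class was the culprit.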

MJimitater avatar Jul 26 '22 09:07 MJimitater

Can you serialize both in R, so I can load them into my environment, and share them with me at [email protected]? The data can be small (a few rows). You can also create a simpler xgboost model where the error still occurs; perhaps a small subset of features suffices.
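
Serializing both objects is a one-liner each with base R's RDS format (file names below are illustrative):

```r
# save the fitted model and a small data sample to .rds files
saveRDS(trained_model, "trained_model.rds")
saveRDS(head(train, 10), "train_sample.rds")

# on the receiving end, restore them with:
trained_model <- readRDS("trained_model.rds")
train_sample  <- readRDS("train_sample.rds")
```
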

Anyhow, I don't have your email to reach out.

hbaniecki avatar Jul 26 '22 09:07 hbaniecki

@hbaniecki Thanks so much for your help! I just sent you an email with the model and data in .rds format. I hope you can reproduce it; keep me posted ;) Cheers

MJimitater avatar Jul 26 '22 13:07 MJimitater

Hi @hbaniecki, sorry for my late reply, and thanks again for your excellent help in debugging! I have thoroughly examined the data for NA values: I removed or imputed them in the numeric features and recoded them to "(Missing)" in the factor features (via fct_explicit_na()), hoping that this solves the problem with the variable_splits. I hope this is a valid approach to generating error-free model and predict profile explanations. However, running modelStudio(explain, new_observation, B = 3) with these updates runs into the following error:

 Calculating ingredients::ceteris_paribus (1)
  Calculating ...
    Elapsed time: 00:10:37 ETA:10s
Warning messages:
1: In value[[3L]](cond) :
Error occurred in ingredients::describe.ceteris_paribus function: missing value where TRUE/FALSE needed
2: In value[[3L]](cond) :
Error occurred in ingredients::describe.ceteris_paribus function: missing value where TRUE/FALSE needed
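
As an aside, the NA recoding described above can be sketched like this (an illustrative sketch with made-up column names; fct_explicit_na() comes from the forcats package):

```r
library(forcats)

df <- data.frame(
  num_feature = c(1.5, NA, 3.2),
  cat_feature = factor(c("a", NA, "b"))
)

# numeric: impute NAs, e.g. with the median
df$num_feature[is.na(df$num_feature)] <- median(df$num_feature, na.rm = TRUE)

# factor: turn NA into an explicit "(Missing)" level
df$cat_feature <- fct_explicit_na(df$cat_feature, na_level = "(Missing)")
```
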

Now I'm trying to find where the bug occurs, but seemingly everything else works so far:

# doesn't work:
ms <- modelStudio(explain, new_observation, B = 3)

# works:
mp <- model_performance(explain)

# works:
mr <- model_diagnostics(explain)

# works:
fi <- model_parts(explain, B = 2)
plot(fi)

# works:
pdp <- model_profile(explain)

# works:
pd <- ingredients::partial_dependence(explain)

# works:
pp <- predict_profile(explain, new_observation)

# both work:
cp <- ingredients::ceteris_paribus(explain, new_observation)
pdp <- ingredients::aggregate_profiles(cp, type = "partial")

Now I'm puzzled that cp <- ingredients::ceteris_paribus(explain, new_observation) works on its own, yet modelStudio still throws this error. What else can I try? What other functions can I unit-test in order to debug this? Do you think this is still a problem with the "(Missing)" values in the categories? Thank you so much!

MJimitater avatar Aug 12 '22 13:08 MJimitater

Hi, I presume your modelStudio is working just fine and you are only worried about the warnings?

As the message suggests, something went wrong in the ingredients::describe(<ceteris paribus object>) function call, which produces a textual description of the ceteris-paribus explanation. This is an experimental feature that displays text when you hover over the D button next to a plot.
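
To reproduce the warning outside of modelStudio, the failing call can be invoked directly on the ceteris-paribus object. This is a sketch: `cp` here is the object you already computed in your previous comment:

```r
cp <- ingredients::ceteris_paribus(explain, new_observation)

# this is the call whose failure the warnings report; if describe() fails
# here too, the problem is isolated to the textual descriptions,
# not to the plots themselves
ingredients::describe(cp)
```
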

For me to debug it further, it would be best to obtain the updated model/data.

hbaniecki avatar Aug 12 '22 14:08 hbaniecki

Thanks for your fast reply, I'll get back to you next week! ;)

MJimitater avatar Aug 12 '22 14:08 MJimitater

Actually, you are right: modelStudio works just fine! Finally, the explanation of single test cases works without errors!

BTW, it would be nice to have a faster way of explaining and visualizing individual test cases than waiting a long time for modelStudio to finish calculating, but I guess that's the way it is ;) Great software!

Thanks again for the excellent support, closing for now

MJimitater avatar Aug 15 '22 14:08 MJimitater