Categorical columns

Open AlexanderZender opened this issue 2 years ago • 3 comments

I encountered an issue that was previously documented for LightGBM models. In my case it is not related to LightGBM, as it occurs before the data arrives at the model. The error occurs here in explainer dashboard: image

It is triggered when a string column that contains NaN values is sorted.
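The screenshot with the failing line is not preserved here, so the following is only a minimal reproduction of this class of error. It assumes the code calls Python's built-in `sorted` on the unique values of an object column; the helper name `sorted_categories` and the sample data are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data: a string column with one missing value.
embarked = pd.Series(["S", "C", np.nan, "Q"], dtype=object)


def sorted_categories(series: pd.Series) -> list:
    """Sort the unique values of a column without handling NaN (hypothetical helper)."""
    return sorted(series.unique())


try:
    sorted_categories(embarked)
    error_message = None
except TypeError as exc:
    # NaN is a float, and Python cannot order a float against a str.
    error_message = str(exc)

print(error_message)
```

The `TypeError` comes from Python itself rather than from any model, which matches the observation that the failure happens before the data reaches the model.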

Additionally, some categorical columns get deleted during the explainer process. Here self.X still has all columns: image

But two lines further down, they are gone. I can't really tell what's happening, as nothing should be happening: image

At this point these columns do not seem to be needed. The explainer dashboard process works fine until liftcurve_dfs is computed. It then crashes because the sample df does not contain the columns the model needs:

image

Update: In the last picture I also noticed that the feature "Sex" was changed from male/female to 1/0, while other categorical features that are still present, such as "Embarked", were not changed. This will break the model too, because the model handles the preprocessing itself and expects the raw values.

AlexanderZender • Jul 13 '23 14:07

The fix for the first issue is to ignore NaN values:

image
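The screenshot of the actual change is not preserved. A minimal sketch of one way to ignore NaN values when sorting, assuming the column is a pandas Series, is to drop missing values first. The helper name `sorted_categories` and the sample data are hypothetical and not taken from the explainerdashboard code.

```python
import numpy as np
import pandas as pd


def sorted_categories(series: pd.Series) -> list:
    """Return the sorted unique values of a column, ignoring NaN (hypothetical helper)."""
    # dropna() removes the float NaN entries, so only strings are compared.
    return sorted(series.dropna().unique())


# Hypothetical data: a string column with one missing value and one duplicate.
embarked = pd.Series(["S", "C", np.nan, "Q", "S"], dtype=object)
categories = sorted_categories(embarked)
print(categories)  # ['C', 'Q', 'S']
```

Dropping NaN only affects the list of categories that gets sorted; the underlying column is left unchanged.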

AlexanderZender • Jul 13 '23 14:07

Hi @AlexanderZender, nice catch! I'm on holiday until the end of the month, so I'm away from the keyboard, but if you want to write up a PR I can have a look at it and get it released once I'm back.

oegedijk • Jul 13 '23 20:07

I found the issue(?) behind my changed columns and values. The model applied its preprocessing in place, so the changes were also applied to self.X in the explainer class. For now I solved it by making a copy of the passed X in my Wrapper class: image
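The wrapper from the screenshot is not preserved, so here is a minimal sketch of the idea under stated assumptions. `MutatingModel`, `CopyingWrapper`, the `Name` column and the sample rows are all hypothetical; the only point is that a model which preprocesses in place changes the caller's DataFrame unless it receives a copy.

```python
import pandas as pd


class MutatingModel:
    """Hypothetical model that preprocesses its input in place."""

    def predict_proba(self, X: pd.DataFrame) -> list:
        # In-place preprocessing: these changes leak back to the caller's frame.
        X["Sex"] = X["Sex"].map({"male": 1, "female": 0})
        X.drop(columns=["Name"], inplace=True)
        return [[0.5, 0.5] for _ in range(len(X))]


class CopyingWrapper:
    """Hypothetical wrapper that shields the caller's data from the model."""

    def __init__(self, model):
        self.model = model

    def predict_proba(self, X: pd.DataFrame) -> list:
        # Hand the model a copy so the original frame stays untouched.
        return self.model.predict_proba(X.copy())


X = pd.DataFrame(
    {"Sex": ["male", "female"], "Name": ["A", "B"], "Embarked": ["S", "C"]}
)

# Through the wrapper, the caller's frame keeps all columns and raw values.
CopyingWrapper(MutatingModel()).predict_proba(X)
columns_after_wrapped_call = list(X.columns)

# Called directly, the model drops "Name" and recodes "Sex" in the caller's frame.
MutatingModel().predict_proba(X)
columns_after_direct_call = list(X.columns)
```

After the direct call, a second prediction on the same frame would fail because `Name` is gone, which mirrors the crash described for liftcurve_dfs.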

The question is whether the user should ensure this, or whether explainer dashboard should only pass copies down to the model in the first place. I can see that explainer dashboard does sometimes pass copies, e.g.: image

I would suggest only passing copies, to avoid potential issues with other models or pipelines that the user cannot influence or does not want to wrap.

@oegedijk If you don't mind, I will open a PR in a bit that covers both things.

AlexanderZender • Jul 14 '23 10:07