
Raise warning when categorical variable left unprocessed

Open: sborms opened this issue 3 years ago • 2 comments

Add a clear warning if categorical variable is returned untouched after preprocessing (no significant categories to keep).
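A minimal sketch of the requested check, in Python since cobra is a Python library. The function `select_categories_to_keep` and its arguments are hypothetical and not part of cobra's API. It assumes the significance test has already produced a set of significant categories, and that an empty intersection means the column is returned with its original categories.

```python
import warnings
from typing import Iterable, Set


def select_categories_to_keep(
    column_name: str,
    values: Iterable[str],
    significant_categories: Set[str],
) -> Set[str]:
    """Hypothetical sketch: decide which categories survive regrouping.

    If none of the observed categories is significant, the column is left
    unprocessed (all original categories are kept) and a UserWarning is
    raised so the user notices.
    """
    unique_categories = set(values)
    keep = unique_categories & significant_categories
    if not keep:
        # Nothing to regroup: warn instead of silently returning the column.
        warnings.warn(
            f"Categorical variable '{column_name}' was left unprocessed: "
            "no significant categories to keep, so all original "
            "categories are returned unchanged.",
            UserWarning,
            stacklevel=2,
        )
        return unique_categories
    return keep
```

Using `stacklevel=2` makes the warning point at the caller's line rather than at this helper, which is usually what a user scanning preprocessing logs wants.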

sborms, Feb 25 '22 09:02

Hey Sam, I found parts of the explanations sent back and forth over e-mail very enlightening and noticed they weren't documented in the code, so I've quickly added them. Created branch: https://github.com/PythonPredictions/cobra/tree/123-raise-warning-when-categorical-variable-left-unprocessed.

As for the warning requested in this issue itself: a quick skim through the categorical processing shows it can't be done in 5 minutes. I'd need to take a closer look at the differences between small_categories, combined_categories and unique_categories. I'll leave this open for now.

sandervh14, Mar 03 '22 10:03

Potentially also interesting to discuss: Benoît suggested increasing the default value (5) of category_size_threshold to avoid having too many categories left for modeling (i.e. categories not merged into "Other"). For most datasets, that seems like a reasonable change request to me.
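To make the effect of the threshold concrete, here is a simplified, hypothetical sketch (not cobra's implementation). It treats category_size_threshold as a plain minimum count per category, assumes a category exactly at the threshold is kept, ignores the statistical significance check, and merges everything below the threshold into "Other". The function name `merge_small_categories` is illustrative only.

```python
from collections import Counter
from typing import List, Sequence


def merge_small_categories(
    values: Sequence[str],
    category_size_threshold: int = 5,
    other_label: str = "Other",
) -> List[str]:
    """Hypothetical sketch: replace every category observed fewer than
    category_size_threshold times with other_label.

    The significance test a real preprocessor would also apply is omitted.
    """
    counts = Counter(values)
    return [
        v if counts[v] >= category_size_threshold else other_label
        for v in values
    ]


values = ["a"] * 10 + ["b"] * 6 + ["c"] * 2
print(sorted(set(merge_small_categories(values, 5))))  # ['Other', 'a', 'b']
print(sorted(set(merge_small_categories(values, 8))))  # ['Other', 'a']
```

With these counts, a threshold of 5 leaves three categories while a threshold of 8 leaves two, which is the kind of reduction a higher default would aim for.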

sandervh14, Mar 03 '22 10:03