cobra
Raise warning when categorical variable left unprocessed
Add a clear warning if categorical variable is returned untouched after preprocessing (no significant categories to keep).
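A minimal sketch of what such a warning could look like, using only Python's standard `warnings` module. The helper name `warn_if_unprocessed` and its arguments are hypothetical and not part of cobra; it assumes "untouched" can be detected by comparing the set of categories before and after preprocessing, which may not match how the processor tracks this internally.

```python
import warnings


def warn_if_unprocessed(column_name, categories_before, categories_after):
    """Emit a UserWarning when preprocessing left the categories unchanged.

    Hypothetical helper for illustration only; not cobra's API.
    Returns True if a warning was raised, False otherwise.
    """
    # Assumption: a variable counts as "untouched" when the set of
    # categories after preprocessing equals the set before it.
    if set(categories_before) == set(categories_after):
        warnings.warn(
            f"Categorical variable '{column_name}' was returned untouched "
            "after preprocessing (no significant categories to keep).",
            UserWarning,
            stacklevel=2,  # point the warning at the caller, not this helper
        )
        return True
    return False
```

Returning a boolean alongside the warning makes the check easy to assert on in unit tests, and `stacklevel=2` keeps the reported location at the calling code.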
Hey Sam, I found part of the explanations sent back and forth over e-mail very enlightening and saw that they were not in the code, so I've quickly added them. Created branch: https://github.com/PythonPredictions/cobra/tree/123-raise-warning-when-categorical-variable-left-unprocessed.
As for the warning requested in this issue itself: a quick skim of the categorical processing shows it can't be done in 5 minutes. I'd need to take a closer look at the difference between small_categories, combined_categories and unique_categories. I'll leave that open for now.
Potentially also interesting to discuss: Benoît suggested increasing the default value (5) of category_size_threshold to avoid having too many categories left for modeling (that is, not merged into "other"). For most datasets, that seems like a reasonable change request to me.
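To make the effect of the threshold concrete, here is a rough sketch in plain Python. It is a simplification under stated assumptions, not cobra's implementation: the function `merge_small_categories` is hypothetical, it treats the threshold as a raw count, and it ignores any incidence correction or significance testing the real processor may apply.

```python
from collections import Counter


def merge_small_categories(values, size_threshold=5, other_label="Other"):
    """Replace categories seen at most `size_threshold` times by `other_label`.

    Hypothetical illustration only; assumes a plain count-based rule.
    """
    counts = Counter(values)
    # Keep a category only if it occurs more often than the threshold.
    return [v if counts[v] > size_threshold else other_label for v in values]


# Toy data: "a" occurs 10 times, "b" 6 times, "c" 3 times.
values = ["a"] * 10 + ["b"] * 6 + ["c"] * 3

kept_default = set(merge_small_categories(values, size_threshold=5))
kept_higher = set(merge_small_categories(values, size_threshold=8))
```

With the toy data above, a threshold of 5 keeps "a" and "b" as separate categories, while a threshold of 8 keeps only "a" and merges the rest, which is the direction of the proposed change.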