Feature Request: Class Weighting capabilities
I'm building a Random Forest classification model for a dataset with imbalanced classes. Other packages, such as scikit-learn, provide a "class weight" option (see the docs) that weights the minority class(es) more heavily when training the individual trees. As far as I can tell, no Julia decision tree implementation offers anything like that. Would it be possible to add?
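To make the request concrete: in a CART-style tree, class weighting usually enters through the split criterion, where each observation counts with its weight instead of as 1. The sketch below assumes the Gini criterion and uses a hypothetical helper `weighted_gini`. It is an illustration in Python, not code taken from DecisionTree.jl or scikit-learn.

```python
from collections import defaultdict


def weighted_gini(labels, weights):
    """Gini impurity in which each observation contributes its weight
    to its class total instead of a count of 1 (hypothetical helper)."""
    totals = defaultdict(float)
    for label, w in zip(labels, weights):
        totals[label] += w
    total = sum(totals.values())
    if total == 0:
        return 0.0
    # 1 - sum over classes of (weighted class proportion)^2
    return 1.0 - sum((t / total) ** 2 for t in totals.values())


# A node with 9 majority ("a") and 1 minority ("b") observation.
labels = ["a"] * 9 + ["b"]

# Unweighted: the node looks fairly pure (impurity roughly 0.18).
print(weighted_gini(labels, [1.0] * len(labels)))

# Class weights turned into per-observation weights: "b" counts 9x.
class_weight = {"a": 1.0, "b": 9.0}
weights = [class_weight[y] for y in labels]
print(weighted_gini(labels, weights))  # 0.5, i.e. maximally impure for 2 classes
```

Up-weighting the minority class makes the splitter treat this node as badly mixed, so it keeps trying to separate the minority observations instead of stopping early.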
Yes, this would be nice to have.
The scikit-learn model does have class_weight, and it is exposed in the MLJ wrapper. Unfortunately, passing a Julia dict does not appear to work. Watch the linked issue for a possible workaround.
I haven't looked at the ScikitLearn.jl wrapper.
DecisionTree.jl is maintained by a few volunteers with limited time. If you'd like this feature added, your best chance is to make a PR, assuming you have the expertise. I'd be happy to review it if no one else has the time.
It would be worth looking at the Python code, because it is based on C code that I think was ported to DecisionTree.jl, but I don't recall any accommodation for weights there. Or maybe the C code only covered individual trees. Sorry, I don't remember right now.
My suggestion would be to support per-observation weights first, and build class weight support on top of that (by using an analogue of the tool you linked to, which is something like this).
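Here is a minimal sketch of that layering. It assumes class weights are only ever converted into per-observation weights before training, so the tree code itself needs to understand only per-observation weights. The "balanced" rule n_samples / (n_classes * count(class)) is the heuristic scikit-learn documents for class_weight="balanced"; the function names below are hypothetical and merely similar in spirit to scikit-learn's compute_sample_weight.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Hypothetical helper: weight each class by
    n_samples / (n_classes * count(class))."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}


def sample_weights_from_class_weights(labels, class_weight):
    """Hypothetical helper: expand a class -> weight mapping into one
    weight per observation, which a weighted tree builder would consume."""
    return [class_weight[label] for label in labels]


labels = ["a"] * 9 + ["b"]
cw = balanced_class_weights(labels)  # "a" -> 10/18, "b" -> 5.0
sw = sample_weights_from_class_weights(labels, cw)
print(cw)
print(sw)
```

With these weights, each class contributes the same total weight (5.0 for each class in this 9-to-1 example). A user-supplied dict of class weights can be passed straight to the second helper, so the only feature the trees themselves must support is per-observation weights.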