rpart

Use Case Weights To Threshold Splits

Open mhermher opened this issue 4 years ago • 0 comments

If the weights passed into the model are case weights, shouldn't they be used to decide whether a split happens?

In partition.c, me->num_obs is compared to rp.min_split instead of me->sum_wt.

Similarly, in anova.c (I haven't looked at the others), right_n and left_n are compared to edge (rp.min_node) instead of right_wt and left_wt.
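To make the difference concrete, here is a minimal C sketch of the two checks. It is not rpart's actual source: the structs `node` and `params` and the four helper functions are hypothetical stand-ins that only borrow the field names mentioned above, and the direction of each comparison (a node qualifies when the value is at least the threshold) is an assumption.

```c
/* Hypothetical stand-ins for the fields named in this issue.
 * These are NOT rpart's real struct definitions. */
struct node {
    int    num_obs;  /* number of rows that reach this node */
    double sum_wt;   /* sum of case weights of those rows   */
};

struct params {
    int min_split;   /* minimum size of a node to attempt a split */
    int min_node;    /* minimum size of each child ("edge")       */
};

/* Behaviour described in the issue: threshold on the row count. */
static int can_split_by_count(const struct node *me, const struct params *rp)
{
    return me->num_obs >= rp->min_split;
}

/* Proposed behaviour: threshold on the summed case weights. */
static int can_split_by_weight(const struct node *me, const struct params *rp)
{
    return me->sum_wt >= rp->min_split;
}

/* Child-size check described for anova.c: row counts against min_node. */
static int children_ok_by_count(int left_n, int right_n, const struct params *rp)
{
    return left_n >= rp->min_node && right_n >= rp->min_node;
}

/* Proposed child-size check: summed weights against min_node. */
static int children_ok_by_weight(double left_wt, double right_wt,
                                 const struct params *rp)
{
    return left_wt >= rp->min_node && right_wt >= rp->min_node;
}
```

For example, a node holding 3 aggregated rows whose weights sum to 100 fails the count-based check when min_split is 20, but passes the weight-based one, which is the behaviour I would expect from case weights.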

Using case weights to represent the number of cases is really helpful for managing runtime and memory use, but the split logic in the C code does not take them into account.

Writing a custom split function would solve the latter case, but not the former.

mhermher • Aug 12 '20 18:08