SIRUS.jl
Regression likely contains a bug
There seems to be something wrong with the regression case: performance decreases as `max_rules` increases, whereas it should be the other way around. This could be related to the weight calculation. The weights have little effect on classification performance, so a bug there would mainly show up in the regression case.
Maybe the weights should take the rule frequencies into account? Currently, the frequencies enter the weight calculation only indirectly, via a rule space in which each rule is a binary feature indicating whether a data point satisfies the rule. However, if I understand correctly, this is a shortcut compared to using the rules and rule frequencies as established during the tree-fitting procedure.
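To make the rule-space formulation concrete, here is a minimal sketch in Python. It is not SIRUS.jl code and not necessarily how SIRUS.jl computes its weights. It assumes, hypothetically, that each rule is a plain predicate on a feature vector, that an always-true rule plays the role of an intercept, and that the weights are fit by ridge-regularized least squares on the binary rule matrix. The names `rule_space`, `fit_rule_weights`, and `predict` are made up for illustration.

```python
# Hypothetical sketch of a binary "rule space" and a least-squares weight fit.
# Rules are plain predicates here; this is NOT the SIRUS.jl data structure.
from typing import Callable, List, Sequence

Rule = Callable[[Sequence[float]], bool]


def rule_space(X: List[Sequence[float]], rules: List[Rule]) -> List[List[float]]:
    """Binary matrix Z with Z[i][j] = 1.0 if data point i satisfies rule j, else 0.0."""
    return [[1.0 if rule(x) else 0.0 for rule in rules] for x in X]


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system A w = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][c] * w[c] for c in range(r + 1, n))
        w[r] = (M[r][n] - s) / M[r][r]
    return w


def fit_rule_weights(Z: List[List[float]], y: List[float], lam: float = 1e-3) -> List[float]:
    """Ridge least squares: minimise ||y - Z w||^2 + lam * ||w||^2."""
    k = len(Z[0])
    # Normal equations: (Z^T Z + lam * I) w = Z^T y
    A = [[sum(z[a] * z[b] for z in Z) + (lam if a == b else 0.0) for b in range(k)]
         for a in range(k)]
    rhs = [sum(z[a] * yi for z, yi in zip(Z, y)) for a in range(k)]
    return solve(A, rhs)


def predict(Z: List[List[float]], w: List[float]) -> List[float]:
    """Prediction is the weighted sum of the rules each data point satisfies."""
    return [sum(zj * wj for zj, wj in zip(z, w)) for z in Z]


if __name__ == "__main__":
    X = [[0.0], [1.0], [2.0], [3.0]]
    rules = [lambda x: True, lambda x: x[0] > 1.5]  # intercept rule + one split rule
    Z = rule_space(X, rules)
    w = fit_rule_weights(Z, [1.0, 1.0, 3.0, 3.0])
    print(Z, w, predict(Z, w))
```

Note that in this formulation a rule's frequency in the forest plays no role once the binary matrix is built: the weights depend only on which data points satisfy which rules. A frequency-aware alternative, as suggested above, would change how `fit_rule_weights` is computed.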