SIRUS.jl

Regression likely contains a bug

Open rikhuijzer opened this issue 1 year ago • 5 comments

There seems to be something wrong with the regression case: performance decreases as max_rules increases, when it should be the other way around. This could be related to the weight calculation, because the weight calculation has little effect on classification performance, so a bug there could go unnoticed in classification and only show up in regression.

Maybe the weights should take the rule frequencies into account? Currently, the weight calculation only does this indirectly, via a rule space in which each rule is a binary feature indicating whether the datapoint satisfies the rule. However, if I understand correctly, this is a shortcut compared to using the rules and rule frequencies established during the tree fitting procedure.
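To make the "rule space" idea concrete, here is a minimal sketch in plain Python. It is not SIRUS.jl code: the helper names, the toy data, and the use of a ridge penalty (`lam`) without an intercept are all assumptions made for illustration. It only shows that once each rule becomes a 0/1 column, the weights are fitted from those columns and the response alone, so the rule frequencies from tree fitting do not enter the calculation directly.

```python
# Illustrative sketch only; not SIRUS.jl code. All names are hypothetical.

def rule_space(X, rules):
    """Binary matrix: entry [i][j] is 1.0 if row i satisfies rule j, else 0.0."""
    return [[1.0 if rule(x) else 0.0 for rule in rules] for x in X]

def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (M[i][n] - s) / M[i][i]
    return w

def ridge_weights(B, y, lam=0.1):
    """Solve (B'B + lam*I) w = B'y; the ridge penalty is an assumption here."""
    p = len(B[0])
    gram = [[sum(row[i] * row[j] for row in B) + (lam if i == j else 0.0)
             for j in range(p)] for i in range(p)]
    rhs = [sum(row[i] * yi for row, yi in zip(B, y)) for i in range(p)]
    return solve(gram, rhs)

# Toy data: one feature and two threshold rules, invented for this example.
X = [[1.0], [2.0], [3.0], [4.0]]
y = [1.0, 1.0, 3.0, 5.0]
rules = [lambda x: x[0] < 2.5, lambda x: x[0] >= 3.5]

B = rule_space(X, rules)       # one 0/1 column per rule
weights = ridge_weights(B, y)  # one weight per rule
```

Note that nothing in `ridge_weights` knows how often a rule appeared across the trees; a frequency-aware variant would need that information passed in separately.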

rikhuijzer, Jun 28 '23 10:06