Rik Huijzer
There seems to be something wrong with the regression case, since performance decreases as `max_rules` increases. It should be the other way around. It could be that...
> I think some content of "Advanced example" could go to "Implementation overview". Also "Benchmarks" could perhaps go on its own section so that "advanced example" could focus on how...
```
pkg> activate --temp

pkg> add MLJModels
   Resolving package versions...
    Updating `/private/var/folders/nf/xynnllys6nl13xhj67sqgt7c0000gn/T/jl_AG8cXc/Project.toml`
  [d491faf4] + MLJModels v0.16.12
  [...]

pkg> add SIRUS
   Resolving package versions...
    Updating `/private/var/folders/nf/xynnllys6nl13xhj67sqgt7c0000gn/T/jl_AG8cXc/Project.toml`
  [cdeec39e] + SIRUS v1.3.3
  [...]
...
```
Classes are determined for each fold separately. This could go wrong if some class is not contained in one of the folds. To fix this, add a check to test...
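One way such a check could look is sketched below in Python, as a language-agnostic illustration rather than SIRUS.jl code. The function name `missing_classes_per_fold` and the representation of folds as lists of row indices are assumptions made for this example.

```python
def missing_classes_per_fold(labels, folds):
    """Return {fold_index: sorted list of classes absent from that fold}.

    `labels` is the full label vector and `folds` is a list of index lists.
    Both are hypothetical names for this sketch, not part of SIRUS.jl.
    """
    # Determine the classes once, on the full data, not per fold.
    all_classes = set(labels)
    missing = {}
    for i, idxs in enumerate(folds):
        present = {labels[j] for j in idxs}
        absent = all_classes - present
        if absent:
            missing[i] = sorted(absent)
    return missing


labels = ["a", "a", "b", "b", "c", "a"]
folds = [[0, 2, 4], [1, 3, 5]]
report = missing_classes_per_fold(labels, folds)
```

Here the second fold contains no `"c"`, so the report flags it. A caller could raise an error on a non-empty report, or pass the globally determined classes to every fold.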
Passing numbers only is a bit of a hassle. It would be nice to be able to pass `String`s.
A table would be easier to read than the current output.
For regression, default to `n/3` and also allow overriding the setting. See https://datascience.stackexchange.com/a/23677 for details.
Non-greedy binary splitting shouldn’t be too expensive. The number of points to check is `q * sqrt(p)` for greedy splitting versus `(q * sqrt(p))^2` for non-greedy splitting. Maybe only enable it for reasonably low `p` and `q`.
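To get a feel for the two counts, here is a small Python calculation. It assumes that `p` is the number of features and `q` the number of candidate cutpoints per feature, and that the smaller count belongs to greedy splitting; none of this is spelled out in the issue itself.

```python
import math


def split_candidates(p, q):
    """Return (greedy, non_greedy) counts of points to check.

    Assumes p = number of features and q = cutpoints per feature;
    these meanings are an interpretation, not stated in the issue.
    """
    per_split = q * math.sqrt(p)
    return per_split, per_split ** 2


greedy, non_greedy = split_candidates(p=16, q=10)
```

With `p = 16` and `q = 10` this gives 40 points versus 1600, which suggests the quadratic cost stays manageable only while both values are small.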
Say there is a rule `if A < 10, then a else b` where `a ≅ b`. Such a rule can be removed, since it predicts nearly the same value on both sides of the split.
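A minimal Python sketch of such a filter follows. The `Rule` type, the function name `drop_uninformative`, and the tolerance values are all hypothetical; how close `a` and `b` must be to count as `a ≅ b` is a choice the issue leaves open.

```python
import math
from dataclasses import dataclass


@dataclass
class Rule:
    # Hypothetical rule layout: if feature < threshold, then then_value else else_value.
    feature: str
    threshold: float
    then_value: float
    else_value: float


def drop_uninformative(rules, rel_tol=1e-2, abs_tol=0.0):
    """Keep only rules whose two outcomes differ by more than the tolerance."""
    return [
        r for r in rules
        if not math.isclose(r.then_value, r.else_value,
                            rel_tol=rel_tol, abs_tol=abs_tol)
    ]


rules = [
    Rule("A", 10.0, 0.50, 0.501),  # outcomes nearly equal, so removable
    Rule("B", 3.0, 0.2, 0.8),      # outcomes clearly differ, so kept
]
kept = drop_uninformative(rules)
```

Only the rule on `B` survives here. When outcomes can be close to zero, a non-zero `abs_tol` is probably needed, because a purely relative tolerance never treats a small non-zero value as close to zero.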