OneR
multivariate
1) I suggest extending OneR to the multivariate situation where the dependent variable is a vector for each case. For example, using the built-in anscombe data frame and manova in base R, we can find which of the variables x1, x2, x3 and x4 best predicts all of the target variables y1, y2, y3 and y4 together (as opposed to performing 4 different runs and possibly getting a different best variable for each). In the example below we find that x1 is the best variable to use if we can use only one variable for predicting all 4 y variables. Had we run 4 different lm's, different variables might have come out best for different target variables, but manova tells us which are best overall. (x2 and x3 are absent from the output because in anscombe they are identical to x1, so they are aliased and dropped.)
```r
fo <- cbind(y1, y2, y3, y4) ~ x1 + x2 + x3 + x4
summary(manova(fo, anscombe))
##           Df  Pillai approx F num Df den Df   Pr(>F)
## x1         1 0.93473  17.9026      4      5 0.003631 **
## x4         1 0.76783   4.1341      4      5 0.075826 .
## Residuals  8
## ---
## Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
```
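The idea of picking one predictor for all targets at once can be sketched in base R by fitting a one-predictor MANOVA per candidate and comparing the Pillai traces. This is a hedged illustration, not how OneR works: using the Pillai trace as the ranking criterion and the helper name pillai_for are assumptions. Unlike the sequential table above, each predictor is tested on its own here, so the numbers will not match that output.

```r
# Sketch: score each candidate predictor by how strongly it ALONE is
# associated with all four targets jointly, using the Pillai trace from a
# one-predictor MANOVA (larger = stronger joint association).
data(anscombe)  # built-in data set from the datasets package

candidates <- c("x1", "x2", "x3", "x4")
targets <- c("y1", "y2", "y3", "y4")

pillai_for <- function(x) {
  # Builds e.g. cbind(y1, y2, y3, y4) ~ x1
  fo <- as.formula(paste0("cbind(", paste(targets, collapse = ", "), ") ~ ", x))
  fit <- manova(fo, data = anscombe)
  # Row 1 of $stats is the single predictor term (the intercept row is omitted)
  summary(fit, test = "Pillai")$stats[1, "Pillai"]
}

pillai <- sapply(candidates, pillai_for)
print(sort(pillai, decreasing = TRUE))
```

Because x1, x2 and x3 are identical in anscombe, they receive identical scores.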
If OneR supported this, one could run the following, where mOneR is a (hypothetical) multivariate OneR:

```r
mOneR(fo, anscombe)
```
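A minimal base-R sketch of what such a multivariate OneR might do, under loudly stated assumptions: mOneR_sketch is a hypothetical name, numeric columns are discretised into equal-width bins with cut() (nbins = 3 is an arbitrary choice), and a predictor is scored by its one-rule accuracy summed over all targets. OneR's own binning and tie handling may differ, so the winner here is not claimed to match the manova result above.

```r
# Hypothetical sketch of a multivariate OneR ("mOneR"); NOT part of the
# OneR package. Assumptions: numeric columns are discretised into `nbins`
# equal-width bins with cut(); for each candidate predictor, one rule per
# target maps every predictor bin to that target's most frequent bin; the
# predictor whose rules have the highest accuracy summed over all targets
# is chosen (ties go to the first candidate).
mOneR_sketch <- function(targets, predictors, data, nbins = 3) {
  discretise <- function(v) {
    if (is.numeric(v)) cut(v, breaks = nbins) else factor(v)
  }
  rule_accuracy <- function(x, y) {
    tab <- table(x, y)                    # predictor bins x target bins
    sum(apply(tab, 1, max)) / length(y)   # share predicted by the modal bin
  }
  scores <- sapply(predictors, function(p) {
    x <- discretise(data[[p]])
    sum(sapply(targets, function(tg) rule_accuracy(x, discretise(data[[tg]]))))
  })
  list(best = names(which.max(scores)), scores = scores)
}

res <- mOneR_sketch(c("y1", "y2", "y3", "y4"), c("x1", "x2", "x3", "x4"), anscombe)
print(res)
```

Summing accuracies weights every target equally; a real implementation might prefer another aggregate.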
2) Perhaps one could also specify whether a single variable must be chosen for all target variables, as above, or whether OneR is run separately for each target variable. The latter case would correspond to using lm instead of manova:

```r
summary(lm(fo, anscombe))
```
This is the same as running 4 separate lm fits, but expressed more compactly in one line.
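The equivalence can be checked directly: a sketch comparing the coefficient matrix of the multi-response fit with four single-response fits (the names predictors, targets, joint and separate are just for illustration).

```r
# Check that one multi-response lm() gives the same coefficients as four
# separate single-response fits (NA marks the aliased x2 and x3 terms).
predictors <- c("x1", "x2", "x3", "x4")
targets <- c("y1", "y2", "y3", "y4")

fo <- cbind(y1, y2, y3, y4) ~ x1 + x2 + x3 + x4
joint <- coef(lm(fo, data = anscombe))  # 5 x 4 matrix, one column per target

# Same layout, built one target (column) at a time
separate <- sapply(targets, function(y) {
  coef(lm(reformulate(predictors, y), data = anscombe))
})

print(all.equal(joint, separate))
```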
In this case one would run

```r
OneR(fo, anscombe)
```

which would simply be a more compact way of running OneR against each target variable separately:

```r
library(OneR)
# One OneR model per target, each using all four x variables as candidates
f <- function(y) OneR(reformulate(c("x1", "x2", "x3", "x4"), y), anscombe)
Map(f, c("y1", "y2", "y3", "y4"))
```
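To support OneR(fo, anscombe) with a cbind() left-hand side, a formula method would first need to split the multi-response formula into one formula per target. A sketch of that step in base R; split_targets is a hypothetical helper, not part of the OneR API.

```r
# Hypothetical helper (not part of OneR): split a cbind() multi-response
# formula into one single-response formula per target.
split_targets <- function(fo) {
  lhs <- fo[[2]]  # left-hand side, e.g. the call cbind(y1, y2, y3, y4)
  stopifnot(is.call(lhs), identical(lhs[[1]], as.name("cbind")))
  responses <- vapply(as.list(lhs)[-1], deparse, character(1))
  rhs <- attr(terms(fo), "term.labels")  # c("x1", "x2", "x3", "x4")
  setNames(lapply(responses, function(y) reformulate(rhs, y)), responses)
}

fo <- cbind(y1, y2, y3, y4) ~ x1 + x2 + x3 + x4
forms <- split_targets(fo)
print(deparse(forms$y2))  # [1] "y2 ~ x1 + x2 + x3 + x4"
```

With OneR attached, lapply(forms, OneR, data = anscombe) should then give the same per-target runs as f above, assuming the same OneR formula method that f relies on.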
Thank you for your suggestions, I will have a look into it!