mlr3filters
add Gaussian Covariance filter
This PR adds a Gaussian Covariance filter based on the gausscov package: https://cran.r-project.org/web/packages/gausscov/gausscov.pdf.
I am getting an error when running the tests and examples, but I can't figure out why:
Error in .__Param__assert(self = self, private = private, super = super, :
Assertion on 'x' failed: Element 1 is not >= 1.
It seems my filter is not in the mlr_filters list.
The filter works as expected when I instantiate the class directly, run calculate, and check the scores.
I am not sure if classif models are supported, so I added regr only. I can try to contact the author.
Missing values are not allowed.
I have just checked the examples in the gausscov package, and one of them uses a binary covariate, so it works for classification too. However, the target variable has to be a matrix, not a factor. I can add a classif example after you review the initial PR.
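A minimal sketch of that conversion in base R, assuming a two-level factor target. The variable names and values are illustrative and not taken from the PR.

```r
# Hypothetical binary target stored as a factor.
truth = factor(c("neg", "pos", "pos", "neg", "pos"))

# as.integer() on a factor returns the level codes (1, 2, ...);
# as.matrix() turns that vector into a one-column numeric matrix.
y = as.matrix(as.integer(truth))

dim(y)
is.numeric(y)
```

The level codes follow the order of levels(truth), so the numeric values carry no meaning beyond distinguishing the classes.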
Sorry for not responding here (I did not see it).
Are you still interested in contributing this filter?
Yes. I will send the latest version of the pipe. I think I have changed something since the PR.
Is there anything I should add to the current commit?
When I run the test from the pull request, I get a lot of NA values. Can you explain why this happens?
I can't create a new PR for some reason, but can you try this code:
FilterGausscovF1st = R6::R6Class(
  "FilterGausscovF1st",
  inherit = mlr3filters::Filter,
  public = list(
    #' @description Create a GaussCov object.
    initialize = function() {
      # ps(), p_dbl(), p_int() and p_lgl() come from the paradox package
      param_set = paradox::ps(
        p0   = paradox::p_dbl(lower = 0, upper = 1, default = 0.01),
        kmn  = paradox::p_int(lower = 0, default = 0),
        kmx  = paradox::p_int(lower = 0, default = 0),
        mx   = paradox::p_int(lower = 1, default = 21),
        kex  = paradox::p_int(lower = 0, default = 0),
        sub  = paradox::p_lgl(default = TRUE),
        inr  = paradox::p_lgl(default = TRUE),
        xinr = paradox::p_lgl(default = FALSE),
        qq   = paradox::p_int(lower = 0, default = 0)
      )
      super$initialize(
        id = "gausscov_f1st",
        task_types = c("classif", "regr"),
        param_set = param_set,
        feature_types = c("integer", "numeric"),
        packages = "gausscov",
        label = "Gauss Covariance f1st",
        man = "mlr3filters::mlr_filters_gausscov_f1st"
      )
    }
  ),
  private = list(
    .calculate = function(task, nfeat) {
      # debug
      # pv = list(
      #   p0 = 0.01,
      #   kmn = 0,
      #   kmx = 0,
      #   mx = 21,
      #   kex = 0,
      #   sub = TRUE,
      #   inr = TRUE,
      #   xinr = FALSE,
      #   qq = 0
      # )

      # named score vector, initialised to -1 for every feature
      scores = rep(-1, length(task$feature_names))
      scores = mlr3misc::set_names(scores, task$feature_names)

      # calculate gausscov pvalues
      pv = self$param_set$values
      x = as.matrix(task$data(cols = task$feature_names))
      # the target has to be a matrix, so a factor target is converted to integer codes
      if (task$task_type == "classif") {
        y = as.matrix(as.integer(task$truth()))
      } else {
        y = as.matrix(task$truth())
      }
      res = mlr3misc::invoke(gausscov::f1st, y = y, x = x, .args = pv)

      # column 1 holds the feature index; rows with index 0 are dropped
      res_1 = res[[1]]
      res_1 = res_1[res_1[, 1] != 0, , drop = FALSE]
      scores[res_1[, 1]] = abs(res_1[, 4])

      # save scores
      dir_name = "./gausscov_f1"
      if (!dir.exists(dir_name)) {
        dir.create(dir_name)
      }
      random_id = paste0(sample(0:9, 15, replace = TRUE), collapse = "")
      file_name = paste0("gausscov_f1-", task$id, "-", random_id, ".rds")
      file_name = file.path(dir_name, file_name)
      saveRDS(scores, file_name)

      sort(scores, decreasing = TRUE)
    }
  )
)
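To make the score-filling step in the code above easier to follow, here is a small base-R sketch of the same indexing logic. The feature names and the result matrix are made up for illustration. Treating index 0 as the intercept row is an assumption, and stats::setNames stands in for mlr3misc::set_names so the snippet has no package dependencies.

```r
# Hypothetical feature names and a made-up result matrix in the layout the
# code above reads: column 1 = feature index (0 assumed to be the intercept),
# column 4 = the value that is turned into a score.
feature_names = c("a", "b", "c", "d")
res_1 = rbind(
  c(0,  1.50, 0.001, 0.002),
  c(3, -0.80, 0.010, 0.020),
  c(1,  0.25, 0.030, 0.040)
)

# Every feature starts at -1, so features that are not selected keep -1
# instead of ending up as NA.
scores = stats::setNames(rep(-1, length(feature_names)), feature_names)

# Drop the index-0 row, then write the scores by position.
res_1 = res_1[res_1[, 1] != 0, , drop = FALSE]
scores[res_1[, 1]] = abs(res_1[, 4])

sort(scores, decreasing = TRUE)
```

With this initialisation, only features that appear in the result matrix receive a non-negative score, and the remaining ones sort to the bottom.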
You can't make a new PR from your main branch because you already have a PR open. You could e.g. make a new branch in your fork and then create a new PR.