kknn
Error When Character Level In Train But Not Test
Hello! I think there is a bug: if the test set does not contain every level of a character variable that appears in the training set, `kknn()` fails with `Error in valid[, ord, drop = FALSE]: subscript out of bounds`.
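A plausible mechanism, assuming kknn builds the design matrices for the training and test data separately and then selects the training columns from the test matrix (the `valid[, ord, drop = FALSE]` in the error message suggests this): a factor with fewer levels in the test data produces fewer dummy columns. The sketch below reproduces that mismatch with base R's `model.matrix()` alone. It uses R's default treatment contrasts, which may differ from the contrasts kknn applies internally, and `train`/`test` are illustrative stand-ins for the reprex data, not kknn objects.

```r
# Illustrative data: 10 "state" levels in training, only 3 in test
set.seed(42)
train <- data.frame(state = factor(rep(as.character(1:10), each = 10)),
                    x = rnorm(100))
test <- data.frame(state = factor(as.character(1:3)), x = rnorm(3))

# Build each design matrix on its own (default treatment contrasts)
mm_train <- model.matrix(~ x + state, data = train)
mm_test  <- model.matrix(~ x + state, data = test)
ncol(mm_train)  # 11: intercept, x and 9 state dummies
ncol(mm_test)   # 4: intercept, x and 2 state dummies

# Selecting the training columns from the test matrix fails
res <- tryCatch(mm_test[, colnames(mm_train), drop = FALSE],
                error = function(e) conditionMessage(e))
res  # "subscript out of bounds"

# Giving the test factor the training levels makes the columns line up
test$state <- factor(as.character(test$state), levels = levels(train$state))
mm_aligned <- model.matrix(~ x + state, data = test)
identical(colnames(mm_aligned), colnames(mm_train))  # TRUE
```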
```r
library(kknn)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#>     filter, lag
#> The following objects are masked from 'package:base':
#>
#>     intersect, setdiff, setequal, union

# Misaligned character levels: training data has up to 10 states, test data only 3
dta <- data.frame(
  state = as.character(sample(1:10, 100, replace = TRUE)),
  x = rnorm(100), stringsAsFactors = FALSE)
dta$y <- rnorm(100)
test_dta <- data.frame(state = as.character(1:3), x = rnorm(3),
                       stringsAsFactors = FALSE)
kknn(formula = y ~ x + state, train = dta,
     test = test_dta)
#> Error in valid[, ord, drop = FALSE]: subscript out of bounds

# Also fails: both columns converted to factors, each with its own levels
dta <- dta %>% mutate(state = factor(state))
kknn(formula = y ~ x + state,
     train = dta,
     test = test_dta %>% mutate(state = factor(state)))
#> Error in valid[, ord, drop = FALSE]: subscript out of bounds

# Works: the test factor is given the training levels
kknn(formula = y ~ x + state,
     train = dta,
     test = test_dta %>% mutate(state = factor(state, levels = levels(dta$state))))
#>
#> Call:
#> kknn(formula = y ~ x + state, train = dta, test = test_dta %>% mutate(state = factor(state, levels = levels(dta$state))))
#>
#> Response: "continuous"
```
Created on 2021-01-24 by the reprex package (v0.3.0)
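The working call generalizes to data with several categorical columns. A minimal sketch, assuming every character or factor column shared by the two data frames should use the levels found in the training data; `align_levels()` is a hypothetical helper, not part of kknn or dplyr.

```r
# Hypothetical helper (not part of kknn): give every character or factor
# column shared by `train` and `test` the factor levels seen in `train`.
align_levels <- function(train, test) {
  for (col in intersect(names(train), names(test))) {
    tr <- train[[col]]
    if (!(is.character(tr) || is.factor(tr))) next  # leave numeric columns alone
    # Keep declared levels of an existing factor; otherwise use sorted unique values
    lev <- if (is.factor(tr)) levels(tr) else levels(factor(tr))
    vals <- as.character(test[[col]])
    unseen <- setdiff(unique(vals[!is.na(vals)]), lev)
    if (length(unseen) > 0) {
      warning(sprintf("column '%s': test values not in train (%s) become NA",
                      col, paste(unseen, collapse = ", ")))
    }
    train[[col]] <- factor(as.character(tr), levels = lev)
    test[[col]] <- factor(vals, levels = lev)
  }
  list(train = train, test = test)
}
```

With the reprex data this would be used as `d <- align_levels(dta, test_dta)` followed by `kknn(y ~ x + state, train = d$train, test = d$test)`. Test values that never occur in the training data cannot be encoded with the training levels and become `NA`, so the helper warns about them instead of passing them on unnoticed.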