
Error When Character Level In Train But Not Test

Open · mgoplerud opened this issue 4 years ago · 0 comments

Hello! I think there is a bug: if the test set does not contain every level of a character variable that appears in the training set, `kknn()` fails with a "subscript out of bounds" error instead of fitting the model.
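
One plausible mechanism can be shown with base R alone. `model.matrix()` creates one dummy column per non-baseline level, so design matrices built separately from the training and test data end up with different columns. The sketch assumes kknn builds its matrices this way and then selects the training columns from the test matrix. It does not call kknn itself, and `X_train`/`X_test` are illustrative names.

```r
# Training data: a character column with 10 distinct values
train <- data.frame(state = as.character(1:10), x = rnorm(10),
                    stringsAsFactors = FALSE)
# Test data: only 3 of those values
test <- data.frame(state = as.character(1:3), x = rnorm(3),
                   stringsAsFactors = FALSE)

# Build the design matrices independently; characters become factors
# with only the levels present in each data frame
X_train <- model.matrix(~ x + state, data = train)
X_test  <- model.matrix(~ x + state, data = test)

ncol(X_train)  # intercept + x + 9 dummy columns = 11
ncol(X_test)   # intercept + x + 2 dummy columns = 4

# Selecting the training columns from the test matrix fails, because
# columns such as "state10" do not exist in X_test
res <- tryCatch(X_test[, colnames(X_train), drop = FALSE],
                error = function(e) conditionMessage(e))
res  # "subscript out of bounds"
```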

library(kknn)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

# Character level misalignment: test has only 3 of the 10 training levels
dta <- data.frame(
  state = as.character(sample(1:10, 100, replace = TRUE)),
  x = rnorm(100), stringsAsFactors = FALSE)
dta$y <- rnorm(100)

test_dta <- data.frame(state = as.character(1:3), x = rnorm(3),
                       stringsAsFactors = FALSE)

kknn(formula = y ~ x + state, train = dta,
     test = test_dta)
#> Error in valid[, ord, drop = FALSE]: subscript out of bounds


# Also fails: converting each data frame to factor separately gives different levels
dta <- dta %>% mutate(state = factor(state))
kknn(formula = y ~ x + state, 
     train = dta,
     test = test_dta %>% mutate(state = factor(state)))
#> Error in valid[, ord, drop = FALSE]: subscript out of bounds
# Works: the test factor reuses the training levels
kknn(formula = y ~ x + state, 
     train = dta,
     test = test_dta %>% mutate(state = factor(state, levels = levels(dta$state))))
#> 
#> Call:
#> kknn(formula = y ~ x + state, train = dta, test = test_dta %>%     mutate(state = factor(state, levels = levels(dta$state))))
#> 
#> Response: "continuous"

Created on 2021-01-24 by the reprex package (v0.3.0)
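
A minimal workaround sketch, generalizing the "Works" case above: the hypothetical helper `align_levels()` (not part of kknn) converts every character or factor column shared by the two data frames into a factor carrying the training levels. Test values that never occur in training become `NA`, which would still need separate handling.

```r
# Hypothetical helper, not a kknn function: give shared character/factor
# columns in train and test one common set of levels taken from train
align_levels <- function(train, test) {
  for (col in intersect(names(train), names(test))) {
    if (is.character(train[[col]]) || is.factor(train[[col]])) {
      lv <- levels(factor(train[[col]]))  # levels observed in training
      train[[col]] <- factor(train[[col]], levels = lv)
      # Unseen test values become NA
      test[[col]] <- factor(as.character(test[[col]]), levels = lv)
    }
  }
  list(train = train, test = test)
}

set.seed(1)
dta <- data.frame(state = as.character(sample(1:10, 100, replace = TRUE)),
                  x = rnorm(100), stringsAsFactors = FALSE)
dta$y <- rnorm(100)
test_dta <- data.frame(state = as.character(1:3), x = rnorm(3),
                       stringsAsFactors = FALSE)

aligned <- align_levels(dta, test_dta)

# Only attempt the fit if kknn is installed
if (requireNamespace("kknn", quietly = TRUE)) {
  fit <- kknn::kknn(y ~ x + state, train = aligned$train, test = aligned$test)
}
```

Because both data frames then share one level set, `model.matrix()` yields identical columns for each, even though the test set uses only three states.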

mgoplerud, Jan 24 '21 17:01