miceRanger
Imputation of a data frame with ordered factors fails
Imputing a data frame that contains ordered factors gives an error. See the example below, which uses the diamonds dataset from ggplot2.
I am not sure, but the problem seems to be in the class check. Ordered factors appear to be assigned regression models because they are not recognized as factors: their class is "ordered" "factor" rather than just "factor".
newClasses <- sapply(dat[, vara, with = FALSE], class)
modelTypes <- ifelse(newClasses[varn] == "factor", "Classification",
"Regression")
It would be more sensible to treat ordered factors as factors (multinomial). Thanks!
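The class check can be reproduced in base R without miceRanger. The sketch below uses a made-up two-column data frame (not the package's internal data) to show that `class()` returns two strings for an ordered factor, whereas `is.factor()` is TRUE for both kinds of factor. The `is.factor()`-based assignment is only a suggestion for how the check could be written, not the package's actual code.

```r
# Toy data: one unordered and one ordered factor (hypothetical example data)
dat <- data.frame(
  plain  = factor(c("a", "b", "a")),
  ranked = factor(c("lo", "hi", "lo"), levels = c("lo", "hi"), ordered = TRUE)
)

# class() returns a length-2 vector for the ordered factor,
# so sapply() cannot simplify to a character vector and returns a list
newClasses <- sapply(dat, class)
str(newClasses)

# is.factor() is TRUE for ordered and unordered factors alike
isFac <- vapply(dat, is.factor, logical(1))
modelTypes <- ifelse(isFac, "Classification", "Regression")
print(modelTypes)
```

With a check like this, both columns would be routed to a classification model.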
Example:
> library(miceRanger)
> library(ggplot2)
>
> data(diamonds)
>
> diamonds_miss <- amputeData(diamonds, perc = 0.3)
>
> str(diamonds_miss)
Classes ‘data.table’ and 'data.frame': 53940 obs. of 10 variables:
$ carat : num 0.23 NA 0.23 0.29 NA 0.24 0.24 NA NA 0.23 ...
$ cut : Ord.factor w/ 5 levels "Fair"<"Good"<..: 5 4 2 NA 2 NA NA 3 1 3 ...
$ color : Ord.factor w/ 7 levels "D"<"E"<"F"<"G"<..: 2 2 2 6 7 7 6 NA NA 5 ...
$ clarity: Ord.factor w/ 8 levels "I1"<"SI2"<"SI1"<..: 2 NA 5 NA 2 6 7 3 4 5 ...
$ depth : num 61.5 59.8 NA NA 63.3 62.8 NA 61.9 NA 59.4 ...
$ table : num 55 61 NA 58 NA 57 57 55 61 61 ...
$ price : int 326 326 327 334 335 336 336 337 337 NA ...
$ x : num 3.95 NA NA 4.2 4.34 NA NA 4.07 3.87 NA ...
$ y : num 3.98 3.84 4.07 4.23 NA 3.96 NA 4.11 NA 4.05 ...
$ z : num NA 2.31 NA 2.63 2.75 2.48 2.47 2.53 2.49 NA ...
- attr(*, ".internal.selfref")=<externalptr>
>
> is.factor(diamonds_miss$cut)
[1] TRUE
> class(diamonds_miss$cut)
[1] "ordered" "factor"
> miceRanger::miceRanger(diamonds_miss, m = 2, maxiter = 2,
+ returnModels = TRUE,
+ verbose = TRUE)
Process started at 2022-05-19 17:39:38
data.table 1.14.0 using 6 threads (see ?getDTthreads). Latest news: r-datatable.com
dataset 1
iteration 1 | carat | cut
dataset 2
iteration 1 | carat | cutError in miceRanger::miceRanger(diamonds_miss, m = 2, maxiter = 2, returnModels = TRUE, :
Evaluation failed with error <Error in get.knnx(data, query, k, algorithm): Data non-numeric
>. This is probably our fault - please open an issue at https://github.com/FarrellDay/miceRanger/issues with a reproduceable example.
> miceRanger::miceRanger(data.table(diamonds_miss), m = 2, maxiter = 2,
+ returnModels = TRUE,
+ verbose = TRUE)
Process started at 2022-05-19 17:41:29
dataset 1
iteration 1 | carat | cut
dataset 2
iteration 1 | carat | cutError in miceRanger::miceRanger(data.table(diamonds_miss), m = 2, maxiter = 2, :
Evaluation failed with error <Error in get.knnx(data, query, k, algorithm): Data non-numeric
>. This is probably our fault - please open an issue at https://github.com/FarrellDay/miceRanger/issues with a reproduceable example.
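Until this is fixed in the package, one possible workaround is to drop the ordering from the factor columns before imputing. The helper below, `unorderFactors`, is a hypothetical name introduced here and is not part of miceRanger. I have not verified that it avoids the error above; it only shows the conversion on a small made-up data frame.

```r
# Hypothetical helper: convert every ordered-factor column to a plain factor,
# keeping the same levels in the same order. Returns a data.frame.
unorderFactors <- function(df) {
  df <- as.data.frame(df)
  ordCols <- names(df)[vapply(df, is.ordered, logical(1))]
  for (nm in ordCols) {
    df[[nm]] <- factor(df[[nm]], levels = levels(df[[nm]]), ordered = FALSE)
  }
  df
}

# Small made-up example with a missing value in each column
toy <- data.frame(
  cut   = factor(c("Good", NA, "Fair"),
                 levels = c("Fair", "Good", "Ideal"), ordered = TRUE),
  carat = c(0.2, NA, 0.3)
)
fixed <- unorderFactors(toy)
class(fixed$cut)
levels(fixed$cut)
```

The converted data could then be passed to `miceRanger()` in place of `diamonds_miss`. If the ordering matters downstream, it can be restored after imputation with `factor(x, levels = ..., ordered = TRUE)`.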