Imputation of a data frame with ordered factors fails

Open sibipx opened this issue 2 years ago • 0 comments

Imputing a data frame with ordered factors gives an error. See the example below, which uses the diamonds dataset from ggplot2.

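I am not sure, but the problem seems to be in the class check. class() on an ordered factor returns both "ordered" and "factor", so the comparison against "factor" below apparently does not match, and ordered factors are assigned regression models (they are not seen as factors):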

  newClasses <- sapply(dat[, vara, with = FALSE], class)
  modelTypes <- ifelse(newClasses[varn] == "factor", "Classification", 
    "Regression")

It would be more sensible to treat ordered factors the same way as unordered factors (multinomial classification). Thanks!
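As a rough sketch of that suggestion, the type check could rely on is.factor(), which returns TRUE for ordered and unordered factors alike (the example below shows this for cut). The toy data frame and variable names here are stand-ins for illustration only, not miceRanger's actual internals:

```r
# Toy data with one numeric, one unordered factor and one ordered factor.
# This is illustrative data, not the package's internal objects.
dat <- data.frame(
  num = c(1.5, 2.5, 3.5),
  fac = factor(c("a", "b", "a")),
  ord = factor(c("lo", "hi", "lo"), levels = c("lo", "hi"), ordered = TRUE)
)

# is.factor() is TRUE for both "factor" and c("ordered", "factor") columns,
# so it avoids comparing a length-2 class vector against "factor".
isFac <- vapply(dat, is.factor, logical(1))

# Assumed mapping, mirroring the snippet above: factors get classification,
# everything else gets regression.
modelTypes <- ifelse(isFac, "Classification", "Regression")
print(modelTypes)
```

Using vapply() with a logical(1) template also guarantees one value per column, whereas sapply(..., class) returns a list as soon as any column has more than one class.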

Example:

> library(miceRanger)
> library(ggplot2)
> 
> data(diamonds)
> 
> diamonds_miss <- amputeData(diamonds, perc = 0.3)
> 
> str(diamonds_miss)
Classes ‘data.table’ and 'data.frame':	53940 obs. of  10 variables:
 $ carat  : num  0.23 NA 0.23 0.29 NA 0.24 0.24 NA NA 0.23 ...
 $ cut    : Ord.factor w/ 5 levels "Fair"<"Good"<..: 5 4 2 NA 2 NA NA 3 1 3 ...
 $ color  : Ord.factor w/ 7 levels "D"<"E"<"F"<"G"<..: 2 2 2 6 7 7 6 NA NA 5 ...
 $ clarity: Ord.factor w/ 8 levels "I1"<"SI2"<"SI1"<..: 2 NA 5 NA 2 6 7 3 4 5 ...
 $ depth  : num  61.5 59.8 NA NA 63.3 62.8 NA 61.9 NA 59.4 ...
 $ table  : num  55 61 NA 58 NA 57 57 55 61 61 ...
 $ price  : int  326 326 327 334 335 336 336 337 337 NA ...
 $ x      : num  3.95 NA NA 4.2 4.34 NA NA 4.07 3.87 NA ...
 $ y      : num  3.98 3.84 4.07 4.23 NA 3.96 NA 4.11 NA 4.05 ...
 $ z      : num  NA 2.31 NA 2.63 2.75 2.48 2.47 2.53 2.49 NA ...
 - attr(*, ".internal.selfref")=<externalptr> 
> 
> is.factor(diamonds_miss$cut)
[1] TRUE
> class(diamonds_miss$cut)
[1] "ordered" "factor" 
> miceRanger::miceRanger(diamonds_miss, m = 2, maxiter = 2,
+                        returnModels = TRUE,
+                        verbose = TRUE)

Process started at 2022-05-19 17:39:38 
data.table 1.14.0 using 6 threads (see ?getDTthreads).  Latest news: r-datatable.com

dataset 1 
iteration 1 	 | carat | cut
dataset 2 
iteration 1 	 | carat | cutError in miceRanger::miceRanger(diamonds_miss, m = 2, maxiter = 2, returnModels = TRUE,  : 
  Evaluation failed with error <Error in get.knnx(data, query, k, algorithm): Data non-numeric
>. This is probably our fault - please open an issue at https://github.com/FarrellDay/miceRanger/issues with a reproduceable example.
> miceRanger::miceRanger(data.table(diamonds_miss), m = 2, maxiter = 2,
+                        returnModels = TRUE,
+                        verbose = TRUE)

Process started at 2022-05-19 17:41:29 

dataset 1 
iteration 1 	 | carat | cut
dataset 2 
iteration 1 	 | carat | cutError in miceRanger::miceRanger(data.table(diamonds_miss), m = 2, maxiter = 2,  : 
  Evaluation failed with error <Error in get.knnx(data, query, k, algorithm): Data non-numeric
>. This is probably our fault - please open an issue at https://github.com/FarrellDay/miceRanger/issues with a reproduceable example.
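A possible workaround until the class check is changed might be to drop the ordering from ordered factors before imputing. This is an assumption that has not been verified against miceRanger; the small data frame below is hypothetical and only shows the conversion step itself:

```r
# Hypothetical data with an ordered factor column containing a missing value.
df <- data.frame(
  x = c(1, 2, NA),
  cut = factor(c("Fair", "Good", NA), levels = c("Fair", "Good"), ordered = TRUE)
)

# Find the ordered factor columns.
ordCols <- names(df)[vapply(df, is.ordered, logical(1))]

# Rebuild them as unordered factors; levels and NAs are preserved.
df[ordCols] <- lapply(df[ordCols], factor, ordered = FALSE)

# Every column now has a single class, e.g. "numeric" and "factor".
print(sapply(df, class))
```

The ordering could be restored after imputation with factor(x, levels = levels(x), ordered = TRUE), at the cost of the model ignoring the order during imputation.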

sibipx, May 19 '22 15:05