replace_with_na_all turns factors to integers

Open vbrazao opened this issue 5 years ago • 3 comments

(moved over here from https://stackoverflow.com/questions/58621567/naniarreplace-with-na-all-changes-factor-variables-to-integers)

I have a dataset where some missing values are coded as -99, and I tried to use the naniar function replace_with_na_all to replace those values with NA. The function does this, but it also seems to convert my factor variables to integers, so the factor level labels are lost.

This happens whether the factor itself already has some missing values or not, which you can see in the example below (in tibble1 the factor has a missing value from the start, in tibble2 it does not).

library(tidyverse)
library(naniar)

# Example factor with missing values
tibble1 <- tribble(
  ~x, ~y,
  "a", 1,
  "-99", 2,  # quoted so that x is a plain character column
  "c", -99
)

tibble1$x <- as.factor(tibble1$x) 


levels(tibble1$x) <- list("A" = "a",
                          "C" = "c")

# Showing original tibble and then after replace_with_na_all is used
tibble1
tibble1 %>% naniar::replace_with_na_all(condition = ~.x == -99) 




# Example factor without missing values
tibble2 <- tribble(
  ~x, ~y,
  "a", 1,
  "b", 2,
  "c", -99
)

tibble2$x <- as.factor(tibble2$x) 


levels(tibble2$x) <- list("A" = "a",
                          "B" = "b",
                          "C" = "c")

# Showing original tibble and then after replace_with_na_all is used
tibble2
tibble2 %>% naniar::replace_with_na_all(condition = ~.x == -99)  

Not sure if this is supposed to happen, but if it is, it was not clear to me from reading the documentation.
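In case it helps anyone who hits the same thing, here is a minimal base R sketch of a workaround. It is not how naniar does it internally; `replace_sentinel()` and its `sentinel` argument are hypothetical names made up for this example, and the data frame is illustrative. The idea is to compare each column as text and assign NA in place, which leaves a factor's class and levels untouched.

```r
# Hypothetical helper (not part of naniar): replace a sentinel value with NA
# in every column while keeping each column's class, including factors.
replace_sentinel <- function(df, sentinel = -99) {
  df[] <- lapply(df, function(col) {
    # Compare as text so factor labels and numbers are handled the same way
    hit <- !is.na(col) & as.character(col) == as.character(sentinel)
    col[hit] <- NA  # assigning NA into a factor keeps its levels
    col
  })
  df
}

# Illustrative data: one factor column, one numeric column with a -99 code
example <- data.frame(
  x = factor(c("A", "B", "C")),
  y = c(1, 2, -99)
)

cleaned <- replace_sentinel(example)
str(cleaned)
```

Because the comparison is done on text, this would also catch a factor level that is literally labelled "-99", which may or may not be what you want.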

vbrazao, Nov 28 '19 16:11

Thanks for posting this issue!

Sorry this has caused some hassles for you. I don't think this is desirable behaviour; I'll flag this and try to fix it in the next release of naniar.

Thank you very much for taking the time to post a reprex!

njtierney, Dec 02 '19 05:12

FWIW I have had this problem too -- will post some code in a moment which is a partial fix.

antondutoit, Sep 24 '20 01:09

You've probably long since solved this problem, but if not here is some code.

It writes NA where there is an empty cell; you would just change "" to "-99".

NB it ignores POSIX columns because they cause an error. (I guess to fix that you would have to coerce the POSIX cols into a different class, replace the empty cells then coerce back to POSIX?)

library(tidyverse)
library(lubridate)

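# Example data: a date-time column plus columns that use "" for missing values
df <- data.frame(
  dates = as.POSIXlt.Date(c(12345, 23456, 34567)),
  alpha = 2:4,
  beta  = c("", "", 6),
  gamma = c(7:8, ""),
  delta = c("", FALSE, TRUE),
  chars = c("a", "bb", "")
)
# Replace "" with NA in every column that is not a date-time (POSIXt) column
df <- df %>%
    mutate_if(purrr::negate(is.POSIXt), ~ na_if(., ""))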
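A possible variation on the same idea, sketched with `across()` and assuming dplyr 1.0.0 or later: instead of excluding POSIX columns, only the character columns are touched, so date-time and integer columns are never passed to `na_if()` at all. The column values below are illustrative only, and as far as I know newer dplyr releases are stricter about comparing mismatched types in `na_if()`, which is another reason to limit it to character columns.

```r
library(dplyr)

# Illustrative data: "" marks a missing value in the character columns
df <- data.frame(
  dates = as.POSIXct(
    c("2020-09-24 01:00:00", "2020-09-24 06:00:00", "2020-09-25 00:00:00"),
    tz = "UTC"
  ),
  alpha = 2:4,
  beta  = c("", "", "6"),
  chars = c("a", "bb", ""),
  stringsAsFactors = FALSE
)

# Only character columns are modified; dates and alpha pass through unchanged
cleaned <- df %>%
  mutate(across(where(is.character), ~ na_if(.x, "")))

str(cleaned)
```

To handle the -99 case from the original question you would swap `""` for `"-99"`, and deal with numeric columns separately.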

antondutoit, Sep 24 '20 06:09

I believe this has been resolved by some changes to the internals:

library(tidyverse)
library(naniar)

# Example factor with missing values
tibble1 <- tribble(
  ~x, ~y,
  "a", 1,
  "-99", 2,
  "c", -99
)

tibble1$x <- as.factor(tibble1$x) 

levels(tibble1$x) <- list("A" = "a",
                          "C" = "c",
                          "-99" = "-99")

# Showing original tibble and then after replace_with_na_all is used
tibble1
#> # A tibble: 3 × 2
#>   x         y
#>   <fct> <dbl>
#> 1 A         1
#> 2 -99       2
#> 3 C       -99
tibble1 %>% naniar::replace_with_na_all(condition = ~.x == -99) 
#> # A tibble: 3 × 2
#>   x         y
#>   <fct> <dbl>
#> 1 A         1
#> 2 <NA>      2
#> 3 C        NA

Created on 2023-04-10 with reprex v2.0.2

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.2.3 (2023-03-15)
#>  os       macOS Ventura 13.2
#>  system   aarch64, darwin20
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       Australia/Hobart
#>  date     2023-04-10
#>  pandoc   2.19.2 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package       * version    date (UTC) lib source
#>  assertthat      0.2.1      2019-03-21 [1] CRAN (R 4.2.0)
#>  backports       1.4.1      2021-12-13 [1] CRAN (R 4.2.0)
#>  broom           1.0.3      2023-01-25 [1] CRAN (R 4.2.0)
#>  cellranger      1.1.0      2016-07-27 [1] CRAN (R 4.2.0)
#>  cli             3.6.0      2023-01-09 [1] CRAN (R 4.2.0)
#>  colorspace      2.1-0      2023-01-23 [1] CRAN (R 4.2.0)
#>  crayon          1.5.2      2022-09-29 [1] CRAN (R 4.2.0)
#>  DBI             1.1.3      2022-06-18 [1] CRAN (R 4.2.0)
#>  dbplyr          2.3.0      2023-01-16 [1] CRAN (R 4.2.0)
#>  digest          0.6.31     2022-12-11 [1] CRAN (R 4.2.0)
#>  dplyr         * 1.1.1      2023-03-22 [1] CRAN (R 4.2.0)
#>  ellipsis        0.3.2      2021-04-29 [1] CRAN (R 4.2.0)
#>  evaluate        0.20       2023-01-17 [1] CRAN (R 4.2.0)
#>  fansi           1.0.4      2023-01-22 [1] CRAN (R 4.2.0)
#>  fastmap         1.1.0      2021-01-25 [1] CRAN (R 4.2.0)
#>  forcats       * 1.0.0      2023-01-29 [1] CRAN (R 4.2.0)
#>  fs              1.6.1      2023-02-06 [1] CRAN (R 4.2.0)
#>  gargle          1.3.0      2023-01-30 [1] CRAN (R 4.2.0)
#>  generics        0.1.3      2022-07-05 [1] CRAN (R 4.2.0)
#>  ggplot2       * 3.4.1      2023-02-10 [1] CRAN (R 4.2.0)
#>  glue            1.6.2      2022-02-24 [1] CRAN (R 4.2.0)
#>  googledrive     2.0.0      2021-07-08 [1] CRAN (R 4.2.0)
#>  googlesheets4   1.0.1      2022-08-13 [1] CRAN (R 4.2.0)
#>  gtable          0.3.1      2022-09-01 [1] CRAN (R 4.2.0)
#>  haven           2.5.1      2022-08-22 [1] CRAN (R 4.2.0)
#>  hms             1.1.2      2022-08-19 [1] CRAN (R 4.2.0)
#>  htmltools       0.5.4      2022-12-07 [1] CRAN (R 4.2.0)
#>  httr            1.4.4      2022-08-17 [1] CRAN (R 4.2.0)
#>  jsonlite        1.8.4      2022-12-06 [1] CRAN (R 4.2.0)
#>  knitr           1.42       2023-01-25 [1] CRAN (R 4.2.0)
#>  lifecycle       1.0.3      2022-10-07 [1] CRAN (R 4.2.0)
#>  lubridate       1.9.1      2023-01-24 [1] CRAN (R 4.2.0)
#>  magrittr        2.0.3      2022-03-30 [1] CRAN (R 4.2.0)
#>  modelr          0.1.10     2022-11-11 [1] CRAN (R 4.2.0)
#>  munsell         0.5.0      2018-06-12 [1] CRAN (R 4.2.0)
#>  naniar        * 1.0.0.9000 2023-04-10 [1] local
#>  pillar          1.8.1      2022-08-19 [1] CRAN (R 4.2.0)
#>  pkgconfig       2.0.3      2019-09-22 [1] CRAN (R 4.2.0)
#>  purrr         * 1.0.1      2023-01-10 [1] CRAN (R 4.2.0)
#>  R.cache         0.16.0     2022-07-21 [1] CRAN (R 4.2.0)
#>  R.methodsS3     1.8.2      2022-06-13 [1] CRAN (R 4.2.0)
#>  R.oo            1.25.0     2022-06-12 [1] CRAN (R 4.2.0)
#>  R.utils         2.12.2     2022-11-11 [1] CRAN (R 4.2.0)
#>  R6              2.5.1      2021-08-19 [1] CRAN (R 4.2.0)
#>  readr         * 2.1.3      2022-10-01 [1] CRAN (R 4.2.0)
#>  readxl          1.4.1      2022-08-17 [1] CRAN (R 4.2.0)
#>  reprex          2.0.2      2022-08-17 [1] CRAN (R 4.2.0)
#>  rlang           1.1.0      2023-03-14 [1] CRAN (R 4.2.0)
#>  rmarkdown       2.20       2023-01-19 [1] CRAN (R 4.2.0)
#>  rstudioapi      0.14       2022-08-22 [1] CRAN (R 4.2.0)
#>  rvest           1.0.3      2022-08-19 [1] CRAN (R 4.2.0)
#>  scales          1.2.1      2022-08-20 [1] CRAN (R 4.2.0)
#>  sessioninfo     1.2.2      2021-12-06 [1] CRAN (R 4.2.0)
#>  stringi         1.7.12     2023-01-11 [1] CRAN (R 4.2.0)
#>  stringr       * 1.5.0      2022-12-02 [1] CRAN (R 4.2.0)
#>  styler          1.9.0      2023-01-15 [1] CRAN (R 4.2.0)
#>  tibble        * 3.2.1      2023-03-20 [1] CRAN (R 4.2.0)
#>  tidyr         * 1.3.0      2023-01-24 [1] CRAN (R 4.2.0)
#>  tidyselect      1.2.0      2022-10-10 [1] CRAN (R 4.2.0)
#>  tidyverse     * 1.3.2      2022-07-18 [1] CRAN (R 4.2.0)
#>  timechange      0.2.0      2023-01-11 [1] CRAN (R 4.2.0)
#>  tzdb            0.3.0      2022-03-28 [1] CRAN (R 4.2.0)
#>  utf8            1.2.3      2023-01-31 [1] CRAN (R 4.2.0)
#>  vctrs           0.6.1      2023-03-22 [1] CRAN (R 4.2.0)
#>  visdat          0.6.0      2023-02-02 [1] local
#>  withr           2.5.0      2022-03-03 [1] CRAN (R 4.2.0)
#>  xfun            0.37       2023-01-31 [1] CRAN (R 4.2.0)
#>  xml2            1.3.3      2021-11-30 [1] CRAN (R 4.2.0)
#>  yaml            2.3.7      2023-01-23 [1] CRAN (R 4.2.0)
#> 
#>  [1] /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────

njtierney, Apr 10 '23 03:04