naniar
replace_with_na_all turns factors to integers
(moved over here from https://stackoverflow.com/questions/58621567/naniarreplace-with-na-all-changes-factor-variables-to-integers)
I have a dataset where some missing values are coded as -99, and I tried to use the naniar function replace_with_na_all to replace those values with NA. The function does this, but it also seems to convert my factor variables to integers, so the factor labels are lost.
This happens whether or not the factor already has some missing values, as the examples below show (in tibble1 the factor has a missing value from the start; in tibble2 it does not).
library(tidyverse)
library(naniar)
# Example factor with missing values
tibble1 <- tribble(
  ~x, ~y,
  "a", 1,
  "-99", 2,
  "c", -99
)
tibble1$x <- as.factor(tibble1$x)
levels(tibble1$x) <- list("A" = "a",
                          "C" = "c")
# Showing original tibble and then after replace_with_na_all is used
tibble1
tibble1 %>% naniar::replace_with_na_all(condition = ~.x == -99)
# Example factor without missing values
tibble2 <- tribble(
  ~x, ~y,
  "a", 1,
  "b", 2,
  "c", -99
)
tibble2$x <- as.factor(tibble2$x)
levels(tibble2$x) <- list("A" = "a",
                          "B" = "b",
                          "C" = "c")
# Showing original tibble and then after replace_with_na_all is used
tibble2
tibble2 %>% naniar::replace_with_na_all(condition = ~.x == -99)
Not sure if this is supposed to happen, but if it is, it was not clear to me from reading the documentation.
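For context, one common way a factor silently turns into an integer in R is base ifelse(), which takes its attributes from the logical test and therefore drops the factor's class and levels, leaving only the underlying integer codes. This is an illustration of the general pitfall only, not a claim about what naniar does internally. Assigning NA by index, by contrast, keeps the factor intact:

```r
# A factor in which the level "-99" marks missing values
f <- factor(c("A", "-99", "C"))

# ifelse() takes its attributes from the logical test, so the factor's
# class and levels are lost and only the integer codes remain
bad <- ifelse(f == "-99", NA, f)
class(bad)   # "integer"

# Assigning NA by index keeps the factor class and all of its levels
good <- f
good[good == "-99"] <- NA
class(good)  # "factor"

# The now-unused "-99" level can be dropped afterwards if wanted
good <- droplevels(good)
levels(good) # "A" "C"
```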
Thanks for posting this issue!
Sorry this has caused some hassles for you. I don't think this is desirable behaviour; I'll flag it and try to fix it in the next release of naniar.
Thank you very much for taking the time to post a reprex!
FWIW, I have had this problem too; I'll post some code in a moment that is a partial fix.
You've probably long since solved this problem, but if not, here is some code.
It writes NA wherever a cell is an empty string; for -99 codes you would just change "" to "-99".
NB: it skips POSIX columns because they cause an error. (I guess to fix that you would have to coerce the POSIX columns to a different class, replace the empty cells, then coerce them back to POSIX?)
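library(tidyverse)
df <- data.frame(dates = as.POSIXct(as.Date(c(12345, 23456, 34567), origin = "1970-01-01")),
                 alpha = 2:4, beta = c("", "", 6), gamma = c(7:8, ""),
                 delta = c("", FALSE, TRUE), chars = c("a", "bb", ""))
# Only touch character columns: since dplyr 1.1.0, na_if() casts "" to the
# column's type, which errors for numeric and date-time columns
df <- df %>%
  mutate(across(where(is.character), ~ na_if(.x, "")))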
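The same column-by-column idea can be pointed at the original -99 problem. Below is a minimal base R sketch; replace_code_with_na() is a hypothetical helper (not part of naniar or dplyr), and it assumes that in factor columns the sentinel is stored as a level labelled "-99". Factor columns are rebuilt without that level, so they stay factors; every other column simply has matching values set to NA.

```r
# Hypothetical helper, not part of naniar: replace a sentinel code
# (e.g. -99) with NA in every column while keeping each column's class.
replace_code_with_na <- function(data, code = -99) {
  code_chr <- as.character(code)
  data[] <- lapply(data, function(col) {
    if (is.factor(col)) {
      # Rebuild the factor without the code level: matching values become NA,
      # the remaining levels keep their order, and ordered factors stay ordered
      factor(col, levels = setdiff(levels(col), code_chr))
    } else {
      # %in% never returns NA, so existing missing values are left alone
      col[col %in% code] <- NA
      col
    }
  })
  data
}
```

For example, replace_code_with_na(data.frame(x = factor(c("A", "-99", "C")), y = c(1, 2, -99))) returns x as a factor with levels A and C and NA in the second row, and y as c(1, 2, NA).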
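I believe this has been resolved by some changes to the internals. Rerunning the example (adjusted so that "-99" is kept as a factor level) against the development version, naniar 1.0.0.9000, now keeps x as a factor: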
library(tidyverse)
library(naniar)
# Example factor with missing values
tibble1 <- tribble(
  ~x, ~y,
  "a", 1,
  "-99", 2,
  "c", -99
)
tibble1$x <- as.factor(tibble1$x)
levels(tibble1$x) <- list("A" = "a",
                          "C" = "c",
                          "-99" = "-99")
# Showing original tibble and then after replace_with_na_all is used
tibble1
#> # A tibble: 3 × 2
#> x y
#> <fct> <dbl>
#> 1 A 1
#> 2 -99 2
#> 3 C -99
tibble1 %>% naniar::replace_with_na_all(condition = ~.x == -99)
#> # A tibble: 3 × 2
#> x y
#> <fct> <dbl>
#> 1 A 1
#> 2 <NA> 2
#> 3 C NA
Created on 2023-04-10 with reprex v2.0.2
Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.2.3 (2023-03-15)
#> os macOS Ventura 13.2
#> system aarch64, darwin20
#> ui X11
#> language (EN)
#> collate en_US.UTF-8
#> ctype en_US.UTF-8
#> tz Australia/Hobart
#> date 2023-04-10
#> pandoc 2.19.2 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date (UTC) lib source
#> assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.2.0)
#> backports 1.4.1 2021-12-13 [1] CRAN (R 4.2.0)
#> broom 1.0.3 2023-01-25 [1] CRAN (R 4.2.0)
#> cellranger 1.1.0 2016-07-27 [1] CRAN (R 4.2.0)
#> cli 3.6.0 2023-01-09 [1] CRAN (R 4.2.0)
#> colorspace 2.1-0 2023-01-23 [1] CRAN (R 4.2.0)
#> crayon 1.5.2 2022-09-29 [1] CRAN (R 4.2.0)
#> DBI 1.1.3 2022-06-18 [1] CRAN (R 4.2.0)
#> dbplyr 2.3.0 2023-01-16 [1] CRAN (R 4.2.0)
#> digest 0.6.31 2022-12-11 [1] CRAN (R 4.2.0)
#> dplyr * 1.1.1 2023-03-22 [1] CRAN (R 4.2.0)
#> ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.2.0)
#> evaluate 0.20 2023-01-17 [1] CRAN (R 4.2.0)
#> fansi 1.0.4 2023-01-22 [1] CRAN (R 4.2.0)
#> fastmap 1.1.0 2021-01-25 [1] CRAN (R 4.2.0)
#> forcats * 1.0.0 2023-01-29 [1] CRAN (R 4.2.0)
#> fs 1.6.1 2023-02-06 [1] CRAN (R 4.2.0)
#> gargle 1.3.0 2023-01-30 [1] CRAN (R 4.2.0)
#> generics 0.1.3 2022-07-05 [1] CRAN (R 4.2.0)
#> ggplot2 * 3.4.1 2023-02-10 [1] CRAN (R 4.2.0)
#> glue 1.6.2 2022-02-24 [1] CRAN (R 4.2.0)
#> googledrive 2.0.0 2021-07-08 [1] CRAN (R 4.2.0)
#> googlesheets4 1.0.1 2022-08-13 [1] CRAN (R 4.2.0)
#> gtable 0.3.1 2022-09-01 [1] CRAN (R 4.2.0)
#> haven 2.5.1 2022-08-22 [1] CRAN (R 4.2.0)
#> hms 1.1.2 2022-08-19 [1] CRAN (R 4.2.0)
#> htmltools 0.5.4 2022-12-07 [1] CRAN (R 4.2.0)
#> httr 1.4.4 2022-08-17 [1] CRAN (R 4.2.0)
#> jsonlite 1.8.4 2022-12-06 [1] CRAN (R 4.2.0)
#> knitr 1.42 2023-01-25 [1] CRAN (R 4.2.0)
#> lifecycle 1.0.3 2022-10-07 [1] CRAN (R 4.2.0)
#> lubridate 1.9.1 2023-01-24 [1] CRAN (R 4.2.0)
#> magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.2.0)
#> modelr 0.1.10 2022-11-11 [1] CRAN (R 4.2.0)
#> munsell 0.5.0 2018-06-12 [1] CRAN (R 4.2.0)
#> naniar * 1.0.0.9000 2023-04-10 [1] local
#> pillar 1.8.1 2022-08-19 [1] CRAN (R 4.2.0)
#> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.2.0)
#> purrr * 1.0.1 2023-01-10 [1] CRAN (R 4.2.0)
#> R.cache 0.16.0 2022-07-21 [1] CRAN (R 4.2.0)
#> R.methodsS3 1.8.2 2022-06-13 [1] CRAN (R 4.2.0)
#> R.oo 1.25.0 2022-06-12 [1] CRAN (R 4.2.0)
#> R.utils 2.12.2 2022-11-11 [1] CRAN (R 4.2.0)
#> R6 2.5.1 2021-08-19 [1] CRAN (R 4.2.0)
#> readr * 2.1.3 2022-10-01 [1] CRAN (R 4.2.0)
#> readxl 1.4.1 2022-08-17 [1] CRAN (R 4.2.0)
#> reprex 2.0.2 2022-08-17 [1] CRAN (R 4.2.0)
#> rlang 1.1.0 2023-03-14 [1] CRAN (R 4.2.0)
#> rmarkdown 2.20 2023-01-19 [1] CRAN (R 4.2.0)
#> rstudioapi 0.14 2022-08-22 [1] CRAN (R 4.2.0)
#> rvest 1.0.3 2022-08-19 [1] CRAN (R 4.2.0)
#> scales 1.2.1 2022-08-20 [1] CRAN (R 4.2.0)
#> sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.2.0)
#> stringi 1.7.12 2023-01-11 [1] CRAN (R 4.2.0)
#> stringr * 1.5.0 2022-12-02 [1] CRAN (R 4.2.0)
#> styler 1.9.0 2023-01-15 [1] CRAN (R 4.2.0)
#> tibble * 3.2.1 2023-03-20 [1] CRAN (R 4.2.0)
#> tidyr * 1.3.0 2023-01-24 [1] CRAN (R 4.2.0)
#> tidyselect 1.2.0 2022-10-10 [1] CRAN (R 4.2.0)
#> tidyverse * 1.3.2 2022-07-18 [1] CRAN (R 4.2.0)
#> timechange 0.2.0 2023-01-11 [1] CRAN (R 4.2.0)
#> tzdb 0.3.0 2022-03-28 [1] CRAN (R 4.2.0)
#> utf8 1.2.3 2023-01-31 [1] CRAN (R 4.2.0)
#> vctrs 0.6.1 2023-03-22 [1] CRAN (R 4.2.0)
#> visdat 0.6.0 2023-02-02 [1] local
#> withr 2.5.0 2022-03-03 [1] CRAN (R 4.2.0)
#> xfun 0.37 2023-01-31 [1] CRAN (R 4.2.0)
#> xml2 1.3.3 2021-11-30 [1] CRAN (R 4.2.0)
#> yaml 2.3.7 2023-01-23 [1] CRAN (R 4.2.0)
#>
#> [1] /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/library
#>
#> ──────────────────────────────────────────────────────────────────────────────
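To guard against this behaviour coming back, one option is a small regression check that compares column classes before and after a replacement (for instance, tibble1 against the output of replace_with_na_all()). class_changes() below is a hypothetical helper written for illustration, and the after data frame is built by hand to mimic the reported bug rather than produced by naniar.

```r
# Hypothetical helper, not part of naniar: list the columns whose class
# differs between two versions of the same data frame.
class_changes <- function(before, after) {
  stopifnot(identical(names(before), names(after)))
  describe <- function(col) paste(class(col), collapse = "/")
  old <- vapply(before, describe, character(1))
  new <- vapply(after, describe, character(1))
  changed <- old != new
  data.frame(column = names(before)[changed],
             before = unname(old[changed]),
             after = unname(new[changed]))
}

# 'after' mimics the reported bug (factor -> integer codes); it is not naniar output
before <- data.frame(x = factor(c("A", NA, "C")), y = c(1, 2, -99))
after  <- data.frame(x = c(1L, NA, 2L), y = c(1, 2, NA))
class_changes(before, after)
```

An empty result means every column kept its class; here the single row flags x going from factor to integer.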