
`replace_with_na_all` turns factors into integers

Open themichjam opened this issue 3 years ago • 3 comments

When I use replace_with_na_all to clean up factor columns, the resulting columns are converted to integers. See below for a simple example with iris. It would be wonderful if that could be fixed. I could convert the factors to characters and back, but that defeats the purpose of this package a little bit.

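# Fails: factor columns come back as integers
library(magrittr)  # provides %>%
na_strings <- "setosa"  # example value (assumption); the original report did not define na_strings
iris %>% naniar::replace_with_na_all(condition = ~.x %in% na_strings)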
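
For anyone needing a stopgap, the round trip through character mentioned above might look like the following minimal base R sketch. The value in `na_strings` and the name `iris_chr` are illustrative assumptions, not part of the original report, and this is not naniar's own method.

```r
# Illustrative value only; the original report did not say which strings were used.
na_strings <- c("setosa")

iris_chr <- iris

# Convert the factor to character, blank out the unwanted values,
# then rebuild the factor so the column keeps its class.
species <- as.character(iris_chr$Species)
species[species %in% na_strings] <- NA_character_
iris_chr$Species <- factor(species)

class(iris_chr$Species)
sum(is.na(iris_chr$Species))
```

Rebuilding with `factor()` also drops the replaced level, which may or may not be what you want.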

themichjam, Jul 25 '21 12:07

Hi there!

Thanks for reporting this - you're right, this is a pain and not what we want naniar to do!

I'll take a look at this when I'm doing the next release, which should be by the end of August.

Cheers!

njtierney, Jul 27 '21 02:07

That would be amazing, thank you!

themichjam, Jul 28 '21 18:07

I tried to create a pull request for this, but thought posting here would also help others looking. The code below uses the NHANES dataset as an example; the latter part seems to do the job of replace_with_na_all without turning factors into integers. I was thinking this could help with the update?

# install.packages("NHANES")  # run once if the package is not installed
library(NHANES)
library(dplyr)

# make a selection of columns
nhanes_long <- NHANES %>%
  select(Age, AgeDecade, Education, Poverty, Work, LittleInterest, Depressed,
         BMI, Pulse, BPSysAve, BPDiaAve, DaysPhysHlthBad, PhysActiveDays)

# select 500 random rows
rand_ind <- sample(seq_len(nrow(nhanes_long)), 500)
nhanes <- nhanes_long[rand_ind, ]

summary(nhanes_long)

# values that should be treated as missing
na_strings <- c("None",
                "Some College",
                "Several")

# before replacement
table(nhanes$Education)

# replace the unwanted values with NA, then let type.convert() re-infer column types
nhanes <- nhanes %>%
  mutate(across(everything(),
                ~ replace(.x, .x %in% na_strings, NA_character_))) %>%
  type.convert(as.is = TRUE)
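
If the goal is to keep the affected columns as factors, a variant that only touches factor columns could be sketched in base R as below. The helper name `replace_factor_values_with_na` and the toy data frame are hypothetical, made up for illustration; this is not how naniar does it internally.

```r
na_strings <- c("None", "Some College", "Several")

# Toy data standing in for the NHANES selection (values are made up).
toy <- data.frame(
  Education = factor(c("High School", "Some College", "College Grad")),
  LittleInterest = factor(c("None", "Several", "Most")),
  BMI = c(22.1, 27.5, 30.2)
)

# Hypothetical helper: set matching values to NA in factor columns only,
# keeping the factor class and dropping the levels that are no longer used.
replace_factor_values_with_na <- function(data, values) {
  is_fct <- vapply(data, is.factor, logical(1))
  data[is_fct] <- lapply(data[is_fct], function(col) {
    col[col %in% values] <- NA
    droplevels(col)
  })
  data
}

cleaned <- replace_factor_values_with_na(toy, na_strings)
str(cleaned)
```

Non-factor columns such as BMI are left untouched, so no type conversion step is needed afterwards.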

themichjam, Oct 16 '21 21:10

This no longer seems to be an issue due to internal changes in naniar :)

#Fails
na_strings <- "setosa"
iris %>% naniar::replace_with_na_all(condition = ~.x %in% na_strings)
#> Error in iris %>% naniar::replace_with_na_all(condition = ~.x %in% na_strings): could not find function "%>%"

Created on 2023-04-10 with reprex v2.0.2

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.2.3 (2023-03-15)
#>  os       macOS Ventura 13.2
#>  system   aarch64, darwin20
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       Australia/Hobart
#>  date     2023-04-10
#>  pandoc   2.19.2 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date (UTC) lib source
#>  cli           3.6.0   2023-01-09 [1] CRAN (R 4.2.0)
#>  digest        0.6.31  2022-12-11 [1] CRAN (R 4.2.0)
#>  evaluate      0.20    2023-01-17 [1] CRAN (R 4.2.0)
#>  fastmap       1.1.0   2021-01-25 [1] CRAN (R 4.2.0)
#>  fs            1.6.1   2023-02-06 [1] CRAN (R 4.2.0)
#>  glue          1.6.2   2022-02-24 [1] CRAN (R 4.2.0)
#>  htmltools     0.5.4   2022-12-07 [1] CRAN (R 4.2.0)
#>  knitr         1.42    2023-01-25 [1] CRAN (R 4.2.0)
#>  lifecycle     1.0.3   2022-10-07 [1] CRAN (R 4.2.0)
#>  magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.2.0)
#>  purrr         1.0.1   2023-01-10 [1] CRAN (R 4.2.0)
#>  R.cache       0.16.0  2022-07-21 [1] CRAN (R 4.2.0)
#>  R.methodsS3   1.8.2   2022-06-13 [1] CRAN (R 4.2.0)
#>  R.oo          1.25.0  2022-06-12 [1] CRAN (R 4.2.0)
#>  R.utils       2.12.2  2022-11-11 [1] CRAN (R 4.2.0)
#>  reprex        2.0.2   2022-08-17 [1] CRAN (R 4.2.0)
#>  rlang         1.1.0   2023-03-14 [1] CRAN (R 4.2.0)
#>  rmarkdown     2.20    2023-01-19 [1] CRAN (R 4.2.0)
#>  rstudioapi    0.14    2022-08-22 [1] CRAN (R 4.2.0)
#>  sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.2.0)
#>  styler        1.9.0   2023-01-15 [1] CRAN (R 4.2.0)
#>  vctrs         0.6.1   2023-03-22 [1] CRAN (R 4.2.0)
#>  withr         2.5.0   2022-03-03 [1] CRAN (R 4.2.0)
#>  xfun          0.37    2023-01-31 [1] CRAN (R 4.2.0)
#>  yaml          2.3.7   2023-01-23 [1] CRAN (R 4.2.0)
#> 
#>  [1] /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────

njtierney, Apr 10 '23 03:04