
remove_empty: select empty columns

Open luizaandrade opened this issue 3 years ago • 2 comments

I have come across a number of cases where I'd like to remove rows (columns) that are empty for some selected columns (rows), but not all. The most straightforward example is a data table that has non-empty keys in all rows, but where some rows have no information in the other columns. Like this:

key  var1  var2   var3
1    10    "foo"  TRUE
2    NA    NA     NA
3    32    "bar"  TRUE

Although the second row is not completely empty, it would be useful to be able to remove it quickly and neatly.

luizaandrade avatar Feb 08 '23 20:02 luizaandrade

Would the ideal implementation of this be a tidyselect ... argument for the remove_empty function, so that you could say remove_empty(dat,,,,-key)? That seems like a worthwhile addition to me, with little downside.

In the meantime would this work?

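library(janitor)
library(magrittr)  # provides %>%

dat <- data.frame(
  key = 1:3,
  var1 = c(10, NA, 32),
  var2 = c("foo", NA, "bar"),
  var3 = c(TRUE, NA, FALSE)
)

# Row 2 is only 25% non-missing (just the key), which is below the
# 0.7 cutoff, so it is dropped; rows 1 and 3 are fully populated.
dat %>%
  remove_empty("rows", cutoff = 0.7)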
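
If the fraction of key columns varies between tables, picking a cutoff can get fiddly. As a rough sketch only (this is plain base R, not part of janitor, and the check_cols, keep and result names are made up for illustration), the same effect can be had by testing emptiness on just the non-key columns, assuming key is the only column to ignore:

```r
dat <- data.frame(
  key = 1:3,
  var1 = c(10, NA, 32),
  var2 = c("foo", NA, "bar"),
  var3 = c(TRUE, NA, FALSE)
)

# Columns to check for emptiness: everything except the key
# (assumption: "key" is the only column that should be ignored).
check_cols <- setdiff(names(dat), "key")

# Keep a row if at least one of the checked columns is non-missing.
keep <- rowSums(!is.na(dat[, check_cols, drop = FALSE])) > 0
result <- dat[keep, , drop = FALSE]
```

Like remove_empty, this treats only NA as empty, so blank strings would still count as data.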

sfirke avatar Feb 08 '23 20:02 sfirke

Great! Yes, the tidyselect is exactly what I had in mind. And thanks, the cutoff option does the trick for now and is a very nice feature.

luizaandrade avatar Feb 08 '23 20:02 luizaandrade