validate
Question: searching for duplicated and empty records
How can I use validate to check for
- duplicated rows
- empty rows
- additional (unwanted) columns?
Thank you very much!
I don't know about the first two points, but to check for unwanted columns you can use names(.) in the rule:
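library(validate)

# Toy data with two unwanted columns: "foobar" and the join leftover "var1.x"
test <- data.frame(var1 = c(1, NA), var2 = c(2, NA), foobar = c("a", "b"), var1.x = 1:2)

rules <- validator(
  # no column name may end in ".x"
  length(grep("\\.x$", names(.))) == 0,
  # the column "foobar" must not be present
  !"foobar" %in% names(.)
)

confront(test, rules) |>
  summary()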
#> name items passes fails nNA error warning
#> 1 V1 1 0 1 0 FALSE FALSE
#> 2 V2 1 0 1 0 FALSE FALSE
#> expression
#> 1 length(grep("\\\\.x$", names(.))) == 0
#> 2 !"foobar" %vin% names(.)
I often use the rule that checks that no variable name ends with .x,
since that suffix is usually a sign that a join produced a duplicated variable.
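For the first two points (duplicated and empty rows), here is a minimal sketch. It assumes a validate version that provides is_unique(), and it treats an "empty" row as one where every column is NA. The data frame dat, its columns id and val, and the rule names no_duplicates and not_empty are made up for illustration.

```r
library(validate)

# Hypothetical toy data: rows 2 and 3 are identical, row 4 is entirely NA
dat <- data.frame(
  id  = c(1, 2, 2, NA),
  val = c("a", "b", "b", NA)
)

rules <- validator(
  # a row passes only if its (id, val) combination occurs exactly once
  no_duplicates = is_unique(id, val),
  # a row passes if at least one of its columns is not NA
  not_empty = rowSums(!is.na(.)) > 0
)

out <- confront(dat, rules)
summary(out)
```

With this toy data I would expect rows 2 and 3 to fail the first rule and row 4 to fail the second. Note that is_unique() only looks at the columns you list, so to catch fully duplicated rows you have to pass every column of the data set.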