Question: searching for duplicated and empty records

Open • johanneswerner opened this issue 11 months ago • 1 comment

How can I use validate to check for

  1. duplicated rows
  2. empty rows
  3. additional (unwanted) columns?

Thank you very much!

johanneswerner, Mar 16 '24 08:03

I don't know about the first two points, but to check whether you have unwanted columns, you can use names(.) in the rule:

library(validate)

test <- data.frame(var1 = c(1, NA), var2 = c(2, NA), foobar = c("a", "b"), var1.x = 1:2)

rules <- validator(
    length(grep("\\.x$", names(.))) == 0,
    ! "foobar" %in% names(.)
)

confront(test, rules) |> 
    summary()
#>   name items passes fails nNA error warning
#> 1   V1     1      0     1   0 FALSE   FALSE
#> 2   V2     1      0     1   0 FALSE   FALSE
#>                               expression
#> 1 length(grep("\\\\.x$", names(.))) == 0
#> 2               !"foobar" %vin% names(.)

I often use the rule that checks that no variable name ends with .x, since such a name is usually a sign that a join produced a duplicated variable.
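
To illustrate why a trailing .x tends to point at a join problem, here is a minimal sketch using base R's merge(), which by default appends .x and .y to non-key columns that appear in both inputs. The data frames left and right are made up for illustration.

```r
# Hypothetical data: both tables carry a non-key column named "value".
left  <- data.frame(id = 1:2, value = c("a", "b"))
right <- data.frame(id = 1:2, value = c("c", "d"))

# merge() disambiguates the clashing column with its default suffixes.
joined <- merge(left, right, by = "id")
names(joined)
#> [1] "id"      "value.x" "value.y"

# The same pattern as in the rule above detects the suffixed column.
any(grepl("\\.x$", names(joined)))
#> [1] TRUE
```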

etiennebacher, May 14 '24 11:05
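
For the first two points, which the reply leaves open, one possible approach is sketched below. It assumes a recent version of validate that provides is_unique(), and it uses made-up data. The empty-row rule simply spells out the columns with is.na(), so it would need to be adapted to the real column names.

```r
library(validate)

# Hypothetical data: rows 1 and 2 are identical, row 3 is entirely NA.
dat <- data.frame(
  var1 = c(1, 1, NA, 4),
  var2 = c(2, 2, NA, 5)
)

rules <- validator(
  # TRUE only for records whose (var1, var2) combination is unique.
  no_duplicates = is_unique(var1, var2),
  # FALSE for records in which every listed column is missing.
  not_empty = !is.na(var1) | !is.na(var2)
)

out <- confront(dat, rules)
summary(out)
```

Calling values(out) returns the record-level results, which can help locate the offending rows.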