Return messages for each vetting condition failed

Open franknarf1 opened this issue 6 years ago • 1 comments

I am vetting a vector and am interested in seeing a "full report" when it fails:

library(vetr)
library(data.table)

date8_toIDate = function(x) as.IDate(as.character(x), format = "%Y%m%d")

DATE8 = vet_token( INT && !is.na(date8_toIDate(.)) )

x = c(20010102L, 20010101L, 20010100L)
y = replace(x, 2, NA)

vet(DATE8, x)
# [1] "`!is.na(date8_toIDate(x))` is not all TRUE (contains non-TRUE values)"
vet(DATE8, y)
# [1] "`y` should not contain NAs, but does"

So the format is an 8-digit number representing a date. For the vector y, I want to see both that it has NAs and that its non-NA values fail my conversion test, something like:

multivet(DATE8, y)
# [1] "`y` should not contain NAs, but does"    
# [1] "`!is.na(date8_toIDate(y))` is not all TRUE (contains non-TRUE values)"

Background: you could argue that I can fix y's NAs, rerun, and then catch the other condition. The problem is that I am not supplying these inputs interactively. Instead, someone else has a non-R script that pulls from various databases and manipulates the data in an attempt to meet my documented vetting conditions. Running their input-generating program is time-consuming, so I'd like them to get a full report of input problems whenever any are present, so they can fix them all at once.

This may be outside of what you had in mind for the package, but it looks somewhat close to the already-included functionality.

EDIT: Thinking about this more, the behavior I'm asking for may be ill-defined. I guess there would need to be a rule that the conditions in the token are traversed in a particular sequence. So in AA && BB && CC, if AA fails for some elements, BB would be examined only on the elements where AA passed; and if BB fails for some of those, CC would again be tested only where BB passed.
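
To make the traversal rule concrete, here is a rough base-R sketch of what I mean. It does not use vetr at all: first_failure and the checks list are hypothetical names made up for illustration, and as.Date stands in for as.IDate so that it runs without data.table.

```r
# Hypothetical helper, not part of vetr: applies element-wise checks in order,
# testing each check only on the elements that passed all previous checks.
first_failure <- function(x, checks) {
  failed <- rep(NA_character_, length(x))  # NA means "no failure recorded yet"
  for (nm in names(checks)) {
    todo <- is.na(failed)                  # elements that passed every earlier check
    if (!any(todo)) break
    ok <- checks[[nm]](x[todo]) %in% TRUE  # treat an NA result as a failure
    failed[todo][!ok] <- nm
  }
  failed
}

# Assumed stand-ins for the two conditions in DATE8 (AA, then BB)
checks <- list(
  not_na     = function(v) !is.na(v),
  valid_date = function(v) !is.na(as.Date(as.character(v), format = "%Y%m%d"))
)

y <- c(20010102L, NA, 20010100L)
result <- first_failure(y, checks)
print(result)
```

Each position of result holds the name of the first check that element failed, or NA if it passed every check, so the NA element and the bad date are both reported from a single pass.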

franknarf1, Nov 23 '17 00:11

Right now vetr bails out as soon as it determines that an object cannot possibly meet the vetting token, which in the case of && combinations is possibly as early as the first token failing. This is partly for speed, and also partly for implementation simplicity. I can consider a mode that will evaluate all tokens and track all failures, but it will probably be a long time before I get to it.

A sub-optimal workaround would be to call vet separately on each of the individual tokens, e.g.:

vet.res <- list(
  vet(INT, y, stop=FALSE),
  vet(!is.na(date8_toIDate(.)), y, stop=FALSE)
)
vet.fail <- !sapply(vet.res, isTRUE)
if(any(vet.fail)) vet.res[vet.fail] else TRUE

Am I understanding what you are after correctly?

brodieG, Nov 23 '17 14:11