
how to loop over many arguments

Open maxheld83 opened this issue 8 years ago • 10 comments

I want to validate a large number of arguments, all of which should be, say, numeric.

For now, I'm doing this:

paperwidth <- 16
paperheight <- 9

invisible(sapply(
    X = c(paperwidth, paperheight),
    FUN = function(x) {
      checkmate::assert_numeric(
        x = x,
        lower = 0,
        finite = TRUE,
        any.missing = FALSE,
        len = 1,
        null.ok = FALSE
      )
      return(NULL)
    }
  ))

This seems to work alright; I just want to make sure that it doesn't cause any shenanigans behind the scenes in checkmate.

What would be your recommended way of doing this?

maxheld83, Jul 08 '17 17:07

I think you'll find two problems with your code.

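  1. c(paperwidth, paperheight) generates a single atomic vector, so sapply iterates over its elements rather than over your arguments: a length-2 argument gets split in two, and mixed types are coerced (TRUE becomes 1 and then passes a numeric check). I think you mean list(paperwidth, paperheight).
  2. If an assertion fails, checkmate won't know which argument caused the error. See the following illustration.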

I tried this once before. As an example, look what happens when I use sapply to run checks on multiple arguments.

paper_area <- function(paperwidth, paperheight)
{
  sapply(X = list(paperwidth,
                  paperheight),
         FUN = checkmate::assert_numeric,
         lower = 0,
         finite = TRUE,
         any.missing = FALSE,
         len = 1,
         null.ok = FALSE)
  
  paperwidth * paperheight
}

paper_area(NA, 12)

Error in paper_area(NA, 12) : Assertion on 'X[[i]]' failed: Contains missing values.
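To make point 1 concrete, here is a base-R-only sketch with deliberately wrong, hypothetical inputs. c() flattens both arguments into one numeric vector, so the length-2 value is split and the logical TRUE is silently coerced to 1 (which would then pass a numeric check), whereas list() keeps each argument intact with its own type and length.

```r
# Hypothetical, deliberately wrong inputs
paperwidth  <- c(16, 18)  # length 2 instead of 1
paperheight <- TRUE       # logical instead of numeric

# c() flattens everything into one numeric vector:
# paperwidth is split into two elements and TRUE is coerced to 1.
flat <- c(paperwidth, paperheight)
print(flat)          # contains 16, 18, 1
print(length(flat))  # 3 elements, not 2 arguments

# list() keeps each argument intact, so a per-argument check sees
# the real length and class of each one.
kept <- list(paperwidth, paperheight)
print(vapply(kept, length, integer(1)))    # lengths 2 and 1
print(vapply(kept, class, character(1)))   # "numeric" and "logical"
```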

If you want your checks to keep their informative messages, you will probably need to use mapply. Notice that I supply the .var.name argument, which means typing each argument name twice; that is a bit of a nuisance, but may be worth it if it consolidates all of your checks into a very small amount of code. This approach also has the advantage of not requiring that all of the arguments to your assert function be the same: supply a vector in the ... argument for anything that differs between checks, and put anything that is identical across all checks in the MoreArgs list.

paper_area <- function(paperwidth, paperheight)
{
  mapply(FUN = checkmate::assert_numeric,
         x = list(paperwidth,
                  paperheight),
         .var.name = c("paperwidth", "paperheight"),
         MoreArgs = list(lower = 0,
                         finite = TRUE,
                         any.missing = FALSE,
                         len = 1,
                         null.ok = FALSE))
  
  paperwidth * paperheight
}

paper_area(NA, 12)

Error in paper_area(NA, 12) : Assertion on 'paperwidth' failed: Contains missing values.
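The mapply mechanics can be shown without checkmate. In this base-R sketch, check_min() is a hypothetical stand-in for checkmate::assert_numeric (its message only loosely imitates checkmate's): x and .var.name are iterated in parallel, element i with element i, while MoreArgs passes the shared settings unchanged to every call.

```r
# Hypothetical stand-in for an assertion: stops with a message that
# names the offending argument, otherwise returns TRUE invisibly.
check_min <- function(x, .var.name, lower) {
  if (!is.numeric(x) || length(x) != 1L || is.na(x) || x < lower) {
    stop(sprintf("Assertion on '%s' failed: must be a number >= %s",
                 .var.name, lower), call. = FALSE)
  }
  invisible(TRUE)
}

validate <- function(paperwidth, paperheight) {
  # x and .var.name are walked in parallel; MoreArgs goes to every call.
  mapply(FUN = check_min,
         x = list(paperwidth, paperheight),
         .var.name = c("paperwidth", "paperheight"),
         MoreArgs = list(lower = 0))
  invisible(TRUE)
}

validate(16, 9)  # passes silently
msg <- tryCatch(validate(16, -1), error = conditionMessage)
print(msg)       # the message names 'paperheight'
```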

You may also use my preferred approach: an AssertCollection. It runs all of your assertions first and then reports the failures together. This is helpful as long as your assertions are independent, and if you're running them in a loop, they must be independent anyway.

paper_area <- function(paperwidth, paperheight)
{
  coll <- checkmate::makeAssertCollection()
  
  mapply(FUN = checkmate::assert_numeric,
         x = list(paperwidth,
                  paperheight),
         .var.name = c("paperwidth", "paperheight"),
         MoreArgs = list(lower = 0,
                         finite = TRUE,
                         any.missing = FALSE,
                         len = 1,
                         null.ok = FALSE,
                         add = coll))
  
  checkmate::reportAssertions(coll)
  
  paperwidth * paperheight
}

paper_area(NA, NA)

Error in paper_area(NA, NA) : 2 assertions failed:

  • Variable 'paperwidth': Contains missing values.
  • Variable 'paperheight': Contains missing values.
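The collect-then-report idea behind an AssertCollection can be sketched in base R. This is a hypothetical, simplified illustration (make_collection() is not part of checkmate and is not how makeAssertCollection()/reportAssertions() are implemented): failures are pushed into a closure's message vector, and a single report() call stops once, listing all of them.

```r
# Hypothetical collector: accumulate messages, then fail once with all of them.
make_collection <- function() {
  msgs <- character(0)
  list(
    push = function(msg) msgs <<- c(msgs, msg),  # record, don't stop
    report = function() {
      if (length(msgs) > 0L) {
        stop(sprintf("%d assertions failed:\n%s", length(msgs),
                     paste0(" * ", msgs, collapse = "\n")), call. = FALSE)
      }
      invisible(TRUE)
    }
  )
}

paper_area <- function(paperwidth, paperheight) {
  coll <- make_collection()
  vals <- list(paperwidth = paperwidth, paperheight = paperheight)
  # Every check runs; failures are recorded instead of stopping immediately.
  for (nm in names(vals)) {
    if (anyNA(vals[[nm]])) {
      coll$push(sprintf("Variable '%s': Contains missing values.", nm))
    }
  }
  coll$report()  # stops once, listing every recorded failure
  paperwidth * paperheight
}

paper_area(4, 5)
msg <- tryCatch(paper_area(NA, NA), error = conditionMessage)
cat(msg)
```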

nutterb, Jul 08 '17 18:07

The main problem is the variable name lookup. If you can live with slightly less informative error messages, the quickest way to do the assertions is with qassertr:

qassertr(list(paperwidth, paperheight), "N1[0,)")

If you need to check just two arguments, I guess copy-pasting the assertion is also fine here for improved readability:

assert_number(paperwidth, lower = 0, finite = TRUE)
assert_number(paperheight, lower = 0, finite = TRUE)

@nutterb already showed a way to use mapply and .var.name.

mllg, Jul 12 '17 08:07

Just wondering: Do you encounter this often? It is not hard to write something like an apply for assertions, e.g.:

aapply = function(fun, formula, ...) {
  fun = match.fun(fun)
  terms = terms(formula)
  vnames = attr(terms, "term.labels")
  ee = attr(terms, ".Environment")
  args = c(list(x = NULL, .var.name = NA_character_), list(...))

  for (vname in vnames) {
    args$x = get(vname, envir = ee)
    args$.var.name = vname
    do.call(fun, args)
  }
  invisible(NULL)
}

paperwidth = 4
paperheight = 5
aapply(assert_number, ~ paperwidth + paperheight, lower = 0)

(first draft, not optimized, shady interface)
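The formula trick in aapply rests on two base R facts: terms() lists the variables of a formula as its "term.labels", and the terms object keeps the environment the formula was created in as its ".Environment" attribute, so the values can be looked up by name from there. A small base-R sketch (make_lookup_demo is a hypothetical name):

```r
make_lookup_demo <- function() {
  paperwidth  <- 4
  paperheight <- 5
  f  <- ~ paperwidth + paperheight    # formula captures this function's frame
  tt <- terms(f)
  vnames <- attr(tt, "term.labels")   # c("paperwidth", "paperheight")
  ee     <- attr(tt, ".Environment")  # environment where f was created
  values <- mget(vnames, envir = ee)  # look each value up by name
  list(vnames = vnames, values = values)
}

res <- make_lookup_demo()
str(res)
```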

mllg, Jul 12 '17 08:07

A formula! That's brilliant!

I can't speak for @maxheld83, but I rarely encounter a situation where several arguments all take the same assertion arguments. More commonly, I'll have several numeric parameters that each have a different subset of restrictions.

Take, for example, a function I've been working on to do power and sample size calculations for a two-sample t-test (similar to power.t.test but vectorized). I have the arguments delta, se, alpha, power, and delta0. delta and delta0 share the same set of assertion arguments; so do alpha and power. At best, I'd have to run aapply twice and then check se as usual.

# 13 lines of code (180 characters)
aapply(assert_numeric,
       ~ delta + mu,
       null.ok = TRUE)

aapply(assert_numeric,
       ~ alpha + power,
       lower = 0,
       upper = 1,
       null.ok = TRUE)

assert_numeric(x = se,
               lower = 0,
               null.ok = TRUE)

Some small modifications allow me to use differing arguments in each check:

checkapply <- function(fun, formula, ..., fixed = list()){
  fun <- match.fun(fun)
  terms <- terms(formula)
  vnames <- attr(terms, "term.labels")
  ee <- attr(terms, ".Environment")
  
  unfixed <- list(...)
  
  for (vname in vnames){
    this_var_arg <- lapply(unfixed,
                           function(x, n) x[[n]] ,
                           vname)
    this_var_arg <- this_var_arg[!vapply(this_var_arg, is.null, logical(1))]
    
    args <- c(list(x = get(vname, envir = ee),
                   .var.name = vname),
              this_var_arg,
              fixed)
    do.call(fun, args)
  }
  invisible(NULL)
}

Which results in (full comparison)

# 5 lines of code (172 characters)
checkapply(assert_numeric,
           ~ delta + mu0 + alpha + power + se,
           lower = list(alpha = 0, power = 0, se = 0),
           upper = list(alpha = 1, power = 1),
           fixed = list(null.ok = TRUE))
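The per-variable argument resolution inside checkapply can be isolated to see what each variable in the call above actually receives. In this sketch resolve_args is a hypothetical name; its body is the same lapply/vapply pair used in checkapply. Because `[[` on a list returns NULL for a missing name, a variable with no entry simply gets the assertion's own default.

```r
resolve_args <- function(vname, unfixed) {
  # Pick this variable's entry from each named list (NULL if absent).
  this_var_arg <- lapply(unfixed, function(x, n) x[[n]], vname)
  # Drop arguments with no entry, so the assertion's default applies.
  this_var_arg[!vapply(this_var_arg, is.null, logical(1))]
}

unfixed <- list(lower = list(alpha = 0, power = 0, se = 0),
                upper = list(alpha = 1, power = 1))

resolve_args("alpha", unfixed)  # lower = 0, upper = 1
resolve_args("se", unfixed)     # lower = 0 only
resolve_args("delta", unfixed)  # empty: all defaults
```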

Admittedly, the compactness is mostly a result of coding style, but it's still fewer characters. Note also that adding arguments has a larger impact on aapply than on checkapply. For instance, adding an AssertCollection (via add = coll) adds 30 characters to the aapply assertions, but only 10 to checkapply.

checkapply requires about twice as much time to run as aapply (143 vs. 81 microseconds, respectively, compared to 16 microseconds to run each assertion individually).

I can see the allure of the more compact representations; however, I'm not convinced that I would actually use them. I tend to prefer the more computationally efficient approaches over compactness. Truthfully, after I write my assertions, I almost never go back to look at them, because checkmate works so well that there are almost never any problems to address. But I might be a rarity in my willingness to write that much code in order to maintain a 120 microsecond advantage.

nutterb, Jul 12 '17 11:07

thanks so much for the thoughtful suggestions @mllg and @nutterb.

@mllg I agree with @nutterb; I haven't encountered this often, though it sometimes happens when I have functions with lots of configuration arguments (say, for a printing function or something like that).

I mainly wanted to make sure that by applying over it, I wouldn't mess up something in checkmate, and was going to implement @nutterb's suggestion using mapply.

That said, I'd really love to see some syntactic sugar in checkmate that would make this easier and consistent; it'd just be one less thing to think about.

Using checkmate has been a huge boon for me; it just lets me concentrate completely on the content of the assertions and get on my way.

(As I mentioned in #92, I'm also using checkmate to shoehorn validation into S3 classes.)

maxheld83, Jul 12 '17 11:07

One more application of looping assertions comes to mind, though it's slightly different: when users provide a multi-dimensional object such as a matrix or data frame as an argument, it can sometimes be helpful to point the user to the row and/or column of the offending data.

I guess this is a different matter though, because the vector or df would be one argument, not several.
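For that row/column case, a minimal base-R sketch could locate the offending cells and build a message from them. Here locate_missing is a hypothetical helper and missing values are just an example condition; is.na() on a data frame returns a logical matrix, and which(arr.ind = TRUE) turns it into row/column positions.

```r
# Hypothetical helper: describe where a data frame has missing values.
locate_missing <- function(df) {
  pos <- which(is.na(df), arr.ind = TRUE)  # matrix with columns "row", "col"
  if (nrow(pos) == 0L) return(character(0))
  sprintf("row %d, column '%s'", pos[, "row"], names(df)[pos[, "col"]])
}

df <- data.frame(width = c(16, NA, 4), height = c(9, 5, NA))
locate_missing(df)
```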

maxheld83, Jul 12 '17 11:07

Are you interested in including this in checkmate? If so, I'll prepare a pull request based on massert.

I'm not advocating either way. I just want to know if I should plan on maintaining in my own codebase.

nutterb, Sep 20 '17 12:09

I'm interested in including such a function in the package. Regarding varying parameters for checks, what do you think of using mapply's vector recycling?

aapply = function(fun, formula, ..., fixed = list()) {
  fun = match.fun(fun)
  terms = terms(formula)
  vnames = attr(terms, "term.labels")
  ee = attr(terms, ".Environment")

  dots = list(...)
  dots$.var.name = vnames
  dots$x = unname(mget(vnames, envir = ee))
  .mapply(fun, dots, MoreArgs = fixed)

  invisible(NULL)
}

paperwidth = 4
paperheight = 5
aapply(assert_number, . ~ paperwidth + paperheight, lower = 0)
aapply(assert_number, . ~ paperwidth + paperheight, lower = list(4, 5))

I have not benchmarked it yet, but it should be really fast.
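The recycling this version relies on can be seen with a plain function in place of an assertion. In this sketch describe is a hypothetical stand-in: .mapply() walks the elements of its dots list in parallel (taking element i with `[[`), recycles the length-1 lower for both variables, and passes MoreArgs unchanged to every call.

```r
# Hypothetical stand-in for an assertion, so the call pattern is visible.
describe <- function(x, .var.name, lower, upper) {
  sprintf("%s = %s in [%s, %s]", .var.name, x, lower, upper)
}

dots <- list(x = list(4, 5),
             .var.name = c("paperwidth", "paperheight"),
             lower = 0)  # length 1: recycled for both variables
out <- .mapply(describe, dots, MoreArgs = list(upper = Inf))
print(unlist(out))
```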

mllg, Sep 25 '17 07:09

I had prepared a long argument for why I preferred massert over aapply, and in the process of writing it, convinced myself that aapply is better. So, my arguments against my own work follow. :)

My one comment on your aapply is that it employs recycling to fill out the vectors. Using the following variables:

paperwidth = 4  # must be >= 4
paperheight = 5 # must be >= 5
x = 15          # must be <= 20
y = 18          # must be <= 20, null.ok = TRUE

aapply requires that each argument used by any of the assertions be provided for all of the variables. (184 characters)

aapply(assert_numeric,
       ~ paperwidth + paperheight + x + y,
       lower = c(4, 5, -Inf, -Inf),
       upper = c(Inf, Inf, 20, 20),
       null.ok = c(FALSE, FALSE, FALSE, TRUE))

The equivalent massert can be written (194 characters)

massert(~ paperwidth + paperheight + x + y,
        assert_numeric,
        lower = list(paperwidth = 4, paperheight = 5),
        upper = list(x = 20, y = 20),
        null.ok = list(y = TRUE))

On the other hand, aapply is an order of magnitude faster than massert. The thing I'm most likely to get hung up on when using aapply is remembering each argument's default value when filling in entries a variable doesn't need, e.g. FALSE for null.ok or -Inf/Inf for lower/upper. But I can get used to that.
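One way to avoid memorizing the defaults would be to expand the massert-style partial named lists into the full vectors aapply expects. This is a hypothetical helper (expand_arg is not part of checkmate); the fill values -Inf, Inf and FALSE are an assumption chosen to mirror assert_numeric's defaults for lower, upper and null.ok.

```r
# Hypothetical helper: partial named list -> full-length vector, gaps filled.
expand_arg <- function(vals, vnames, default) {
  stopifnot(all(names(vals) %in% vnames))  # reject unknown variable names
  out <- rep(list(default), length(vnames))
  names(out) <- vnames
  out[names(vals)] <- vals                 # overwrite only the given entries
  unlist(out, use.names = FALSE)
}

vnames <- c("paperwidth", "paperheight", "x", "y")
expand_arg(list(paperwidth = 4, paperheight = 5), vnames, -Inf)  # lower
expand_arg(list(x = 20, y = 20), vnames, Inf)                    # upper
expand_arg(list(y = TRUE), vnames, FALSE)                        # null.ok
```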

nutterb, Sep 25 '17 10:09

EDIT: I was too excited; my comment doesn't help with repeating the same rule. Usage looks something like qassertm(paperwidth='N[4,15]', paperheight='N[5,18]'). EDIT2: see e.g. qassert_all('N[0,]', paperwidth, paperheight).

I'm trying

qassertm <- function(...) {
   args<-list(...)
   for(var in names(args)) 
      eval(substitute(checkmate::qassert(x,spec),
                      list(x=as.name(var),spec=args[[var]])), 
              envir=parent.frame())
} 

and using like

test_checkargs <- function(x, y) {qassertm(x='b',y='n'); print(y)}


test_checkargs(T, 1)
# [1] 1

test_checkargs(T, T)
# Error in eval(substitute(checkmate::qassert(x, spec), list(x = as.name(var),  : 
#  Assertion on 'y' failed. Must be of class 'numeric', not 'logical'.

I struggled a bit with how to write it and have no insight into how performant it is.
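The reason the error names 'y' is that qassertm splices the argument's name into the call as a symbol before evaluating it in the caller's frame. A base-R stand-in shows the same mechanism without checkmate; check_classes is hypothetical, and stopifnot(inherits(...)) with a class name replaces checkmate::qassert with a rule string.

```r
# Hypothetical base-R analogue of qassertm.
check_classes <- function(...) {
  pf   <- parent.frame()  # the caller's frame, captured before looping
  args <- list(...)
  for (var in names(args)) {
    # Build e.g. stopifnot(inherits(y, "numeric")) with the real name in it.
    cl <- substitute(stopifnot(inherits(x, cls)),
                     list(x = as.name(var), cls = args[[var]]))
    eval(cl, envir = pf)  # look the variable up where it was passed
  }
  invisible(TRUE)
}

test_checkargs <- function(x, y) {
  check_classes(x = "logical", y = "numeric")
  y
}

test_checkargs(TRUE, 1)
tryCatch(test_checkargs(TRUE, TRUE), error = conditionMessage)
```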

WillForan, Jul 22 '22 02:07