epinowcast icon indicating copy to clipboard operation
epinowcast copied to clipboard

Refine input checking

Open seabbs opened this issue 3 years ago • 4 comments

Currently, the package uses a series of custom check_ functions to check the inputs of user-facing functions. These need to be refined and iterated on so that they are more clearly implemented and cover the full range of inputs.

There are quite a few approaches for doing this (with the simplest being the currently used if and stop approach). Ideally we are looking for something that is simple to implement/maintain, is clear to readers of the code, and gives very helpful error/warning messages to guide users.

Some suggested alternatives and discussion points (see here for the original discussion:

From @pearsonca

For error-checking / messages, I think this generally reads more clearly (comments for emphasis - they're unnecessary imo, especially if this idiom is repeated through a library)

stopifnot( # ... we're doing some error checking
  "checking this feature" = ..., # against some logical expression
  ... # repeat for all the desired checks
)

Aside: I am on-board with the don't-if-else-your-errors mentality, just not necessarily the return() bit. There are two useful idioms for returns in branching code:

  • pre declare the object to manipulate, make the manipulations in the different branches, return after the branches.
  • have each branch return at its end I have a slight preference for the first, but its not always practical, nor is it always more readable. As such, disagree with making it a general principle / practice - that's for strong preferences, particularly where the lazy alternatives pose distinct downsides.

looking at the linked JB example, agree the LHS is code smell, but again: the stopifnot(...) idiom would be a much clearer solution than the RHS. That idiom telegraphs: "Here are all the things we can check up front and immediately exit, for reasons".

However, sometimes there's actually some branching logic (e.g. when embracing some polymorphism in function arguments, though yes that too can go overboard), and you can't properly check exit conditions until you've made some progress on evaluating arguments.

In terms of actual code, for check_date, I'd prefer something akin to:

with(obs, stopifnot(
  "Both `reference_date` and `report_date` must be columns." =
     !(is.null(reference_date) && is.null(report_date)),
  "`reference_date` must be a column." = !is.null(reference_date),
  "`report_date` must be a column." = !is.null(report_date),
 local = ... # not quite sure here; seems like this is an internal function used for checks - parent.frame?
))

probably before the internal copy steps as well?

From @Bisaloo

I agree that stopifnot() is easier to understand for simple checking. I can do the change for the case at hand.

From @TimTaylor

Unsolicited, 😬, but hopefully useful comments:

stopifnot() definitely looks nice. stopifnot() is restrictive in that you cannot suppress the calling function as with stop() nor can you return condition objects with custom classes. Both of these can be useful with internal check functions. There is a slight overhead to successful stopifnot() calls that is avoided with stop() (only relevant if the function is called a lot and normally can be avoided by better structuring of code in this case anyway). How much the last two points matter depends on the use case but worth bearing in mind.

For a stop() approach I've used something similar to the following for internal check functions:

.stopf <- function(fmt, ..., .use_call = TRUE, .call = sys.call(-1L), .class = NULL) {
    .call <- if (isTRUE(.use_call)) .call[1L] else NULL
    msg <- sprintf(fmt, ...)
    err <- errorCondition(msg, class = .class, call = .call)
    stop(err)
}

.assert_not_missing <- function(x, arg, call) {
    if (missing(x))
        .stopf("argument `%s` is missing, with no default.", arg, .call = call)
}


.positive_int <- function(x, arg = deparse(substitute(x)), call = sys.call(-1L)) {
    .assert_not_missing(x, arg = arg, call = call)
    if (!is.integer(x) || x <= 0L)
        .stopf("`%s` must be a positive integer.", arg, .call = call)
}


external <- function(bob) {
    .positive_int(bob)
}

# will work
external(1L)

# will error
external(1)
#> Error in external(): `bob` must be a positive integer.
external(0L)
#> Error in external(): `bob` must be a positive integer.
external()
#> Error in external(): argument `bob` is missing, with no default.

Created on 2023-04-12 with reprex v2.0.2

The first two functions are the boiler plate stuff with the .positive_int() being akin to your check_dates(). It is very easy to mess up which frame to evaluate things in so you do do need to take care :sweat_smile: . {rlang} does have a lot for this sort of thing should you wish to go that route.

Created on 2023-04-12 with reprex v2.0.2

seabbs avatar Apr 17 '23 11:04 seabbs

Quite a lot of this may now be being handled by coerce_dt() (introduced in #239) but there may be more to do here.

seabbs avatar Apr 24 '23 11:04 seabbs

A bit more, I think - coerce_dt just handles data-like-inputs (which is indeed a fair chunk), but nothing else (nor do I think it should - would muddy up its nature).

pearsonca avatar Apr 24 '23 12:04 pearsonca

Agree we shouldn't overload it any more.

seabbs avatar Apr 24 '23 13:04 seabbs

Some good resources and thoughts here for improving input checks: https://blog.djnavarro.net/posts/2023-08-08_being-assertive/

seabbs avatar Aug 08 '23 12:08 seabbs