suggestion: more informative error message

Open r2evans opened this issue 3 years ago • 0 comments

Thanks for fuzzyjoin!

I recently spent several minutes troubleshooting an odd (to me) error (and its associated stack trace):

Error: All columns in a tibble must be vectors.
x Column `col` is NULL.
     x
  1. \-fuzzyjoin::fuzzy_left_join(...)
  2.   \-fuzzyjoin::fuzzy_join(x, y, by, match_fun, mode = "left", ...)
  3.     \-base::lapply(...)
  4.       \-fuzzyjoin:::FUN(X[[i]], ...)
  5.         \-`%>%`(...)
  6.           +-base::withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
  7.           \-base::eval(quote(`_fseq`(`_lhs`)), env, env)
  8.             \-base::eval(quote(`_fseq`(`_lhs`)), env, env)
  9.               \-fuzzyjoin:::`_fseq`(`_lhs`)
 10.                 \-magrittr::freduce(value, `_function_list`)
 11.                   \-function_list[[i]](value)
 12.                     +-dplyr::group_by(., col)
 13.                     \-dplyr:::group_by.data.frame(., col)
 14.                       \-dplyr::grouped_df(groups$data, groups$group_names, .drop)
 15.                         \-dplyr:::compute_groups(data, vars, drop = drop)
 16.                           +-tibble::as_tibble(data)
 17.                           \-tibble:::as_tibble.data.frame(data)
 18.                             \-tibble:::lst_to_tibble(unclass(x), .rows, .name_repair)
 19.                               \-tibble:::check_valid_cols(x)
 20.                                 \-rlang::cnd_signal(...)
 21.                                   \-rlang:::signal_abort(cnd)
 22.                                     \-base::signalCondition(cnd)

Long story short, I had swapped one of the by= names/values, so internally the associated field name was not found in y. While that mistake is wholly mine, it would be really useful if the error were detected a little earlier and announced a little more clearly.
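For context on where the NULL in the error message comes from: in base R, `[[` on a data frame returns NULL silently when the named column does not exist, so nothing fails until the NULL is used later. A minimal illustration (the data frame and column names here are made up, not taken from my actual data):

```r
# Hypothetical data frame: "id" exists, "ID" does not.
y <- data.frame(id = 1:3)

col_ok  <- y[["id"]]   # an integer vector
col_bad <- y[["ID"]]   # no error here: the result is silently NULL

is.null(col_ok)
is.null(col_bad)
```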

Perhaps something like:

    if (!all(by$x %in% names(x))) stop("columns not found in 'x': ", paste(setdiff(by$x, names(x)), collapse = ", "))  ## added 1
    if (!all(by$y %in% names(y))) stop("columns not found in 'y': ", paste(setdiff(by$y, names(y)), collapse = ", "))  ## added 2
    matches <- dplyr::bind_rows(lapply(seq_along(by$x), function(i) {
      col_x <- x[[by$x[i]]]
      col_y <- y[[by$y[i]]]
      if (is.null(col_x) || is.null(col_y)) stop("something else")                            ## added 3
      ...

In my case, added lines 1 and 2 would have made it immediately obvious what I did wrong. The "added 3" if statement is mostly a catch-all, so that the user is given a fuzzyjoin error instead of an error from bind_rows. I don't know what the best error message would be there...
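The same idea could also be factored into a small standalone check. This is only a sketch in base R: `check_by_columns` is a hypothetical name, not an existing fuzzyjoin function, and it assumes `by` is a list with character vectors `x` and `y`, as in the snippet above.

```r
# Hypothetical helper, not part of fuzzyjoin: fail early when `by` names
# columns that are missing from either input.
check_by_columns <- function(x, y, by) {
  missing_x <- setdiff(by$x, names(x))
  missing_y <- setdiff(by$y, names(y))
  if (length(missing_x) > 0) {
    stop("columns not found in 'x': ",
         paste(missing_x, collapse = ", "), call. = FALSE)
  }
  if (length(missing_y) > 0) {
    stop("columns not found in 'y': ",
         paste(missing_y, collapse = ", "), call. = FALSE)
  }
  invisible(TRUE)
}

# Made-up example data
x <- data.frame(name = c("a", "b"))
y <- data.frame(label = c("a", "c"))

# Correct mapping: passes silently
check_by_columns(x, y, list(x = "name", y = "label"))

# Swapped names, similar to my mistake: caught before any join work starts
msg <- tryCatch(
  check_by_columns(x, y, list(x = "label", y = "name")),
  error = function(e) conditionMessage(e)
)
msg
```

Collapsing the missing names with `paste()` keeps the message readable when more than one column is wrong.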

Thanks!

r2evans, Oct 09 '20 17:10