fuzzyjoin
suggestion: more informative error message
Thanks for fuzzyjoin!
I recently spent several minutes troubleshooting an odd (to me) error (and its associated stack trace):
Error: All columns in a tibble must be vectors.
x Column `col` is NULL.
x
1. \-fuzzyjoin::fuzzy_left_join(...)
2. \-fuzzyjoin::fuzzy_join(x, y, by, match_fun, mode = "left", ...)
3. \-base::lapply(...)
4. \-fuzzyjoin:::FUN(X[[i]], ...)
5. \-`%>%`(...)
6. +-base::withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
7. \-base::eval(quote(`_fseq`(`_lhs`)), env, env)
8. \-base::eval(quote(`_fseq`(`_lhs`)), env, env)
9. \-fuzzyjoin:::`_fseq`(`_lhs`)
10. \-magrittr::freduce(value, `_function_list`)
11. \-function_list[[i]](value)
12. +-dplyr::group_by(., col)
13. \-dplyr:::group_by.data.frame(., col)
14. \-dplyr::grouped_df(groups$data, groups$group_names, .drop)
15. \-dplyr:::compute_groups(data, vars, drop = drop)
16. +-tibble::as_tibble(data)
17. \-tibble:::as_tibble.data.frame(data)
18. \-tibble:::lst_to_tibble(unclass(x), .rows, .name_repair)
19. \-tibble:::check_valid_cols(x)
20. \-rlang::cnd_signal(...)
21. \-rlang:::signal_abort(cnd)
22. \-base::signalCondition(cnd)
Long story short, I had swapped one of the by= names/values, so internally, y was not finding the associated field name. While that mistake is wholly mine, it would be really useful if the error were detected a little earlier and announced a little more clearly.
Perhaps something like:
if (!all(by$x %in% names(x))) stop("columns not found in 'x': ", paste(setdiff(by$x, names(x)), collapse = ", ")) ## added 1
if (!all(by$y %in% names(y))) stop("columns not found in 'y': ", paste(setdiff(by$y, names(y)), collapse = ", ")) ## added 2
matches <- dplyr::bind_rows(lapply(seq_along(by$x), function(i) {
  col_x <- x[[by$x[i]]]
  col_y <- y[[by$y[i]]]
  if (is.null(col_x) || is.null(col_y)) stop("something else") ## added 3
  ...
In my case, additional lines "1" and "2" would have made it immediately obvious what I did wrong. The "added 3" if statement is mostly a catchall so that the user is given a fuzzyjoin error instead of an error from bind_rows. I don't know what the best error message would be there...
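To make the idea concrete, here is a minimal standalone sketch of the up-front check in base R. The helper name check_by_columns is hypothetical and not part of fuzzyjoin. It assumes, as in the snippet above, that by has already been normalized to a list with character vectors x and y.

```r
# Hypothetical helper (not part of fuzzyjoin): verify that every column
# named in `by` exists in the corresponding data frame before matching.
# Assumes `by` is a list with character vectors `x` and `y`.
check_by_columns <- function(x, y, by) {
  missing_x <- setdiff(by$x, names(x))
  missing_y <- setdiff(by$y, names(y))
  if (length(missing_x) > 0) {
    stop("columns not found in 'x': ",
         paste(missing_x, collapse = ", "), call. = FALSE)
  }
  if (length(missing_y) > 0) {
    stop("columns not found in 'y': ",
         paste(missing_y, collapse = ", "), call. = FALSE)
  }
  invisible(TRUE)
}

# Example: the y-side name is wrong ("name" is a column of x, not y).
x <- data.frame(id = 1:3, name = c("a", "b", "c"))
y <- data.frame(key = 1:3, label = c("A", "B", "C"))

msg <- tryCatch(
  check_by_columns(x, y, by = list(x = "id", y = "name")),
  error = function(e) conditionMessage(e)
)
print(msg)
```

With a check like this, the swapped name is reported directly, rather than surfacing later as a NULL column inside group_by.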
Thanks!