argh
A systematic approach to finding obscure errors?
First, great idea!
I hope you won't mind if I share some thoughts here.
I think we can find programmatically which errors confuse users. We need a list of the errors; then we can count the Google results for each one, optionally restricted to Stack Overflow, the R mailing list archives, the RStudio Community forum, or GitHub issues, and use those counts as a relevance metric.
To get a list of errors, we need a list of popular packages; then we can parse their code and look for the error calls. We can't easily find errors coded like msg <- "ooops"; stop(msg), but in practice these are exceptions.
find popular packages
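We can use @andrie's pagerank package, take the top 40 packages, and install the ones that aren't installed yet:
# devtools::install_github("andrie/pagerank")
library(tidyverse)
# takes a couple of minutes
pr <- pagerank::compute_pagerank(mirror = "http://cran.revolutionanalytics.com")
n <- 40
pkgs <- c(
  # base packages
  "stats", "graphics", "grDevices", "utils", "datasets", "methods", "base",
  # top n packages by PageRank
  names(head(pr, n)))
# optionally, add the tidyverse packages too:
# gsub("\n.*$", "", tidyverse_packages())
# install only the packages that are missing;
# installed.packages() returns a matrix, so compare against its row names
install.packages(setdiff(pkgs, rownames(installed.packages())))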
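A quick check of the "install only what's missing" idiom: installed.packages() returns a character matrix with one row per package, so its row names are the installed package names, and setdiff() against them gives what still needs installing. The package vector here is made up for illustration, and notARealPackage123 is a deliberately nonexistent (hypothetical) name.

```r
# installed.packages() returns a matrix with one row per installed package;
# the row names are the package names
inst <- rownames(installed.packages())

# hypothetical list: two base packages plus a name that should not exist
pkgs_demo <- c("stats", "utils", "notARealPackage123")

# keep only the names that are not installed
to_install <- setdiff(pkgs_demo, inst)
to_install
```

Passing the whole matrix to setdiff() also happens to include the names (among versions, paths, etc.), but the row names make the intent explicit.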
parse code and tidy
Extract what's inside stop(), warning(), abort() and warn() calls, then reshape the result into one row per message.
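funs <- pkgs %>%
  set_names() %>%
  # one row per object in each package namespace
  map_dfr(~list(name = ls(getNamespace(.), all.names = TRUE)), .id = "pkg") %>%
  # fetch the objects and keep only the functions
  mutate(fun = map2(pkg, name, ~get(.y, envir = getNamespace(.x)))) %>%
  filter(map_lgl(fun, is_function)) %>%
  mutate(
    # deparse each function body into a single string
    body = map_chr(
      fun, . %>% body() %>% deparse() %>% paste(collapse = "\n")),
    # capture the first argument of each call, up to the first ")" or ","
    stop = str_match_all(body, "stop\\((.*?)[\\),]"),
    warning = str_match_all(body, "warning\\((.*?)[\\),]"),
    abort = str_match_all(body, "abort\\((.*?)[\\),]"),
    warn = str_match_all(body, "warn\\((.*?)[\\),]")) %>%
  # drop the list columns we no longer need
  select(-fun, -body) %>%
  # one row per (function, call type), dropping types with no match
  gather(type, msg, stop:warn) %>%
  filter(map_lgl(msg, ~nrow(.) > 0)) %>%
  # keep only the capture group, then one row per message
  mutate(msg = map(msg, ~.[, 2])) %>%
  unnest(msg) %>%
  # how many times each message appears across all packages
  add_count(msg, name = "n_occurences")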
funs
# # A tibble: 3,657 x 5
# pkg name type msg n_occurences
# <chr> <chr> <chr> <chr> <int>
# 1 stats .asSparse stop "gettextf(\"%s needs package 'Matrix' correctly installed\"" 3
# 2 stats .cbind.ts stop "\"no time series supplied\"" 1
# 3 stats .cbind.ts stop "\"not all series have the same frequency\"" 1
# 4 stats .cbind.ts stop "\"non-time series not of the correct length\"" 1
# 5 stats .checkMFClasses stop "gettextf(\"variable '%s' was fitted with type \\\"%s\\\" but type \\\"%s\\\" was supplied\"" 1
# 6 stats .checkMFClasses stop "gettextf(\"variables %s were specified with different types from the fit\"" 1
# 7 stats .Diag stop "gettextf(\"%s needs package 'Matrix' correctly installed\"" 3
# 8 stats .preformat.ts stop "\"series is corrupt" 2
# 9 stats [<-.ts stop "\"only replacement of elements is allowed\"" 1
# 10 stats acf stop "\"'x' must be numeric\"" 15
# # ... with 3,647 more rows
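To see the limits of the regex, here is a minimal base R sketch (no tidyverse needed) that applies the same pattern to two hypothetical functions, f_indirect and f_comma, through a hypothetical helper first_stop_arg(). It shows the indirect-message case mentioned at the start, and why some captured messages in the output are cut short at a comma (like row 8).

```r
# Hypothetical function: the message is stored in a variable first
f_indirect <- function() {
  msg <- "ooops"
  stop(msg)
}

# Hypothetical function: the message itself contains a comma
f_comma <- function(x) {
  if (is.null(x)) stop("x is missing, or NULL")
}

# Hypothetical helper: deparse the body and apply the same pattern as above,
# lazily capturing the first argument up to the first ")" or ","
first_stop_arg <- function(f) {
  txt <- paste(deparse(body(f)), collapse = "\n")
  m <- regmatches(txt, regexec("stop\\((.*?)[\\),]", txt, perl = TRUE))[[1]]
  m[2]  # element 1 is the full match, element 2 the capture group
}

first_stop_arg(f_indirect)  # "msg": the variable name, not the message
first_stop_arg(f_comma)     # "\"x is missing": cut at the comma inside the string
```

Both cases are approximations rather than bugs in the counting: the first yields a variable name, the second a truncated message, so a few rows in the table need a manual look.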
find most searched errors
These are likely the most confusing ones, and the ones most worth covering with your package's suggestions.
I haven't done this step :).
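For whoever gets to this step, a possible starting point is to build one exact-phrase search query per message and per site. This is only a sketch: search_urls() is a hypothetical helper, the domain list (Stack Overflow, the stat.ethz.ch mailing list archive, RStudio Community, GitHub) is my assumption, and actually counting the hits would need a search API or scraping, which isn't shown.

```r
# Hypothetical helper: one Google search URL without restriction, plus one per site.
# The default domains are an assumption, not a fixed list.
search_urls <- function(msg,
                        sites = c("stackoverflow.com", "stat.ethz.ch",
                                  "community.rstudio.com", "github.com")) {
  # drop the quotes left around the message by the regex capture
  phrase <- gsub('^"|"$', "", msg)
  # exact-phrase query, then the same query restricted with site:
  queries <- c(all = sprintf('"%s"', phrase),
               stats::setNames(sprintf('"%s" site:%s', phrase, sites), sites))
  # percent-encode everything, including reserved characters such as ":";
  # repeated = TRUE forces encoding even if the message contains "%xx"
  encoded <- vapply(queries, utils::URLencode, character(1),
                    reserved = TRUE, repeated = TRUE)
  stats::setNames(paste0("https://www.google.com/search?q=", encoded),
                  names(queries))
}

urls <- search_urls("\"'x' must be numeric\"")
urls[["stackoverflow.com"]]
```

Applied to funs$msg, this gives five URLs per message; the hit counts could then be combined into the relevance metric described at the top.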