
A systematic approach to finding obscure errors?

Open moodymudskipper opened this issue 6 years ago • 0 comments

First, great idea!

I hope you won't mind if I share here some thoughts of mine.

I think we can find programmatically which errors confuse users. We need a list of errors; then we can count the Google results for each one, optionally restricted to Stack Overflow, the R mailing list archives, the RStudio Community forum, or GitHub issues, and use that count as a relevance metric.

To get a list of errors, we need a list of popular packages whose code we can parse for error calls. Errors coded like msg <- "ooops"; stop(msg) are not easy to find this way, but these are exceptions in practice.
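To illustrate the msg <- "ooops"; stop(msg) case, here is a minimal base R sketch that walks a function body as a syntax tree instead of matching text. It is not the approach used in the code further down; `find_msgs()` and `fn_name()` are hypothetical helper names, and the sketch assumes the message is always the first argument of the call.

```r
# Name of the function being called; handles both stop(...) and pkg::abort(...)
fn_name <- function(head) {
  if (is.name(head)) return(as.character(head))
  if (is.call(head) && identical(head[[1]], as.name("::"))) {
    return(as.character(head[[3]]))
  }
  ""
}

# Recursively collect the first argument of every call to one of `fns`
find_msgs <- function(expr, fns = c("stop", "warning", "abort", "warn")) {
  out <- list()
  if (is.call(expr)) {
    if (fn_name(expr[[1]]) %in% fns && length(expr) > 1) {
      out <- list(expr[[2]])
    }
    el <- as.list(expr)
    for (i in seq_along(el)) {
      # skip empty arguments such as the one in x[, 1]
      if (identical(el[[i]], quote(expr = ))) next
      out <- c(out, find_msgs(el[[i]], fns))
    }
  }
  out
}

# Toy function with one literal message and one indirect message
f <- function(x) {
  if (!is.numeric(x)) stop("'x' must be numeric")
  msg <- "ooops"
  if (x < 0) stop(msg)
  x
}

found <- find_msgs(body(f))
# TRUE for literal strings, FALSE for messages built elsewhere (like `msg`)
is_literal <- vapply(found, is.character, logical(1))
```

Entries where `is_literal` is FALSE are the cases that would need extra work, so this also gives a way to check how rare they really are.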

find popular packages

We can use @andrie's pagerank package, take the top 40 packages, and install those that aren't installed yet:

# devtools::install_github("andrie/pagerank")
library(tidyverse)
# takes a couple minutes
pr <- pagerank::compute_pagerank(mirror = "http://cran.revolutionanalytics.com")
n <- 40
# base packages plus the n top-ranked CRAN packages
pkgs <- c(
  "stats", "graphics", "grDevices", "utils", "datasets", "methods", "base",
  names(head(pr, n)))
# alternative: gsub("\n.*$", "", tidyverse_packages())

# install only the packages that are missing
install.packages(setdiff(pkgs, rownames(installed.packages())))

parse code and tidy

Parse what's in stop, warning, abort, and warn calls and reshape.

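funs <- pkgs %>%
  set_names() %>%
  # one row per object in each package namespace
  map_dfr(~list(name = ls(getNamespace(.), all.names = TRUE)), .id = "pkg") %>%
  mutate(fun = map2(pkg, name, ~get(.y, envir = getNamespace(.x)))) %>%
  # keep functions only (base is.function, so no extra helper is needed)
  filter(map_lgl(fun, is.function)) %>%
  mutate(
    # deparse each function body into a single string
    body = map_chr(
      fun, . %>% body() %>% deparse() %>% paste(collapse = "\n")),
    # lazy capture up to the first ")" or ",", so longer messages get truncated
    stop = str_match_all(body, "stop\\((.*?)[\\),]"),
    warning = str_match_all(body, "warning\\((.*?)[\\),]"),
    abort = str_match_all(body, "abort\\((.*?)[\\),]"),
    warn = str_match_all(body, "warn\\((.*?)[\\),]")) %>%
  select(-body) %>%
  # long format: one row per function and call type, then one row per message
  gather(type, msg, stop:warn) %>%
  filter(map_lgl(msg, ~nrow(.) > 0)) %>%
  mutate(msg = map(msg, ~.[, 2])) %>%
  unnest(msg) %>%
  add_count(msg, name = "n_occurences")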

funs

# # A tibble: 3,657 x 5
#    pkg   name            type  msg                                                                                           n_occurences
#    <chr> <chr>           <chr> <chr>                                                                                                <int>
#  1 stats .asSparse       stop  "gettextf(\"%s needs package 'Matrix' correctly installed\""                                             3
#  2 stats .cbind.ts       stop  "\"no time series supplied\""                                                                            1
#  3 stats .cbind.ts       stop  "\"not all series have the same frequency\""                                                             1
#  4 stats .cbind.ts       stop  "\"non-time series not of the correct length\""                                                          1
#  5 stats .checkMFClasses stop  "gettextf(\"variable '%s' was fitted with type \\\"%s\\\" but type \\\"%s\\\" was supplied\""            1
#  6 stats .checkMFClasses stop  "gettextf(\"variables %s were specified with different types from the fit\""                             1
#  7 stats .Diag           stop  "gettextf(\"%s needs package 'Matrix' correctly installed\""                                             3
#  8 stats .preformat.ts   stop  "\"series is corrupt"                                                                                    2
#  9 stats [<-.ts          stop  "\"only replacement of elements is allowed\""                                                            1
# 10 stats acf             stop  "\"'x' must be numeric\""                                                                               15
# # ... with 3,647 more rows
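
The captured text still carries wrapper calls, quotes, and format placeholders, so it would need tidying before it can serve as a search string. A rough base R sketch follows; `clean_msg()` is a hypothetical helper, and replacing `%s` / `%d` with `*` is only an assumption about how a query might be formed.

```r
clean_msg <- function(msg) {
  # drop a leading wrapper call such as gettextf( or sprintf(
  out <- sub("^(gettextf|gettext|sprintf|paste0?)\\(", "", msg)
  # drop the outer double quotes, if present
  out <- gsub('^"|"$', "", out)
  # unescape inner quotes: \" becomes "
  out <- gsub('\\\\"', '"', out)
  # placeholders are filled at run time, so swap them for a wildcard
  out <- gsub("%[sd]", "*", out)
  # squeeze repeated whitespace and trim the ends
  trimws(gsub("\\s+", " ", out))
}

cleaned <- clean_msg(c(
  'gettextf("%s needs package \'Matrix\' correctly installed"',
  '"\'x\' must be numeric"',
  '"series is corrupt'
))
```

The function is vectorised, so it could be applied to the msg column directly, for example inside a mutate() call.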

find most searched errors

The most searched errors are likely to be the most confusing ones, and the most worthy of your package's suggestions.

I haven't done this step :).
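
For what it's worth, the ranking part could look roughly like the sketch below. It assumes some function that returns a hit count for a query; `rank_by_hits()` and `fake_counter()` are hypothetical names, the counts are made up so that the code runs offline, and a real counter would have to call whichever search service is chosen.

```r
# Rank unique messages by the number of hits returned by `count_hits`
rank_by_hits <- function(msgs, count_hits) {
  msgs <- unique(msgs)
  hits <- vapply(msgs, count_hits, numeric(1), USE.NAMES = FALSE)
  out <- data.frame(msg = msgs, hits = hits, stringsAsFactors = FALSE)
  # most searched first
  out[order(-out$hits), , drop = FALSE]
}

# Stand-in counter with invented numbers, for illustration only
fake_counts <- c("no time series supplied" = 3, "'x' must be numeric" = 120)
fake_counter <- function(q) unname(fake_counts[[q]])

ranked <- rank_by_hits(names(fake_counts), fake_counter)
```

Keeping the counter as an argument means the same ranking code works whether the counts come from a general web search or from a single site.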

moodymudskipper, Apr 12 '19 15:04