incidence
incidence copied to clipboard
add tidy_incidence and glance_incidence for broom like tibble outputs
Currently get_info
generate vector class outputs, isolated from their metadata.
For tidyverse
users, this output could be cleaner, more explicit and easy to manage if the output is a dataframe (or tibble).
Here I share some broom-like implementations as tidy_incidence
and glance_incidence
in a currently personal package here.
These functions prove to be easy to manage to any level of stratification, if required, when using purrr with map
family of functions.
Feel free to check them and evaluate their addition to the core package!
# packages ----------------------------------------------------------------
if(!require("devtools")) install.packages("devtools")
if(!require("avallecam")) devtools::install_github("avallecam/avallecam") #improvements
library(tidyverse) #magrittr and purrr packages
library(lubridate) #ymd
library(outbreaks) #sample data
library(incidence) #core functions
# example outbreak --------------------------------------------------------
dat <- ebola_sim$linelist$date_of_onset
i.7 <- incidence(dat, interval=7)
# plot(i.7)
f1 <- fit(i.7[1:20])
f2 <- fit_optim_split(i.7)
# broom like functions ----------------------------------------------------
# tidy
f1 %>% tidy_incidence()
#> # A tibble: 2 x 5
#> mark parameter estimate conf_lower conf_upper
#> <chr> <chr> <dbl> <dbl> <dbl>
#> 1 1 r 0.0318 0.0260 0.0376
#> 2 1 doubling 21.8 18.5 26.7
f2 %>% pluck("fit") %>% tidy_incidence()
#> # A tibble: 4 x 5
#> mark parameter estimate conf_lower conf_upper
#> <chr> <chr> <dbl> <dbl> <dbl>
#> 1 before r 0.0298 0.0261 0.0336
#> 2 after r -0.0102 -0.0110 -0.00930
#> 3 before doubling 23.2 20.7 26.6
#> 4 after halving 68.2 62.9 74.5
# glance
f1 %>% glance_incidence()
#> # A tibble: 1 x 11
#> r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC
#> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
#> 1 0.880 0.874 0.498 133. 9.81e-10 2 -13.4 32.8 35.7
#> # ... with 2 more variables: deviance <dbl>, df.residual <int>
f2 %>% pluck("fit") %>% glance_incidence()
#> # A tibble: 2 x 12
#> names r.squared adj.r.squared sigma statistic p.value df logLik AIC
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <dbl> <dbl>
#> 1 befo~ 0.922 0.919 0.455 273. 2.95e-14 2 -14.8 35.5
#> 2 after 0.951 0.949 0.155 578. 3.72e-21 2 15.4 -24.8
#> # ... with 3 more variables: BIC <dbl>, deviance <dbl>, df.residual <int>
# using purrr -------------------------------------------------------------
# using purrr::map family function allows easy stratification
# for gender and could be extrapolated to administrative levels
# in country level analysis
incidence_purrr <- ebola_sim$linelist %>%
as_tibble() %>%
#filter observations explicitly before incidence()
filter(date_of_onset<lubridate::ymd(20141007)) %>%
#stratify by any group of covariates
group_by(gender) %>%
nest() %>%
mutate(incidence_strata=map(.x = data,
.f = ~incidence(.x %>% pull(date_of_onset),
interval=7))) %>%
mutate(strata_fit=map(.x = incidence_strata,
.f = fit)) %>%
mutate(strata_fit_tidy=map(.x = strata_fit,
.f = tidy_incidence)) %>%
mutate(strata_fit_glance=map(.x = strata_fit,
.f = glance_incidence))
# keep only the tibbles
incidence_purrr_tibble <- incidence_purrr %>%
select(-data,-incidence_strata,-strata_fit)
# tidy_incidence output
incidence_purrr_tibble %>%
unnest(cols = c(strata_fit_tidy))
#> # A tibble: 4 x 7
#> # Groups: gender [2]
#> gender mark parameter estimate conf_lower conf_upper strata_fit_glance
#> <fct> <chr> <chr> <dbl> <dbl> <dbl> <list>
#> 1 f 1 r 0.0228 0.0189 0.0267 <tibble [1 x 11]>
#> 2 f 1 doubling 30.3 25.9 36.6 <tibble [1 x 11]>
#> 3 m 1 r 0.0241 0.0188 0.0294 <tibble [1 x 11]>
#> 4 m 1 doubling 28.7 23.6 36.8 <tibble [1 x 11]>
# glance_incidence output
incidence_purrr_tibble %>%
unnest(cols = c(strata_fit_glance))
#> # A tibble: 2 x 13
#> # Groups: gender [2]
#> gender strata_fit_tidy r.squared adj.r.squared sigma statistic p.value
#> <fct> <list> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 f <tibble [2 x 5~ 0.859 0.853 0.510 146. 1.05e-11
#> 2 m <tibble [2 x 5~ 0.795 0.786 0.657 89.0 2.26e- 9
#> # ... with 6 more variables: df <int>, logLik <dbl>, AIC <dbl>, BIC <dbl>,
#> # deviance <dbl>, df.residual <int>
Created on 2020-06-12 by the reprex package (v0.3.0)
PD.- this is a RECON workshop alumni, thanks for all you work!
Thanks, this looks like a nice addition, and I like the worked example. Will try to find someone to review the PR asap
Hello @avallecam! Thank you for submitting a PR. Indeed this looks like a nice addition, though I will point out that it imports a lot of the tidyverse, which will force the packages that depend on incidence to also import these packages. While I like the tidyverse, I don't think incidence is the appropriate package for these features because the tidyverse tends to be relentless in breaking changes.
What I might suggest is to create a separate micro package that provides this interface to the incidence object. This way, incidence can stay (relatively) dependency-light and others can use these features if they want to in their analyses.
What do you think?
Best, Zhian
Hi @zkamvar, I definitely get your point!
Well, as I mentioned in the PR, the functions are hosted in a personal package. However, the best external package would be broom itself. They specify some suggestions to create new tidiers to new model objects here.
Let me know your opinions about this alternative.
Hi there,
@tjtnew is currently working on a revamp of incidence (currently called incidence2) which will use the tidyverse more extensively. We are also planning on re-implementing incidence::fit
etc. in a small separate package using a more consistent interface, more choice of underlying models, and tidier outputs. @avallecam it looks like your contribution would fit nicely in there. If you are fine with it, maybe we can wait until this comes into existence?
Hi there, @tjtnew is currently working on a revamp of incidence (currently called incidence2) which will use the tidyverse more extensively. We are also planning on re-implementing
incidence::fit
etc. in a small separate package using a more consistent interface, more choice of underlying models, and tidier outputs. @avallecam it looks like your contribution would fit nicely in there. If you are fine with it, maybe we can wait until this comes into existence?
totally ok with this too @thibautjombart . so, I will keep tidiers in the current package, for now. @zkamvar @tjtnew let me know if this contribution would still be preferred to be outside the new package. broom requires some details to fulfill but the response is really fast.
Hi @avallecam, we've not yet started on the fitting package but will be in touch when we start to reimplement. FI - the development version of the stripped back "tidy" incidence2 package is hosted at https://github.com/reconhub/incidence2. You might like to have a poke around. The new incidence
class is now a subclass of a tibble so should work well with dplyr verbs. Let us know if you have any feedback.
Cheers