incidence icon indicating copy to clipboard operation
incidence copied to clipboard

add tidy_incidence and glance_incidence for broom like tibble outputs

Open avallecam opened this issue 4 years ago • 6 comments

Currently get_info generate vector class outputs, isolated from their metadata.

For tidyverse users, this output could be cleaner, more explicit and easy to manage if the output is a dataframe (or tibble).

Here I share some broom-like implementations as tidy_incidence and glance_incidence in a currently personal package here.

These functions prove to be easy to manage to any level of stratification, if required, when using purrr with map family of functions.

Feel free to check them and evaluate their addition to the core package!

# packages ----------------------------------------------------------------

if(!require("devtools")) install.packages("devtools")
if(!require("avallecam")) devtools::install_github("avallecam/avallecam") #improvements

library(tidyverse) #magrittr and purrr packages
library(lubridate) #ymd
library(outbreaks) #sample data
library(incidence) #core functions

# example outbreak --------------------------------------------------------

dat <- ebola_sim$linelist$date_of_onset
i.7 <- incidence(dat, interval=7)
# plot(i.7)
f1 <- fit(i.7[1:20])
f2 <- fit_optim_split(i.7)

# broom like functions ----------------------------------------------------

# tidy
f1 %>% tidy_incidence()
#> # A tibble: 2 x 5
#>   mark  parameter estimate conf_lower conf_upper
#>   <chr> <chr>        <dbl>      <dbl>      <dbl>
#> 1 1     r           0.0318     0.0260     0.0376
#> 2 1     doubling   21.8       18.5       26.7
f2 %>% pluck("fit") %>% tidy_incidence()
#> # A tibble: 4 x 5
#>   mark   parameter estimate conf_lower conf_upper
#>   <chr>  <chr>        <dbl>      <dbl>      <dbl>
#> 1 before r           0.0298     0.0261    0.0336 
#> 2 after  r          -0.0102    -0.0110   -0.00930
#> 3 before doubling   23.2       20.7      26.6    
#> 4 after  halving    68.2       62.9      74.5

# glance
f1 %>% glance_incidence()
#> # A tibble: 1 x 11
#>   r.squared adj.r.squared sigma statistic  p.value    df logLik   AIC   BIC
#>       <dbl>         <dbl> <dbl>     <dbl>    <dbl> <int>  <dbl> <dbl> <dbl>
#> 1     0.880         0.874 0.498      133. 9.81e-10     2  -13.4  32.8  35.7
#> # ... with 2 more variables: deviance <dbl>, df.residual <int>
f2 %>% pluck("fit") %>% glance_incidence()
#> # A tibble: 2 x 12
#>   names r.squared adj.r.squared sigma statistic  p.value    df logLik   AIC
#>   <chr>     <dbl>         <dbl> <dbl>     <dbl>    <dbl> <int>  <dbl> <dbl>
#> 1 befo~     0.922         0.919 0.455      273. 2.95e-14     2  -14.8  35.5
#> 2 after     0.951         0.949 0.155      578. 3.72e-21     2   15.4 -24.8
#> # ... with 3 more variables: BIC <dbl>, deviance <dbl>, df.residual <int>


# using purrr -------------------------------------------------------------

# using purrr::map family function allows easy stratification
# for gender and could be extrapolated to administrative levels
# in country level analysis

incidence_purrr <- ebola_sim$linelist %>% 
  as_tibble() %>% 
  #filter observations explicitly before incidence()
  filter(date_of_onset<lubridate::ymd(20141007)) %>% 
  #stratify by any group of covariates
  group_by(gender) %>% 
  nest() %>% 
  mutate(incidence_strata=map(.x = data,
                              .f = ~incidence(.x %>% pull(date_of_onset),
                                              interval=7))) %>% 
  mutate(strata_fit=map(.x = incidence_strata,
                            .f = fit)) %>% 
  mutate(strata_fit_tidy=map(.x = strata_fit,
                                  .f = tidy_incidence)) %>% 
  mutate(strata_fit_glance=map(.x = strata_fit,
                                       .f = glance_incidence))

# keep only the tibbles
incidence_purrr_tibble <- incidence_purrr %>% 
  select(-data,-incidence_strata,-strata_fit)

# tidy_incidence output
incidence_purrr_tibble %>% 
  unnest(cols = c(strata_fit_tidy))
#> # A tibble: 4 x 7
#> # Groups:   gender [2]
#>   gender mark  parameter estimate conf_lower conf_upper strata_fit_glance
#>   <fct>  <chr> <chr>        <dbl>      <dbl>      <dbl> <list>           
#> 1 f      1     r           0.0228     0.0189     0.0267 <tibble [1 x 11]>
#> 2 f      1     doubling   30.3       25.9       36.6    <tibble [1 x 11]>
#> 3 m      1     r           0.0241     0.0188     0.0294 <tibble [1 x 11]>
#> 4 m      1     doubling   28.7       23.6       36.8    <tibble [1 x 11]>

# glance_incidence output
incidence_purrr_tibble %>% 
  unnest(cols = c(strata_fit_glance))
#> # A tibble: 2 x 13
#> # Groups:   gender [2]
#>   gender strata_fit_tidy r.squared adj.r.squared sigma statistic  p.value
#>   <fct>  <list>              <dbl>         <dbl> <dbl>     <dbl>    <dbl>
#> 1 f      <tibble [2 x 5~     0.859         0.853 0.510     146.  1.05e-11
#> 2 m      <tibble [2 x 5~     0.795         0.786 0.657      89.0 2.26e- 9
#> # ... with 6 more variables: df <int>, logLik <dbl>, AIC <dbl>, BIC <dbl>,
#> #   deviance <dbl>, df.residual <int>

Created on 2020-06-12 by the reprex package (v0.3.0)

PD.- this is a RECON workshop alumni, thanks for all you work!

avallecam avatar Jun 15 '20 02:06 avallecam

Thanks, this looks like a nice addition, and I like the worked example. Will try to find someone to review the PR asap

thibautjombart avatar Jun 15 '20 09:06 thibautjombart

Hello @avallecam! Thank you for submitting a PR. Indeed this looks like a nice addition, though I will point out that it imports a lot of the tidyverse, which will force the packages that depend on incidence to also import these packages. While I like the tidyverse, I don't think incidence is the appropriate package for these features because the tidyverse tends to be relentless in breaking changes.

What I might suggest is to create a separate micro package that provides this interface to the incidence object. This way, incidence can stay (relatively) dependency-light and others can use these features if they want to in their analyses.

What do you think?

Best, Zhian

zkamvar avatar Jun 20 '20 21:06 zkamvar

Hi @zkamvar, I definitely get your point!

Well, as I mentioned in the PR, the functions are hosted in a personal package. However, the best external package would be broom itself. They specify some suggestions to create new tidiers to new model objects here.

Let me know your opinions about this alternative.

avallecam avatar Jun 21 '20 00:06 avallecam

Hi there, @tjtnew is currently working on a revamp of incidence (currently called incidence2) which will use the tidyverse more extensively. We are also planning on re-implementing incidence::fit etc. in a small separate package using a more consistent interface, more choice of underlying models, and tidier outputs. @avallecam it looks like your contribution would fit nicely in there. If you are fine with it, maybe we can wait until this comes into existence?

thibautjombart avatar Jun 22 '20 10:06 thibautjombart

Hi there, @tjtnew is currently working on a revamp of incidence (currently called incidence2) which will use the tidyverse more extensively. We are also planning on re-implementing incidence::fit etc. in a small separate package using a more consistent interface, more choice of underlying models, and tidier outputs. @avallecam it looks like your contribution would fit nicely in there. If you are fine with it, maybe we can wait until this comes into existence?

totally ok with this too @thibautjombart . so, I will keep tidiers in the current package, for now. @zkamvar @tjtnew let me know if this contribution would still be preferred to be outside the new package. broom requires some details to fulfill but the response is really fast.

avallecam avatar Jun 22 '20 19:06 avallecam

Hi @avallecam, we've not yet started on the fitting package but will be in touch when we start to reimplement. FI - the development version of the stripped back "tidy" incidence2 package is hosted at https://github.com/reconhub/incidence2. You might like to have a poke around. The new incidence class is now a subclass of a tibble so should work well with dplyr verbs. Let us know if you have any feedback.

Cheers

TimTaylor avatar Jul 02 '20 14:07 TimTaylor