McMasterPandemic icon indicating copy to clipboard operation
McMasterPandemic copied to clipboard

minimize use of tidyverse

Open bbolker opened this issue 4 years ago • 10 comments

Two reasons to prefer not to use tidyverse internally where avoidable (fine in examples, vignettes, etc.):

  • tidyverse evolves rapidly, forward compatibility is hard to guarantee
  • tidyverse carries a boatload of dependencies forward

If there is a real need for tidyverse, we can use it, but I have a moderately strong preference for avoiding it.

@papsti ... ?

bbolker avatar May 05 '21 19:05 bbolker

I've been avoiding tidyverse tools in functions that I know get called a lot (e.g. any of the simulation tools), because I (perhaps erroneously?) assume the have more overhead and are thus less efficient. I definitely use them when setting up simulation inputs (e.g. expanding parameters with age), when reshaping simulation results (e.g. condensing over vaccination status), and of course for plotting.

Your points about forward compatibility and dependencies are good ones, and I guess I should maybe use the tidyverse even more sparingly... Here's an idea: I'll keep developing tools on my ip_devel branch without being any more careful about using the tidyverse than I already am (since being too careful will slow me down in the development phase) and then we should have a meeting to review my code ahead of merging ip_devel branch into master. We can then flag anything that should be untidyversed and I can make those changes before the merge. Sound good?

papsti avatar May 05 '21 20:05 papsti

Seems fine.
The reason I noticed this in the first place is that your package failed the GitHub actions check (see here) because you use functions from stringr and forcats but these are not listed in the DESCRIPTION file. More generally, I believe there is a NOTE in R CMD check if a package imports too many other packages - we were previously running up against this limit ...

bbolker avatar May 05 '21 20:05 bbolker

😬

papsti avatar May 05 '21 20:05 papsti

what if we... wait for it... removed ip_devel being checked via github actions for now...

papsti avatar May 05 '21 20:05 papsti

Adding forcats and stringr to DESCRIPTION should be easy, I can do it if you like

bbolker avatar May 05 '21 20:05 bbolker

Sure, thanks!

papsti avatar May 05 '21 20:05 papsti

Done. There are still testing problems (most critical is that the check_contact_rate_setting() example fails. You can avoid further GitHub actions testing by including the string "[skip ci]" to the end of your commit message (this should be documented in README, if it isn't already)

bbolker avatar May 05 '21 21:05 bbolker

I realize that we do use tidyverse in other places, e.g. pivot.pansim() method. It's handy. Moving forward need to consider:

  • whether we can subdivide into "core" (no tidyverse) and "periphery" (tidyverse OK)
  • how far we could get using the poorman package ("A replication of key functionality from 'dplyr' and the wider 'tidyverse' using only 'base'.")

bbolker avatar May 12 '21 17:05 bbolker

See also https://github.com/hrbrmstr/freebase

bbolker avatar May 13 '22 20:05 bbolker

The tidyverse stuff that is most difficult to replace in my opinion is pivot_longer and the join family.

The freebase package that @bbolker points out has a base R version of gather, which is the predecessor of pivot_longer so this might work.

I couldn't find any joins, but I guess we should just use merge.

stevencarlislewalker avatar May 17 '22 13:05 stevencarlislewalker