epicontacts icon indicating copy to clipboard operation
epicontacts copied to clipboard

Code review of epicontacts

Open zkamvar opened this issue 5 years ago • 7 comments

This is based on the rOpenSci code reviews: https://devguide.ropensci.org/

@zkamvar and @finlaycampbell are assigned.

zkamvar avatar Oct 28 '19 15:10 zkamvar

Sorry, the review process starts here: https://devguide.ropensci.org/reviewerguide.html

zkamvar avatar Oct 28 '19 15:10 zkamvar

I was thinking it might be best to conduct the code review on the tree branch? It's up to date with master and largely consists of the new functions vis_ttree, vis_ggplot and a bunch of helper functions. The big changes to the existing code base are in vis_epicontacts (to ensure consistency with the other plotting functions) and other plotting functions - I can look over all of these. All of the remaining functionality (clustering, etc.) aren't changed. Thoughts?

finlaycampbell avatar Oct 29 '19 10:10 finlaycampbell

I don't think that would be a good idea because I think ttree needs its own thorough code review in addition to the current core functionalities of the master branch. It's better to nail down the core functionalities first so that we aren't chasing down bugs while considering new features.

zkamvar avatar Oct 29 '19 10:10 zkamvar

OK fair enough let's stick with master for now then!

finlaycampbell avatar Oct 29 '19 11:10 finlaycampbell

Here is the results of goodpractice:

> goodpractice::gp()                                                                                                                                                                            
Registered S3 method overwritten by 'httr':
  method                 from
  as.character.form_file crul
Preparing: covr
Preparing: cyclocomp
✔  checking for file ‘/tmp/RtmpHcSv9P/remotes442d27d9deba/epicontacts/DESCRIPTION’ ...
─  preparing ‘epicontacts’:
✔  checking DESCRIPTION meta-information
✔  checking vignette meta-information
─  checking for LF line-endings in source and make files and shell scripts
─  checking for empty or unneeded directories
─  building ‘epicontacts_1.1.1.tar.gz’
   
* installing *source* package ‘epicontacts’ ...
** using staged installation
** R
** inst
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded from temporary location
** testing if installed package can be loaded from final location
** testing if installed package keeps a record of temporary installation path
* DONE (epicontacts)
Adding ‘epicontacts_1.1.1_R_x86_64-pc-linux-gnu.tar.gz’ to the cache
Preparing: description
Preparing: lintr
Preparing: namespace
Preparing: rcmdcheck
── GP epicontacts ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

It is good practice to

  ✖ write unit tests for all functions, and all package code in general. 86% of code lines are covered by test cases.

    R/colors.R:64:NA
    R/colors.R:65:NA
    R/colors.R:66:NA
    R/colors.R:67:NA
    R/internals.R:36:NA
    ... and 87 more lines

  ✖ omit "Date" in DESCRIPTION. It is not required and it gets invalid quite often. A build date will be added to the package when you perform `R CMD build` on
    it.
  ✖ use '<-' for assignment instead of '='. '<-' is the standard, and R users and developers are used it and it is easier to read your code for them if you use
    '<-'.

    R/graph3D.R:111:23
    R/subset_clusters_by_size.R:68:16
    R/subset_clusters_by_size.R:69:16
    R/subset_clusters_by_size.R:75:16
    R/subset_clusters_by_size.R:81:16

  ✖ avoid long code lines, it is bad for readability. Also, many people prefer editor windows that are about 80 characters wide. Try make your lines shorter than
    80 characters

    R/get_clusters.R:3:1
    R/get_clusters.R:14:1
    R/get_clusters.R:15:1
    R/get_clusters.R:18:1
    R/get_clusters.R:24:1
    ... and 31 more lines

  ✖ avoid sapply(), it is not type safe. It might return a vector, or a list, depending on the input data. Consider using vapply() instead.

    R/graph3D.R:146:17

  ✖ avoid 1:length(...), 1:nrow(...), 1:ncol(...), 1:NROW(...) and 1:NCOL(...) expressions. They are error prone and result 1:0 if the expression on the right
    hand side is zero. Use seq_len() or seq_along() instead.

    tests/testthat/test_get_clusters.R:115:30
    tests/testthat/test_get_clusters.R:136:30

  ✖ fix this R CMD check NOTE: Namespace in Imports field not imported from: ‘colorspace’ All declared Imports should be used.

zkamvar avatar Oct 29 '19 13:10 zkamvar

Package Review

Documentation

The package includes all the following forms of documentation:

  • [ ] A statement of need clearly stating problems the software is designed to solve and its target audience in README
  • [x] Installation instructions: for the development version of package and any non-standard dependencies in README
  • [x] Vignette(s) demonstrating major functionality that runs successfully locally
  • [x] Function Documentation: for all exported functions in R help
  • [x] Examples for all exported functions in R Help that run successfully locally
  • [ ] Community guidelines including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).

Review Comments

README

  • Has no statement of need for the package
  • Only one of the three vignettes listed in the README works
  • Maintainer email is only partial

Documentation

  • get_pairwise() does not describe what it returns. Thus, it's not clear how the serial interval is derived in the first example unless you look at the code.
  • [.epicontacts: x[k = FALSE, j = FALSE] removes everything but the ids in the linelist... I don't think this is expected
  • thin() documentation doesn't explicitly say what it returns. Additionally, there are no examples.
  • summary.epicontacts() has documentation, but no examples and it doesn't describe what is summarized
  • as.igraph() doesn't work if igraph is not loaded.
  • get_clusters() this needs to describe how the clusters are obtained.

Tests

I see the ebola_sim linelist used quite a lot. It would be good to define a small test data set with some known pathologies (contacts without corresponding linelist components, etc) that we can use for testing known quantities. We also should get rid of the helper that auto-loads outbreaks. A lot of the tests additionally seem to repeat a lot of the data construction steps (I believe because they are set up as part of skip_on_cran(). There is no need to forbid all tests on CRAN. Some of them are okay to have.

  • as.igraph.epicontacts(): OK
  • colors: OK
  • clusters: the first test of as.igraph.epicontacts is unnecessary. I am unsure of what the second test is doing (what is a net node?). Neither of the get_clusters() tests actually test that the clusters are cromulent.
  • get_degree(): first test should read "both equals sum of in and out".
  • get_id(): I think these first tests need some explanation as to what is going on with the validation.
  • get_pairwise(): The first test needs to be updated to testthat standards. The "different types of attributes" tests need clarification on what they are testing for
  • graph3D(): This takes a long time to run. Use a smaller data set
  • handling: This needs more stress tests and more descriptive tests. A custom test handler might be useful here.
  • make_epicontacts: The first test needs to have more description to confirm that the process actually works
  • plot.epicontacts(): First test has the expectations commented out. Testing plots are always a bit of a PITA. This could benefit from a smaller dataset. Additionally, you can test against the internal workings of the resulting plot. The error expectation needs a specific error message to test against.
  • print.epicontacts(): This is effectively useless. It's not tested on travis or CRAN.
  • subset(): last bit needs to move away from the stored objects. The last test suite should be split up into smaller chunks because it goes all over the place.
  • subset_clusters_by_x: These need to be more specific and targeted. The clustering bit is the only thing that may be uncertain, so we need to make sure we have a data set that can actually test these.
  • summary(): Needs to get rid of the stored object comparison and hard-code the expected values.
  • thin(): OK
  • vis_epicontacts(): see plot.epicontacts()

zkamvar avatar Oct 29 '19 16:10 zkamvar

I've implemented some changes on a code_review branch (see ea484306d8d582f2a11b6c65f9c256c10bca9e47)

Code

as.igraph.epicontacts

  • dplyr dependency?
  • merge(x$linelist, all_ids, by = 'id', all = TRUE)

colors

  • OK

get_clusters

  • dplyr dependency?

get_degree

  • No return value
  • Doesn’t work correctly for only_linelist = FALSE
  • change ‘contacts’ to ‘all’ on L48
  • vapply counting approach is very slow – perhaps switch to table

get_id

  • OK

get_pairwise

  • No return value
  • Description does not specify default behaviours for different classes
  • Vector is named in some cases (attribute is numeric) and not in others (attribute is character)
  • Fixed in commit
  • Could perhaps add a rev argument that calculates to → from
  • @param attribute should specify that you can add a character or the index of the column

graph3D

  • dplyr dependency
  • Changed to merge in commit

thin

  • Typo in warning message (contacts → contacts)

handling

  • No return value specified
  • Throws error when subsetting by date

internals

  • OK

make_epicontacts

  • IDs of class Date should not be allowed
  • Should NA IDs be allowed?

plot

  • The default qualitative color palette doesn’t really make sense for continuous node attributes, not a massive issue though

print.epicontacts

  • Change dplyr::tbl_df → tibble::as_tibble for printing?

print.summary_epicontacts

  • OK

subset_clusters_by_id

  • Description doesn’t node that linelist is also subsetted

subset_clusters_by_size

  • OK

subset.epicontacts

  • No return value
  • Maybe should be able to set a range for numeric attributes

summary.epicontacts

  • OK

thin

  • No return value
  • The example doesn’t seem to be using thin?

vis_epicontacts

  • Looks good for now but will have to looked over again when merging ttree

Tests

test_as.igraph.epicontacts

  • OK

test_colors

  • OK

test_get_clusters

  • “construction of net nodes” doesn’t really test for much AFAICT
  • I’ve added a test that manually constructs clusters and tests for them

test_get_degree

  • Added a name test for only_linelist = FALSE
  • Added a manual test for degree numbers calculation (this method is actually much faster than the current implementation – maybe should be swapped)

test_get_ID

  • OK

test_get_pairwise

  • No test on values
  • Added in commit

test_handling

  • Test for erroneous subsetting by Date

test_make_epicontacts

  • OK

test_plot.R

  • Testing against references not very useful

test_subset_clusters_by_ids

  • Added manual test with known clusters

test_subset_clusters_by_size

  • OK

test_subset.epicontacts

  • OK

test_summary.epicontacts

  • OK

test_thin

  • OK

test_vis_epicontacts

  • Testing against references not very useful
  • More tests trying to break the underlying machinery could be useful edge_color, edge_label, edge_width, node_color against numeric, character, factor, date attributes with NAs

finlaycampbell avatar Nov 01 '19 15:11 finlaycampbell