epicontacts
epicontacts copied to clipboard
Code review of epicontacts
This is based on the rOpenSci code reviews: https://devguide.ropensci.org/
@zkamvar and @finlaycampbell are assigned.
Sorry, the review process starts here: https://devguide.ropensci.org/reviewerguide.html
I was thinking it might be best to conduct the code review on the tree
branch? It's up to date with master
and largely consists of the new functions vis_ttree, vis_ggplot and a bunch of helper functions. The big changes to the existing code base are in vis_epicontacts (to ensure consistency with the other plotting functions) and other plotting functions - I can look over all of these. All of the remaining functionality (clustering, etc.) aren't changed. Thoughts?
I don't think that would be a good idea because I think ttree needs its own thorough code review in addition to the current core functionalities of the master branch. It's better to nail down the core functionalities first so that we aren't chasing down bugs while considering new features.
OK fair enough let's stick with master for now then!
Here is the results of goodpractice:
> goodpractice::gp()
Registered S3 method overwritten by 'httr':
method from
as.character.form_file crul
Preparing: covr
Preparing: cyclocomp
✔ checking for file ‘/tmp/RtmpHcSv9P/remotes442d27d9deba/epicontacts/DESCRIPTION’ ...
─ preparing ‘epicontacts’:
✔ checking DESCRIPTION meta-information
✔ checking vignette meta-information
─ checking for LF line-endings in source and make files and shell scripts
─ checking for empty or unneeded directories
─ building ‘epicontacts_1.1.1.tar.gz’
* installing *source* package ‘epicontacts’ ...
** using staged installation
** R
** inst
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded from temporary location
** testing if installed package can be loaded from final location
** testing if installed package keeps a record of temporary installation path
* DONE (epicontacts)
Adding ‘epicontacts_1.1.1_R_x86_64-pc-linux-gnu.tar.gz’ to the cache
Preparing: description
Preparing: lintr
Preparing: namespace
Preparing: rcmdcheck
── GP epicontacts ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
It is good practice to
✖ write unit tests for all functions, and all package code in general. 86% of code lines are covered by test cases.
R/colors.R:64:NA
R/colors.R:65:NA
R/colors.R:66:NA
R/colors.R:67:NA
R/internals.R:36:NA
... and 87 more lines
✖ omit "Date" in DESCRIPTION. It is not required and it gets invalid quite often. A build date will be added to the package when you perform `R CMD build` on
it.
✖ use '<-' for assignment instead of '='. '<-' is the standard, and R users and developers are used it and it is easier to read your code for them if you use
'<-'.
R/graph3D.R:111:23
R/subset_clusters_by_size.R:68:16
R/subset_clusters_by_size.R:69:16
R/subset_clusters_by_size.R:75:16
R/subset_clusters_by_size.R:81:16
✖ avoid long code lines, it is bad for readability. Also, many people prefer editor windows that are about 80 characters wide. Try make your lines shorter than
80 characters
R/get_clusters.R:3:1
R/get_clusters.R:14:1
R/get_clusters.R:15:1
R/get_clusters.R:18:1
R/get_clusters.R:24:1
... and 31 more lines
✖ avoid sapply(), it is not type safe. It might return a vector, or a list, depending on the input data. Consider using vapply() instead.
R/graph3D.R:146:17
✖ avoid 1:length(...), 1:nrow(...), 1:ncol(...), 1:NROW(...) and 1:NCOL(...) expressions. They are error prone and result 1:0 if the expression on the right
hand side is zero. Use seq_len() or seq_along() instead.
tests/testthat/test_get_clusters.R:115:30
tests/testthat/test_get_clusters.R:136:30
✖ fix this R CMD check NOTE: Namespace in Imports field not imported from: ‘colorspace’ All declared Imports should be used.
Package Review
Documentation
The package includes all the following forms of documentation:
- [ ] A statement of need clearly stating problems the software is designed to solve and its target audience in README
- [x] Installation instructions: for the development version of package and any non-standard dependencies in README
- [x] Vignette(s) demonstrating major functionality that runs successfully locally
- [x] Function Documentation: for all exported functions in R help
- [x] Examples for all exported functions in R Help that run successfully locally
- [ ] Community guidelines including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with
URL
,BugReports
andMaintainer
(which may be autogenerated viaAuthors@R
).
Review Comments
README
- Has no statement of need for the package
- Only one of the three vignettes listed in the README works
- Maintainer email is only partial
Documentation
-
get_pairwise()
does not describe what it returns. Thus, it's not clear how the serial interval is derived in the first example unless you look at the code. -
[.epicontacts
:x[k = FALSE, j = FALSE]
removes everything but the ids in the linelist... I don't think this is expected -
thin()
documentation doesn't explicitly say what it returns. Additionally, there are no examples. -
summary.epicontacts()
has documentation, but no examples and it doesn't describe what is summarized -
as.igraph()
doesn't work if igraph is not loaded. -
get_clusters()
this needs to describe how the clusters are obtained.
Tests
I see the ebola_sim
linelist used quite a lot. It would be good to define a
small test data set with some known pathologies (contacts without corresponding
linelist components, etc) that we can use for testing known quantities. We also
should get rid of the helper that auto-loads outbreaks. A lot of the tests
additionally seem to repeat a lot of the data construction steps (I believe
because they are set up as part of skip_on_cran()
. There is no need to forbid
all tests on CRAN. Some of them are okay to have.
-
as.igraph.epicontacts()
: OK -
colors
: OK -
clusters
: the first test of as.igraph.epicontacts is unnecessary. I am unsure of what the second test is doing (what is a net node?). Neither of theget_clusters()
tests actually test that the clusters are cromulent. -
get_degree()
: first test should read "both equals sum of in and out". -
get_id()
: I think these first tests need some explanation as to what is going on with the validation. -
get_pairwise()
: The first test needs to be updated to testthat standards. The "different types of attributes" tests need clarification on what they are testing for -
graph3D()
: This takes a long time to run. Use a smaller data set -
handling
: This needs more stress tests and more descriptive tests. A custom test handler might be useful here. -
make_epicontacts
: The first test needs to have more description to confirm that the process actually works -
plot.epicontacts()
: First test has the expectations commented out. Testing plots are always a bit of a PITA. This could benefit from a smaller dataset. Additionally, you can test against the internal workings of the resulting plot. The error expectation needs a specific error message to test against. -
print.epicontacts()
: This is effectively useless. It's not tested on travis or CRAN. -
subset()
: last bit needs to move away from the stored objects. The last test suite should be split up into smaller chunks because it goes all over the place. -
subset_clusters_by_x
: These need to be more specific and targeted. The clustering bit is the only thing that may be uncertain, so we need to make sure we have a data set that can actually test these. -
summary()
: Needs to get rid of the stored object comparison and hard-code the expected values. -
thin()
: OK -
vis_epicontacts()
: seeplot.epicontacts()
I've implemented some changes on a code_review branch (see ea484306d8d582f2a11b6c65f9c256c10bca9e47)
Code
as.igraph.epicontacts
- dplyr dependency?
- merge(x$linelist, all_ids, by = 'id', all = TRUE)
colors
- OK
get_clusters
- dplyr dependency?
get_degree
- No return value
- Doesn’t work correctly for only_linelist = FALSE
- change ‘contacts’ to ‘all’ on L48
-
vapply
counting approach is very slow – perhaps switch totable
get_id
- OK
get_pairwise
- No return value
- Description does not specify default behaviours for different classes
- Vector is named in some cases (attribute is numeric) and not in others (attribute is character)
- Fixed in commit
- Could perhaps add a
rev
argument that calculates to → from - @param attribute should specify that you can add a character or the index of the column
graph3D
- dplyr dependency
- Changed to merge in commit
thin
- Typo in warning message (contacts → contacts)
handling
- No return value specified
- Throws error when subsetting by date
internals
- OK
make_epicontacts
- IDs of class Date should not be allowed
- Should NA IDs be allowed?
plot
- The default qualitative color palette doesn’t really make sense for continuous node attributes, not a massive issue though
print.epicontacts
- Change dplyr::tbl_df → tibble::as_tibble for printing?
print.summary_epicontacts
- OK
subset_clusters_by_id
- Description doesn’t node that linelist is also subsetted
subset_clusters_by_size
- OK
subset.epicontacts
- No return value
- Maybe should be able to set a range for numeric attributes
summary.epicontacts
- OK
thin
- No return value
- The example doesn’t seem to be using thin?
vis_epicontacts
- Looks good for now but will have to looked over again when merging ttree
Tests
test_as.igraph.epicontacts
- OK
test_colors
- OK
test_get_clusters
- “construction of net nodes” doesn’t really test for much AFAICT
- I’ve added a test that manually constructs clusters and tests for them
test_get_degree
- Added a name test for only_linelist = FALSE
- Added a manual test for degree numbers calculation (this method is actually much faster than the current implementation – maybe should be swapped)
test_get_ID
- OK
test_get_pairwise
- No test on values
- Added in commit
test_handling
- Test for erroneous subsetting by Date
test_make_epicontacts
- OK
test_plot.R
- Testing against references not very useful
test_subset_clusters_by_ids
- Added manual test with known clusters
test_subset_clusters_by_size
- OK
test_subset.epicontacts
- OK
test_summary.epicontacts
- OK
test_thin
- OK
test_vis_epicontacts
- Testing against references not very useful
- More tests trying to break the underlying machinery could be useful edge_color, edge_label, edge_width, node_color against numeric, character, factor, date attributes with NAs