software-review
software-review copied to clipboard
canaper: Categorical Analysis of Neo- And Paleo-Endemism in R
Submitting Author: @joelnitta Other Package Authors: (delete if none) Shawn Laffan (@shawnlaffan) Repository: https://github.com/joelnitta/canaper Version submitted: 0.0.1 Submission type: Stats Badge grade: silver Editor: @tdhock Reviewers: @KlausVigo, @luismurao
Due date for @KlausVigo: 2021-11-24Due date for @luismurao: 2021-12-21 Archive: TBD Version accepted: TBD
- Paste the full DESCRIPTION file inside a code block below:
Package: canaper
Title: Categorical Analysis of Neo- And Paleo-Endemism
Version: 0.0.1
Authors@R:
c(
person(given = "Joel H.",
family = "Nitta",
role = c("aut", "cre"),
email = "[email protected]",
comment = c(ORCID = "0000-0003-4719-7472")),
person(given = "Shawn W.",
family = "Laffan",
role = c("ctb", "dtc")),
person(given = "Brent D.",
family = "Mishler",
role = c("ctb", "dtc")),
person(given = "Wataru",
family = "Iwasaki",
role = c("ctb"),
comment = c(ORCID = "0000-0002-9169-9245"))
)
Description: Provides functions to conduct categorical analysis of neo- and paleo-endemism (CANAPE).
License: MIT + file LICENSE
Encoding: UTF-8
LazyData: true
Roxygen: list(markdown = TRUE, roclets = c ("collate", "namespace", "rd", "roxyglobals::global_roclet", "srr::srr_stats_roclet"))
RoxygenNote: 7.1.2
URL: https://github.com/joelnitta/canaper
BugReports: https://github.com/joelnitta/canaper/issues
Imports:
ape,
assertr,
assertthat,
dplyr,
future.apply,
phyloregion,
progressr,
purrr,
stats,
tibble,
vegan
Suggests:
rmarkdown,
knitr,
future,
tictoc,
patchwork,
tidyverse,
testthat (>= 3.0.0),
roxyglobals (>= 0.2.1),
stringr,
magrittr,
covr,
picante
Config/testthat/edition: 3
Depends:
R (>= 3.5.0)
VignetteBuilder: knitr
Remotes:
anthonynorth/roxyglobals
Pre-submission Inquiry
- [x] A pre-submission inquiry has been approved in issue #469
General Information
-
Who is the target audience and what are scientific applications of this package?
- Target audience is evolutionary biologists. Scientific application is the analysis of phylogenetic endemism.
-
Paste your responses to our General Standard G1.1 here, describing whether your software is:
-
The first implementation within R of an algorithm which has previously been implemented in other languages or contexts
-
Please include hyperlinked references to all other relevant software: https://github.com/shawnlaffan/biodiverse
-
-
(If applicable) Does your package comply with our guidance around Ethics, Data Privacy and Human Subjects Research?
- (Not applicable)
Note also there is a package website available: https://joelnitta.github.io/canaper/
Badging
-
What grade of badge are you aiming for? (bronze, silver, gold)
- Silver, but open to suggestions from the reviewers and editors.
-
If aiming for silver or gold, describe which of the four aspects listed in the Guide for Authors chapter the package fulfils (at least one aspect for silver; three for gold)
- Demonstrates excellence in compliance with multiple standards from at least two broad sub-categories
- Compliance with a good number of standards beyond those identified as minimally necessary. I am in compliance with most of the standards for "General" and "Dimensionality Reduction, Clustering, and Unsupervised Learning".
Technical checks
Confirm each of the following by checking the box.
-
[x] I/we have read the guide for authors and rOpenSci packaging guide.
-
[x] I/we have read the Statistical Software Peer Review Guide for Authors.
-
[ ] I/we have run
autotest
checks on the package, and ensured no tests fail.-
autotest
hangs (doesn't produce any output after waiting ca. 20 minutes) on most functions. I was only able to use it oncpr_classify_endem()
.
-
-
[x] The
srr_stats_pre_submit()
function confirms this package may be submitted.
This package:
- [x] does not violate the Terms of Service of any service it interacts with.
- [x] has a CRAN and OSI accepted license.
- [x] contains a README with instructions for installing the development version.
Publication options
-
[x] Do you intend for this package to go on CRAN?
-
[ ] Do you intend for this package to go on Bioconductor?
- Also, I intend to submit a paper about this package to Methods in Ecology and Evolution after ROpenSci review.
Code of conduct
- [x] I agree to abide by rOpenSci's Code of Conduct during the review process and in maintaining my package should it be accepted.
Thanks for submitting to rOpenSci, our editors and @ropensci-review-bot will reply soon. Type @ropensci-review-bot help
for help.
:rocket:
The following problem was found in your submission template:
- 'author1' variable must be GitHub hanle only ('@myhandle') Editors: Please ensure these problems with the submission template are rectified. Package checks have been started regardless.
:wave:
Checks for canaper (v0.0.1)
git hash: 43a9f0a2
- :heavy_check_mark: Package name is available
- :heavy_check_mark: has a 'CITATION' file.
- :heavy_check_mark: has a 'codemeta.json' file.
- :heavy_check_mark: has a 'contributing' file.
- :heavy_check_mark: uses 'roxygen2'.
- :heavy_check_mark: 'DESCRIPTION' has a URL field.
- :heavy_check_mark: 'DESCRIPTION' has a BugReports field.
- :heavy_check_mark: Package has at least one HTML vignette
- :heavy_check_mark: All functions have examples.
- :heavy_check_mark: Package has continuous integration checks.
- :heavy_check_mark: Package coverage is 100%.
- :heavy_check_mark: R CMD check found no errors.
- :heavy_check_mark: R CMD check found no warnings.
Package License: MIT + file LICENSE
1. rOpenSci Statistical Standards (srr
package)
This package is in the following category:
- Dimensionality Reduction, Clustering and Unsupervised Learning
:heavy_check_mark: All applicable standards [v0.1.0] have been documented in this package
Click here to see the report of author-reported standards compliance of the package with links to associated lines of code, which can be re-generated locally by running the srr_report()
function from within a local clone of the repository.
2. Statistical Properties
This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.
Details of statistical properties (click to open)
The package has:
- code in R (100% in 11 files) and
- 1 authors
- 3 vignettes
- 8 internal data files
- 11 imported packages
- 4 exported functions (median 41 lines of code)
- 30 non-exported functions in R (median 7 lines of code)
Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages The following terminology is used:
-
loc
= "Lines of Code" -
fn
= "function" -
exp
/not_exp
= exported / not exported
The final measure (fn_call_network_size
) is the total number of calls between functions (in R), or more abstract relationships between code objects in other languages. Values are flagged as "noteworthy" when they lie in the upper or lower 5th percentile.
measure | value | percentile | noteworthy |
---|---|---|---|
files_R | 11 | 59.3 | |
files_vignettes | 2 | 83.0 | |
files_tests | 8 | 85.7 | |
loc_R | 650 | 52.9 | |
loc_vignettes | 162 | 64.9 | |
loc_tests | 939 | 83.5 | |
num_vignettes | 3 | 93.1 | |
data_size_total | 66301 | 81.2 | |
data_size_median | 509 | 59.9 | |
n_fns_r | 34 | 35.6 | |
n_fns_r_exported | 4 | 15.6 | |
n_fns_r_not_exported | 30 | 42.6 | |
n_fns_per_file_r | 3 | 42.7 | |
num_params_per_fn | 4 | 67.6 | |
loc_per_fn_r | 10 | 40.0 | |
loc_per_fn_r_exp | 41 | 75.3 | |
loc_per_fn_r_not_exp | 7 | 29.9 | |
rel_whitespace_R | 13 | 43.9 | |
rel_whitespace_vignettes | 0 | 0.0 | TRUE |
rel_whitespace_tests | 7 | 84.7 | |
doclines_per_fn_exp | 55 | 68.2 | |
doclines_per_fn_not_exp | 0 | 0.0 | TRUE |
fn_call_network_size | 9 | 22.0 |
2a. Network visualisation
Interactive network visualisation of calls between objects in package can be viewed by clicking here
3. goodpractice
and other checks
Details of goodpractice and other checks (click to open)
3a. Continuous Integration Badges
GitHub Workflow Results
name | conclusion | sha | date |
---|---|---|---|
pkgdown | success | 43a9f0 | 2021-10-26 |
R-CMD-check | success | 43a9f0 | 2021-10-26 |
Render codemeta | failure | 43a9f0 | 2021-10-26 |
test-coverage | success | 43a9f0 | 2021-10-26 |
3b. goodpractice
results
R CMD check
with rcmdcheck
rcmdcheck found no errors, warnings, or notes
Test coverage with covr
Package coverage: 100
Cyclocomplexity with cyclocomp
The following functions have cyclocomplexity >= 15:
function | cyclocomplexity |
---|---|
cpr_rand_test | 29 |
calc_biodiv_random | 16 |
Static code analyses with lintr
lintr found the following 353 potential issues:
message | number of times |
---|---|
Lines should not be more than 80 characters. | 353 |
Package Versions
package | version |
---|---|
pkgstats | 0.0.2.16 |
pkgcheck | 0.0.2.86 |
srr | 0.0.1.120 |
Editor-in-Chief Instructions:
This package is in top shape and may be passed on to a handling editor
@ropensci-review-bot seeking reviewers
I'm sorry @tdhock, I'm afraid I can't do that. That's something only editors are allowed to do.
@ropensci-review-bot assign @tdhock as editor
Assigned! @tdhock is now the editor
I forgot to mention, I plan to submit this as a paper to Methods in Ecology and Evolution after ROpenSci review (have also added this to the original description)
@ropensci-review-bot assign @KlausVigo as reviewer
@tdhock Only just realised the command there has the wrong syntax - should be add <@name> to reviewers
, like this example. Could you please try again? Nevertheless good that you got this one wrong, because it made us realise we need the bot to tell you it didn't understand. Thanks!
@ropensci-review-bot add @KlausVigo to reviewers by the way it would be more user friendly to have the add editor and add reviewer commands be consistent / analogous, I guess that is what you are doing in https://github.com/ropensci-org/buffy/pull/42 ?
sorry @tdhock, another bot glitch there - bot commands have to be kept as just that, with no additional comments. And yes, the point of the above-linked issue is to aid consistency between analogous commands, so that will hopefully improve soon. In the meantime, could you please try again, again, with just the command? With due apologies and gratitude! :bowing_man:
@ropensci-review-bot add @KlausVigo to reviewers
Can't assign reviewer because there is no editor assigned for this submission yet
@tdhock sorry for all the hiccups! The initial issue body had been edited after your username was added so your username was no longer in the string <!--editor--> @tdhock<!--end-editor-->
I edited it again so you can try again. Thanks for your patience!
@ropensci-review-bot add @KlausVigo to reviewers
@KlausVigo added to the reviewers list. Review due date is 2021-11-24. Thanks @KlausVigo for accepting to review! Please refer to our reviewer guide.
@KlausVigo: If you haven't done so, please fill this form for us to update our reviewers records.
@ropensci-review-bot add @luismurao to reviewers
@luismurao added to the reviewers list. Review due date is 2021-12-21. Thanks @luismurao for accepting to review! Please refer to our reviewer guide.
@luismurao: If you haven't done so, please fill this form for us to update our reviewers records.
Package Review
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
- Briefly describe any working relationship you have (had) with the package authors.
- [x] As the reviewer I confirm that there are no conflicts of interest for me to review this work (if you are unsure whether you are in conflict, please speak to your editor before starting your review).
Documentation
The package includes all the following forms of documentation:
- [x] A statement of need: clearly stating problems the software is designed to solve and its target audience in README
- [x] Installation instructions: for the development version of package and any non-standard dependencies in README
- [x] Vignette(s): demonstrating major functionality that runs successfully locally
- [x] Function Documentation: for all exported functions
- [x] Examples: (that run successfully locally) for all exported functions
- [ ] Community guidelines: including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with
URL
,BugReports
andMaintainer
(which may be autogenerated viaAuthors@R
).
Functionality
- [ ] Installation: Installation succeeds as documented.
- [x] Functionality: Any functional claims of the software been confirmed.
- [ ] Performance: Any performance claims of the software been confirmed.
- [x] Automated tests: Unit tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.
- [x] Packaging guidelines: The package conforms to the rOpenSci packaging guidelines.
Estimated hours spent reviewing:
-
~ 9
-
[ ] Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer ("rev" role) in the package DESCRIPTION file.
Review Comments
-
The
canaper
package provides functions to infer the evolutionary processes underlying neoendemism and paleoendemism. The software is an R implementation of the Categorical Analysis of Neo- and Paleo-endemism, which is only available in the Biodiverse software. -
I enjoyed reviewing the package, but I found some things that might improve the package.
- One concern is about parallelization: in the current stage of the package, parallelization calls are done outside the functions (via
future::plan(future::multisession)
). I think it would be better to have parameters for parallelization in functions that allow running analysis in parallel (so calls are done inside them).
- One concern is about parallelization: in the current stage of the package, parallelization calls are done outside the functions (via
Parallelization calls as in the current version of the package
library(canaper) # This package :)
library(future) # For parallel computing
library(tictoc) # For timing things
library(patchwork) # For composing mult-part plots
library(tidyverse) # For data-wrangling and plottin
data(acacia)
plan(multisession,workers=ncores)
set.seed(071421)
acacia_rand_res <- cpr_rand_test(
acacia$comm, acacia$phy,
null_model = "curveball",
n_reps = 100, n_iterations = 100000,
tbl_out = TRUE)
plan(sequential)
I think this could be more appropriate and easy to use
acacia_rand_res <- cpr_rand_test(
acacia$comm, acacia$phy,
null_model = "curveball",
n_reps = 100, n_iterations = 100000,
tbl_out = TRUE
parallel = TRUE, # Logical var to run processes in parallel
nworkes = 3, # Number of threads
random_seed=071421) # Random seed
- Instead of suggesting the
future
package I would import it
README
- No code of conduct and contribution guidelines, please add it
Installation
There is a potential conflict in package dependencies as canaper
depends on R (>= 3.5.0), but it imports phangorn
version 2.8.1, which depends on R (>= 4.1.0). Please change parameter Depends: R (>= 3.5.0) to Depends: R (>= 4.1.0) in the description file or provide a way to install an earlier version of phangorn
(version 2.5.5 should work).
- I installed the package locally on macOS (Big Sur), Linux (Ubuntu 20.04.3 & 21.04), and Windows 10
Automated test
-
devtools::check()
takes more than six minutes. Building of vignette outputs takes a lot of time (5m 5.2s). Consider using lees number of replicates in examples (i.e. n_reps ofcanaper::cpr_rand_test
).- I got one error after upgrading from GEOS 3.9 to 3.10 in Linux. The issue is related to phyloregion dependency.
══ Warnings ════════════════════════════════════════════════════════════════════
── Warning (test-utils.R:143:3): Functions copied from phyloregion work ────────
rgeos: versions of GEOS runtime 3.10.1-CAPI-1.16.0
and GEOS at installation 3.9.1-CAPI-1.14.2differ
Backtrace:
1. testthat::expect_equal(dense2sparse(biod_example$comm), phyloregion::dense2sparse(biod_example$comm)) test-utils.R:143:2
4. base::loadNamespace(x)
7. base::loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]])
8. base:::runHook(".onLoad", env, package.lib, package)
13. rgeos:::fun(libname, pkgname)
-
devtools::test()
finished with no fails and one warning
────────────────────────────────────────────────────────────────────
Warning (test-utils.R:143:3): Functions copied from phyloregion work
rgeos: versions of GEOS runtime 3.10.1-CAPI-1.16.0
and GEOS at installation 3.9.1-CAPI-1.14.2differ
Backtrace:
1. testthat::expect_equal(dense2sparse(biod_example$comm), phyloregion::dense2sparse(biod_example$comm)) test-utils.R:143:2
4. base::loadNamespace(x)
7. base::loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]])
8. base:::runHook(".onLoad", env, package.lib, package)
13. rgeos:::fun(libname, pkgname)
────────────────────────────────────────────────────────────────────
══ Results ═════════════════════════════════════════════════════════
Duration: 71.8 s
[ FAIL 0 | WARN 1 | SKIP 0 | PASS 255 ]
-
devtools::spell_check()
Change "signficance" to "significance" in line 132
signficance canape.Rmd:132
Performance
When benchmarking, I found a strange behavior. After using more than four threads, the program takes more time to finish a randomization test. Here is the code I used (could you please check it? I might be missing something)
library(purrr)
library(tidyverse)
library(tictoc)
library(future)
library(ggplot2)
rm(list = ls())
data(acacia)
acom <- acacia$comm
acom[acom>0] <- 1
n_reps_vec <- c(100,500,1000,1500,2000)
nwokers <- 1:7
acom <- acacia$comm
acom[acom>0] <- 1
rand_times <- seq_along(n_reps_vec) %>% purrr::map_dfr(function(nrep){
rand_t <- nwokers %>% purrr::map_dfr(function(nwork){
if(nwork>1){
plan(multisession,workers=nwork)
}
set.seed(071421)
tictoc::tic()
# Run randomization test
acacia_rand_res <- canaper::cpr_rand_test(
acom, acacia$phy,
null_model = "curveball",
n_reps = n_reps_vec[nrep],
n_iterations = 100000,
tbl_out = TRUE)
aa <- tictoc::toc()
r1 <-data.frame(n_reps = n_reps_vec[nrep] ,
nworkers=nwork,
time_secs = aa$toc - aa$tic)
plan(sequential)
return(r1)
})
return(rand_t)
})
ggplot(rand_times, aes(x = ncores,y=time_secs,color=as.factor(n_reps))) +
geom_line() + theme_classic()
Questions and comments
- I was wondering if
canaper
has already all methods of the CANAPE version of Biodiverse. Could write something about it in the README? - Although the vignettes are very useful and well documented, it would be nice to have methods for plotting spatial data results.
@joelnitta please do not hesitate to contact me if you have any questions. Thanks for developing canaper
.
@luismurao thanks so much for your review! I will try to address your comments soon.
@KlausVigo it would be great to have your comments as well so I can make changes incorporating both sets of reviews.
@KlausVigo, do you still intend to review this package?
If not, @tdhock can you please assign a new reviewer, as it is now multiple months past the due date.
Thanks!
Hi @joelnitta! I will finish it today.
Package Review
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
- Briefly describe any working relationship you have (had) with the package authors.
- [x] As the reviewer I confirm that there are no conflicts of interest for me to review this work (if you are unsure whether you are in conflict, please speak to your editor before starting your review).
Documentation
The package includes all the following forms of documentation:
- [x] A statement of need: clearly stating problems the software is designed to solve and its target audience in README
- [x] Installation instructions: for the development version of package and any non-standard dependencies in README
- [x] Vignette(s): demonstrating major functionality that runs successfully locally
- [x] Function Documentation: for all exported functions
- [x] Examples: (that run successfully locally) for all exported functions
- [ ] Community guidelines: including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with
URL
,BugReports
andMaintainer
(which may be autogenerated viaAuthors@R
).
Functionality
- [x] Installation: Installation succeeds as documented.
- [x] Functionality: Any functional claims of the software been confirmed.
- [ ] Performance: Any performance claims of the software been confirmed.
- [x] Automated tests: Unit tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.
- [ ] Packaging guidelines: The package conforms to the rOpenSci packaging guidelines.
Estimated hours spent reviewing: 12
- [x] Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer ("rev" role) in the package DESCRIPTION file.
Review Comments
-
The canaper package provides functions to conduct categorical analysis of neo- and paleo-endemism (CANAPE). It is an reimplementation of the Biodiverse software, but there are some functions in the
picante
(ses.pd
) and in thePhyloMeasures
package for Faith's phylogenetic diversity with a similar scope. -
The canaper package builds on functions from the vegan package to resample (dense) community matrices and borrows biodiversity measures (Faith's phylogenetic diversity and phylogenetic endemism) from the phyloregion package. The
picante
package also has some functions for swapping (community) matrices similar to vegan. A comparison between the packages would be useful.
Specific comments
-
The functions from the
phyloregion
package used in the package should be acknowledged properly. There is a reference in the README and in the source that the biodiversity measure are from the phyloregion package. However tools like depsy which trace back contributions will likely check only the DESCRIPTION file for imported packages, contributions etc.. If functions are copied the CRAN Repository Policy suggests to acknowledge contributors (ctb
) in the Author field. I opened an issue to improve the development guide (https://github.com/ropensci/dev_guide/issues/388) with some more additional information. Copying functions can additionally cause copyright problems. An alternative is importing the functions from the package directly. -
In contrast to the other reviewer I really like the use of the
future
package. Parallelization is dependent on the hardware, the operation system and on other things like if the functions run in a terminal or GUI. To avoid crashes this needs lots of internal checks and guessing and often will not lead to the best result. I prefer to give users the power to adopt the parallelization to their specific needs.
Profiling
In my opinion the main bottleneck for this kind of analysis is memory consumption and only minor part computing time. Dense community matrices are now frequently getting too large to be hold in memory (RAM). Luckily community matrices are usually extremely sparse and this allows efficient coding. For example the acacia data set, which comes with canaper, has less than 2% non-zero cells and a sparse representation takes only 1/20 of memory of the dense representation.
> library(canaper)
> library(Matrix)
> data("acacia")
> dim(acacia$comm)
[1] 3037 506
> comm <- as.matrix(acacia$comm)
> sum(comm > 0) / prod(dim(comm))
[1] 0.01898522
> object.size(comm)
12597112 bytes
> object.size(Matrix(comm, sparse=TRUE))
656576 bytes
The functions to compute the biodiversity measures make use of the sparsity, but the functions to sample the matrices not. We developed the phyloregion using sparse matrices which resulted in a far lower memory consumption and as a benefit it turned out faster in many cases (https://phyloregion.com/articles/Benchmark.html). With larger data sets we get into a trade-off. We want to use more cores, but this means several copies of the community matrices in memory. Providing a swapping algorithm which can handle sparse matrices would be a huge improvement for the community.
I used the following code for profiling:
library(canaper)
data(acacia)
Rprof(tmp <- tempfile(), memory.profiling = TRUE)
acacia_rand_res <- cpr_rand_test(
acacia$comm, acacia$phy,
null_model = "curveball",
n_reps = 100,
n_iterations = 100000)
Rprof()
summaryRprof(tmp, memory="both")
unlink(tmp)
I just highlight a few lines of the output I find interesting:
...
"calc_biodiv_random" 111.54 92.35 29477.9 0.02 0.02
"cpr_rand_comm" 99.56 82.43 22420.0 1.30 1.08
"simulate.nullmodel" 83.18 68.87 11655.6 1.24 1.03
...
"phylo_community" 5.72 4.74 2994.7 1.60 1.32
"phylo_endemism" 5.14 4.26 2876.1 0.02 0.02
"assertthat::assert_that" 4.66 3.86 1867.0 0.10 0.08
...
"PD" 2.56 2.12 1656.1 0.00 0.00
...
-
Not unexpected the largest amount of time is spent in re-sampling the community matrices (
simulate.nullmodel
). -
The canaper package uses defensive programming extensively (calling
assertthat::assert_that
), which is generally a very good thing. However, here are some exceptions: The functionscalc_biodiv_random
andcpr_rand_comm
are only called internally and are called thousands of times inside of the loop. It should be save to remove all (most) of the calls toassertthat::assert_that
from these functions as parameters have been checked already and are save to use! In fact the algorithm spent a similar amount of time inassertthat::assert_that
than computing the biodiversity measures (phylo_endemism
,PD
). -
When computing several biodiversity metrics at once a lot of intermediate results are shared and could be reused. This is not as important and the current approach allows to easily extend the
cpr_rand_test
function with additional biodiversity measures.
For submitting to CRAN:
There are some instances where more than 2 cores are used, e.g. the vignette canape.Rmd, line 83:
plan(multisession, workers = 3)
This could fail on CRAN, when submitting. There the use of maximal 2 cores is allowed and I speak from experience, I got Ripleyed for this once! Again the CRAN Repository Policy is a useful resource and gives some background information: "Checking the package should take as little CPU time as possible, as the CRAN check farm is a very limited resource and there are thousands of packages. Long-running tests and vignette code can be made optional for checking, but do ensure that the checks that are left do exercise all the features of the package. If running a package uses multiple threads/cores it must never use more than two simultaneously: the check farm is a shared resource and will typically be running many checks simultaneously. Examples should run for no more than a few seconds each: they are intended to exemplify to the would-be user how to use the functions in the package."
@KlausVigo thank you very much for the extensive and helpful review!
I just wanted to address one thing first: the reason I copied functions from phyloregion instead of importing them is because of a dependency of phyloregion (betapart), which has a package in Suggests
(snow) that was resetting the random number generator and thus cpr_rand_test()
was failing to return identical results when using parallel computing, even when set.seed()
was used. There is an issue describing the problem in canaper, and a related discussion in the future package.
In the end, I opted to copy the phyloregion functions under the AGPL-3 license, since that fixed the issue and resulted in fewer dependencies. As @HenrikBengtsson mentioned though, it seems another way to fix this (and import phyloregion normally) would be to use future.packages = "snow"
with future.apply::future_lapply()
. Would you prefer that?
(By the way, it is possible that this behavior of snow may cause reproducibility issues with phyloregion as well. I suggest you look into it if you haven't already).
hi @KlausVigo thanks for your detailed review. can you please tell us approximately how many hours you spent on reviewing? (you left that field blank in your review comment)
@ropensci-review-bot submit review https://github.com/ropensci/software-review/issues/475#issuecomment-1002754119 time 9