software-review Submission - melt: Multiple Empirical Likelihood Tests

Submitting Author Name: Eunseop Kim Submitting Author Github Handle: @markean Repository: https://github.com/markean/melt Version submitted: 1.6.0.9000 Submission type: Stats Badge grade: silver Editor: @Paula-Moraga Reviewers: @pchausse, @awstringer1

Due date for @pchausse: 2022-09-05

Due date for @awstringer1: 2022-08-29 Archive: TBD Version accepted: TBD Language: en

Paste the full DESCRIPTION file inside a code block below:

Type: Package
Package: melt
Title: Multiple Empirical Likelihood Tests
Version: 1.6.0.9000
Authors@R: c(
    person("Eunseop", "Kim", , "[email protected]", role = c("aut", "cre")),
    person("Steven", "MacEachern", role = c("ctb", "ths")),
    person("Mario", "Peruggia", role = c("ctb", "ths"))
  )
Description: Performs multiple empirical likelihood tests for linear and
    generalized linear models. The core computational routines are
    implemented using the 'Eigen' C++ library and 'RcppEigen' interface,
    with OpenMP for parallel computation. Details of multiple testing
    procedures are given in Kim, MacEachern, and Peruggia (2021)
    <arxiv:2112.09206>.
License: GPL (>= 2)
URL: https://github.com/markean/melt, https://markean.github.io/melt/
BugReports: https://github.com/markean/melt/issues
Depends: 
    R (>= 4.0.0)
Imports: 
    graphics,
    methods,
    Rcpp,
    stats
Suggests: 
    covr,
    ggplot2,
    knitr,
    microbenchmark,
    rmarkdown,
    spelling,
    testthat (>= 3.0.0),
    withr
LinkingTo: 
    BH,
    dqrng,
    Rcpp,
    RcppEigen
VignetteBuilder: 
    knitr
Config/testthat/edition: 3
Encoding: UTF-8
Language: en-US
LazyData: true
NeedsCompilation: yes
Roxygen: list (markdown = TRUE, roclets = c ("namespace", "rd",
    "srr::srr_stats_roclet"))
RoxygenNote: 7.2.0

Pre-submission Inquiry

[x] A pre-submission inquiry has been approved in issue #549

General Information

Who is the target audience and what are scientific applications of this package?
Paste your responses to our General Standard G1.1 here, describing whether your software is: The package attempts the first implementation of the nested bilevel optimization approach within R to compute constrained empirical likelihood. The inner layer Newton-Raphson method for empirical likelihood is written in C++, enabling faster computation than other routines written in R.
(If applicable) Does your package comply with our guidance around Ethics, Data Privacy and Human Subjects Research? Yes.

Badging

What grade of badge are you aiming for? (bronze, silver, gold) Silver.
If aiming for silver or gold, describe which of the four aspects listed in the Guide for Authors chapter the package fulfils (at least one aspect for silver; three for gold)

Compliance with a good number of standards beyond those identified as minimally necessary.
Have a demonstrated generality of usage beyond one single envisioned use case. In my opinion, the generality comes from the applicability of empirical likelihood methods to (generalized) linear models and the ability to test linear hypotheses of choice.

Technical checks

Confirm each of the following by checking the box.

[x] I have read the rOpenSci packaging guide.
[x] I have read the author guide and I expect to maintain this package for at least 2 years or have another maintainer identified.
[x] I/we have read the Statistical Software Peer Review Guide for Authors.
[x] I/we have run autotest checks on the package, and ensured no tests fail.
[x] The srr_stats_pre_submit() function confirms this package may be submitted.
[x] The pkgcheck() function confirms this package may be submitted - alternatively, please explain reasons for any checks which your package is unable to pass.

This package:

[x] does not violate the Terms of Service of any service it interacts with.
[x] has a CRAN and OSI accepted license.
[x] contains a README with instructions for installing the development version.

Publication options

[x] Do you intend for this package to go on CRAN? The package is on CRAN.
[ ] Do you intend for this package to go on Bioconductor?

Code of conduct

[x] I agree to abide by rOpenSci's Code of Conduct during the review process and in maintaining my package should it be accepted.

Jul 15 '22 03:07 markean

Thanks for submitting to rOpenSci, our editors and @ropensci-review-bot will reply soon. Type @ropensci-review-bot help for help.

Jul 15 '22 03:07 ropensci-review-bot

:rocket:

Editor check started

:wave:

Jul 15 '22 03:07 ropensci-review-bot

Checks for melt (v1.6.0.9000)

git hash: 1b9bbebd

:heavy_check_mark: Package is already on CRAN.
:heavy_check_mark: has a 'codemeta.json' file.
:heavy_check_mark: has a 'contributing' file.
:heavy_check_mark: uses 'roxygen2'.
:heavy_check_mark: 'DESCRIPTION' has a URL field.
:heavy_check_mark: 'DESCRIPTION' has a BugReports field.
:heavy_check_mark: Package has at least one HTML vignette
:heavy_check_mark: All functions have examples.
:heavy_check_mark: Package has continuous integration checks.
:heavy_check_mark: Package coverage is 99.8%.
:heavy_check_mark: R CMD check found no errors.
:heavy_check_mark: R CMD check found no warnings.

Package License: GPL (>= 2)

1. rOpenSci Statistical Standards (`srr` package)

This package is in the following category:

Regression and Supervised Learning

:heavy_check_mark: All applicable standards [v0.1.0] have been documented in this package (86 complied with; 30 N/A standards)

Click to see the report of author-reported standards compliance of the package with links to associated lines of code, which can be re-generated locally by running the srr_report() function from within a local clone of the repository.

2. Package Dependencies

Details of Package Dependency Usage (click to open)

The table below tallies all function calls to all packages ('ncalls'), both internal (r-base + recommended, along with the package itself), and external (imported and suggested packages). 'NA' values indicate packages to which no identified calls to R functions could be found. Note that these results are generated by an automated code-tagging system which may not be entirely accurate.

type	package	ncalls
internal	base	146
internal	melt	66
imports	methods	83
imports	stats	66
imports	graphics	8
imports	Rcpp	NA
suggests	covr	NA
suggests	ggplot2	NA
suggests	knitr	NA
suggests	microbenchmark	NA
suggests	rmarkdown	NA
suggests	spelling	NA
suggests	testthat	NA
suggests	withr	NA
linking_to	BH	NA
linking_to	dqrng	NA
linking_to	Rcpp	NA
linking_to	RcppEigen	NA

Click below for tallies of functions used in each package. Locations of each call within this package may be generated locally by running 's <- pkgstats::pkgstats(<path/to/repo>)', and examining the 'external_calls' table.

base

list (15), q (14), attr (13), warning (13), numeric (12), c (7), nrow (7), integer (6), length (5), print (5), as.integer (4), match.call (4), names (4), ncol (4), tryCatch (4), do.call (3), if (3), mean (3), as.vector (2), cbind (2), logical (2), match (2), rownames (2), as.matrix (1), colMeans (1), is.null (1), matrix (1), NROW (1), rbind (1), sqrt (1), sum (1), t (1), table (1)

methods

setGeneric (36), setMethod (36), setClass (11)

melt

error (13), el_control (8), validate_optim (4), calibrate (3), compute_EL (3), validate_weights (3), compute_generic_EL (2), get_max_threads (2), test_GLM (2), test_LM (2), validate_family (2), compute_bootstrap_calibration (1), compute_confidence_intervals (1), compute_confidence_region (1), compute_ELD (1), el_eval (1), el_glm (1), el_lm (1), el_mean (1), el_sd (1), get_rank (1), test_hypothesis (1), test_multiple_hypotheses (1), validate_alpha (1), validate_b (1), validate_calibrate (1), validate_cv (1), validate_hypotheses (1), validate_hypothesis (1), validate_keep_data (1), validate_level (1), validate_lhs (1), validate_x (1)

stats

formula (14), df (8), offset (6), optim (6), na.action (5), weights (5), family (4), pchisq (3), step (3), contrasts (2), model.response (2), model.weights (2), sd (2), glm.fit (1), pf (1), qchisq (1), qf (1)

graphics

par (8)

NOTE: Some imported packages appear to have no associated function calls; please ensure with author that these 'Imports' are listed appropriately.

3. Statistical Properties

This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.

Details of statistical properties (click to open)

The package has:

code in C++ (53% in 15 files) and R (47% in 29 files)
1 authors
2 vignettes
1 internal data file
4 imported packages
59 exported functions (median 4 lines of code)
120 non-exported functions in R (median 7 lines of code)
91 R functions (median 14 lines of code)

Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages The following terminology is used:

loc = "Lines of Code"
fn = "function"
exp/not_exp = exported / not exported

All parameters are explained as tooltips in the locally-rendered HTML version of this report generated by the checks_to_markdown() function

The final measure (fn_call_network_size) is the total number of calls between functions (in R), or more abstract relationships between code objects in other languages. Values are flagged as "noteworthy" when they lie in the upper or lower 5th percentile.

measure	value	percentile	noteworthy
files_R	29	88.8
files_src	15	95.4
files_vignettes	2	85.7
files_tests	17	95.3
loc_R	1629	80.0
loc_src	1827	74.0
loc_vignettes	129	33.7
loc_tests	1243	89.7
num_vignettes	2	89.2
data_size_total	1311	61.2
data_size_median	1311	65.8
n_fns_r	179	88.1
n_fns_r_exported	59	90.2
n_fns_r_not_exported	120	87.0
n_fns_src	91	78.1
n_fns_per_file_r	3	55.1
n_fns_per_file_src	6	55.6
num_params_per_fn	2	11.9
loc_per_fn_r	7	16.0
loc_per_fn_r_exp	4	4.3	TRUE
loc_per_fn_r_not_exp	7	18.0
loc_per_fn_src	14	46.6
rel_whitespace_R	8	59.8
rel_whitespace_src	5	49.5
rel_whitespace_vignettes	8	6.5
rel_whitespace_tests	8	72.2
doclines_per_fn_exp	34	40.2
doclines_per_fn_not_exp	0	0.0	TRUE
fn_call_network_size	253	90.8

3a. Network visualisation

Click to see the interactive network visualisation of calls between objects in package

4. `goodpractice` and other checks

Details of goodpractice checks (click to open)

3a. Continuous Integration Badges

GitHub Workflow Results

id	name	conclusion	sha	run_number	date
2673111985	pages build and deployment	success	6ac77c	46	2022-07-14
2673079345	pkgcheck	failure	1b9bbe	51	2022-07-14
2673079344	pkgdown	success	1b9bbe	49	2022-07-14
2673079342	R-CMD-check	success	1b9bbe	71	2022-07-14
2673079354	test-coverage	success	1b9bbe	153	2022-07-14

3b. `goodpractice` results

`R CMD check` with rcmdcheck

R CMD check generated the following note:

checking installed package size ... NOTE installed size is 77.8Mb sub-directories of 1Mb or more: libs 77.2Mb

R CMD check generated the following check_fail:

rcmdcheck_reasonable_installed_size

Test coverage with covr

Package coverage: 99.78

Cyclocomplexity with cyclocomp

The following function have cyclocomplexity >= 15:

function	cyclocomplexity
el_glm	18

Static code analyses with lintr

lintr found the following 5 potential issues:

message	number of times
Avoid library() and require() calls in packages	4
Lines should not be more than 80 characters.	1

Package Versions

package	version
pkgstats	0.1.1.3
pkgcheck	0.0.3.76
srr	0.0.1.167

Editor-in-Chief Instructions:

This package is in top shape and may be passed on to a handling editor

Jul 15 '22 09:07 ropensci-review-bot

Hi @markean - I just wanted to provide a brief update here. I apologize for the delay in next steps here; it's not for lack of excitement to kick off this review! I hope to have a handling editor ready to assign next week.

Jul 22 '22 13:07 emilyriederer

Thanks for the update @emilyriederer!

Jul 22 '22 18:07 markean

@ropensci-review-bot assign @Paula-Moraga as editor

Jul 28 '22 01:07 emilyriederer

Assigned! @Paula-Moraga is now the editor

Jul 28 '22 01:07 ropensci-review-bot

@markean , I'm delighted to share that @Paula-Moraga will serve as editor for this package. She is away this week but is available to commence next week. We will be in more touch soon!

Jul 28 '22 01:07 emilyriederer

@emilyriederer, thanks for the news!

Jul 28 '22 06:07 markean

Hi @markean. I am pleased to handle this package. I will look for reviewers.

Aug 03 '22 11:08 Paula-Moraga

@ropensci-review-bot seeking reviewers

Aug 03 '22 11:08 Paula-Moraga

I'm sorry @Paula-Moraga, I'm afraid I can't do that. That's something only editors are allowed to do.

Aug 03 '22 11:08 ropensci-review-bot

Hi @Paula-Moraga. Thank you for your time and effort. Please let me know if there is anything I can help in the review process. @emilyriederer Could you update the label?

Aug 03 '22 22:08 markean

Done! I'll ask the bot team on Slack if they know why @Paula-Moraga couldn't herself since she is, in fact, the editor!

Aug 03 '22 23:08 emilyriederer

@ropensci-review-bot assign @pchausse as reviewer

Aug 04 '22 13:08 Paula-Moraga

@pchausse added to the reviewers list. Review due date is 2022-08-25. Thanks @pchausse for accepting to review! Please refer to our reviewer guide.

rOpenSci’s community is our best asset. We aim for reviews to be open, non-adversarial, and focused on improving software quality. Be respectful and kind! See our reviewers guide and code of conduct for more.

Aug 04 '22 13:08 ropensci-review-bot

@pchausse: If you haven't done so, please fill this form for us to update our reviewers records.

Aug 04 '22 13:08 ropensci-review-bot

@ropensci-review-bot set due date for @pchausse to 2022-09-05

Aug 04 '22 13:08 Paula-Moraga

Review due date for @pchausse is now 05-September-2022

Aug 04 '22 13:08 ropensci-review-bot

Hi @pchausse. Thank you so much for reviewing our package. Please let me know if you have any question. A draft version of pdf vignette in vignettes folder might be of help.

Aug 05 '22 03:08 markean

@ropensci-review-bot assign @awstringer1 as reviewer

Aug 08 '22 13:08 Paula-Moraga

@awstringer1 added to the reviewers list. Review due date is 2022-08-29. Thanks @awstringer1 for accepting to review! Please refer to our reviewer guide.

rOpenSci’s community is our best asset. We aim for reviews to be open, non-adversarial, and focused on improving software quality. Be respectful and kind! See our reviewers guide and code of conduct for more.

Aug 08 '22 13:08 ropensci-review-bot

@awstringer1: If you haven't done so, please fill this form for us to update our reviewers records.

Aug 08 '22 13:08 ropensci-review-bot

Hi @awstringer1! Thank you for reviewing the package. Please let me know if you have any question. A draft version of pdf vignette in vignettes folder might be of help.

@pchausse and @awstringer1: By an external request, a newer version of the package will be released to CRAN in August with some minor updates. It will not interfere with the review process, but I will tag it after the release. Then the reviewer comments will be reflected to the package in a subsequent release.

Aug 09 '22 05:08 markean

I am not sure if this is something the authors did, or something the rOpenSci template does, but the code line references in the "srr report" do not seem to match up with the locations of the tags in the roxygen documentation. They are a few lines off. This is fine mostly but sometimes I cannot find the relevant tag, e.g. for "Standards in on line#306 of file R/AllGenerics.R:" Can the authors comment? Thank you.

Aug 10 '22 14:08 awstringer1

@awstringer1 That's because the repo has been updated since the report linked to above was generated. As stated in the automated checks, you can - and in this case should - generate a local, and up-to-date, version of the report by running srr::srr_report(path), where path is where your local clone of melt is. By default that will open the full report in your web browser, in which all reported line numbers should match current state of repo.

Aug 10 '22 15:08 mpadge

Result of running srr_report on a fresh install of the development version of melt on my machine:

> srr::srr_report(file.path(.libPaths(),'melt'))
ℹ Loading melt
Writing 'NAMESPACE'
Writing 'NAMESPACE'
Loading required namespace: rmarkdown
Error in flist[sapply(blocks, inherits, "try-error")] : 
  invalid subscript type 'list'
In addition: Warning messages:
1: Failed to load at least one DLL: must specify DLL via a “DLLInfo” object. See getLoadedDLLs() 
2: In setup_ns_exports(path, export_all, export_imports) :
  Objects listed as exports, but not present in namespace: el_control, el_eval, el_glm, el_lm, el_mean, el_sd

I got a different error when trying on the CRAN version. Session information:

> sessionInfo()
R version 4.1.2 (2021-11-01)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Monterey 12.5

Aug 10 '22 15:08 awstringer1

@awstringer1 The path should be your local clone of the package, not the installed path. I can generate the report with srr::srr_report(path) after cloning. I guess that is what @mpadge meant above.

Aug 10 '22 18:08 markean

@pchausse and @awstringer1: v1.7.0 has been released to CRAN. No changes were made with respect to the srr tags. The code on the repository will be updated after reviewers’ comments. Thank you.

Aug 13 '22 06:08 markean

Here is my review, I hope it is helpful!

Package Review

[x] As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Compliance with Standards

[x] This package complies with a sufficient number of standards for a silver badge
[x] This grade of badge is the same as what the authors wanted to achieve

The following standards currently deemed non-applicable (through tags of @srrstatsNA) could potentially be applied to future versions of this software: (Please specify)

Please also comment on any standards which you consider either particularly well, or insufficiently, documented.

For packages aiming for silver or gold badges:

[x] This package extends beyond minimal compliance with standards in the following ways: (please describe)

The package seems to have been developed for future general use in mind. The authors have provided an interface that is clearly intended to be used beyond the examples considered in their associated preprint. A large number of the standards described in the rOpenSci review materials seem to have been considered by the authors when designing their software.

Comments

Overall, the authors appear to have convincingly motivated that the package meets the rOpenSci standards. Many statements in the authors' @srrstats documentation tags contain incomplete references to the package documentation, which precludes a more thorough review. See below for a non-exhaustive list of examples. I have the following comments about the authors' justifications:

"@srrstats {RE1.4} Some of the core distributional assumptions are documented.": this is potentially confusing to users. Either all of the regularity conditions should be clearly documented, or a link to a specific section in a paper where these conditions are clearly documented should be given. Although arguably common in the literature, the phrase "under some regularity conditions" is incomplete and not meaningful, and I suggest its removal.
"@srrstats {G2.1, G2.1a} Assertions on types of inputs are clarified throughout the package documentation.": it would be helpful to the reviewers to give specific references to the package documentation. There are a number of similar "throughout the package documentation" statements that render it challenging to assess compliance with standards, beyond a general sense of such.
Possible typo: "na.cation()" in el_glm.R, line 64.
"@srrstats {RE1.0, RE1.1} Formula interface is used to the formula argument, and how it is converted to a matrix input is documented as well.": a reference to the function, and section of documentation where this is located would better facilitate review.
"@srrstats {RE4.11} logLik() method extracts the empirical log-likelihood value which can be interpreted as a measure of goodness-of-fit.": more detail could be provided about how this is to be used as a measure of goodness of fit, perhaps a link to where this is discussed in the package documentation or the preprint?
"@srrstatsNA {G1.5} The package does not have any associated publication.": is it appropriate to cite the preprint here (this might be a question for the Editor)?
"@srrstatsNA {RE2.3} It is beyond the scope of estimating functions that the package considers. There seems to be no simple way to pass this information to the internal routines. Users can use el_eval() to manually construct estimating functions with the offset included, but any further method not applicable to the output of el_eval().": this appears to be in response to standard RE2.3 which states: "RE2.3 Where applicable, Regression Software should enable data to be centred (for example, through converting to zero-mean equivalent values; or to z-scores) or offset (for example, to zero-intercept equivalent values) via additional parameters, with the effects of any such parameters clearly documented and tested.". The present package uses lm and glm, as well as a standard formula interface, all of which facilitates centering and offsets, so it is not clear why these tasks are out of scope for the present package.

General Review

Documentation

The package includes all the following forms of documentation:

[ ] A statement of need clearly stating problems the software is designed to solve and its target audience in README

I do not see such a statement. I think this statement could be helpful: a short paragraph describing why empirical likelihood methods are useful in linear and generalized linear models could be informative to the user, since (a) empirical likelihood is a somewhat "specialized" topic and (b) there are other methods available for doing hypothesis tests in linear and generalized linear models, and such methods also rely on limiting Chi-square distributional arguments. It is at least not trivially clear why empirical likelihood is useful and why we'd want a package for it, so I think being very up front about this could better motivate the package.

[x] Installation instructions: for the development version of package and any non-standard dependencies in README
[ ] Community guidelines including contribution guidelines in the README or CONTRIBUTING

I see no such statement.

[ ] The documentation is sufficient to enable general use of the package beyond one specific use case

The documentation is almost there, but I am going to leave this unchecked for now. I think adding more detailed "Details" in all core functions, even if this involves a lot of copy and pasting, is a good idea, see (https://r-pkgs.org/man.html)[https://r-pkgs.org/man.html]. For example, ?el_lm contains very detailed documentation, but ?el_mean does not. It is helpful to users to have everything on one page, even if the material is repeated across many functions' documentation.

Algorithms

The core algorithms are implemented in C++. I will do my best to provide appropriate comments here, but I am not an expert on C++.

EL.cpp line 103: is there justification for this step-halving mechanism? It is only applied once within the loop, so how can it ensure that the function value is increased? Would a while loop be more appropriate, or perhaps use of an algorithm that guarantees the desired property?
Related: how often does the Newton step here not increase the function value in practice? If often, perhaps Newton's method is unsuitable here, and a Trust Region procedure could be considered?
EL.cpp line 181: is explicitly creating a projection matrix necessary? The combination of this being a dense matrix, and relying on explicitly calling inverse (which may have performance and stability implications), leads me to question whether operations involving this matrix could instead be coded more efficiently. For example, if we have P = I - X(X X^T)^-1 X^T, and you want to compute a matrix/vector product Pv, you could write Pv = v - Xb where (XX^T)b = X^Tv, and b is obtained by solving this system. This avoids the need to ever compute and store an explicit inverse.
What is the purpose of the PsuedoLog class? Commenting this code (lines 347--394) could help facilitate a more thorough review. This looks like a generic mathematical operation-- is there no pre-existing implementation to rely on?

Testing

The testing appears thorough, I have no comments.

Visualisation (where appropriate)

The visualizations provided seem to serve their stated purpose. No comments.

Package Design

External: the package is designed for the specific task of computing empirical likelihood ratio tests and intervals for the parameters of linear and generalized linear models. The package is set up to take input similar to lm and glm and then call the associated .fit function from core R. Some suggestions for broadening scope:

Could methods for objects of class lm and glm (and so on) be provided, so that the package provides its output to users who have already fit their models using these alternative methods, and hence compare them?
The output of el_lm is an object of class LM, a new class from the melt package. Could this output be made to inherit from the core R lm class, to use the associated functionality with which most R users are familiar? Or related, could the package provide a straightforward manner by which to fit an lm object from a fitted LM object? Again, thinking about a user who wishes to compare the two. Similar comments for glm.

Internal: see comments about algorithms above. In general the use of C++ for the numerical analysis seems appropriate, notwithstanding the specific comments above.

[x] Packaging guidelines: The package conforms to the rOpenSci packaging guidelines

Estimated hours spent reviewing: 8

[ ] Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer ("rev" role) in the package DESCRIPTION file.

Aug 17 '22 21:08 awstringer1

software-review software-review copied to clipboard

Submission - melt: Multiple Empirical Likelihood Tests

Due date for @awstringer1: 2022-08-29 Archive: TBD Version accepted: TBD Language: en

Pre-submission Inquiry

General Information

Badging

Technical checks

Publication options

Code of conduct

Checks for melt (v1.6.0.9000)

1. rOpenSci Statistical Standards (srr package)

2. Package Dependencies

3. Statistical Properties

3a. Network visualisation

4. goodpractice and other checks

3a. Continuous Integration Badges

3b. goodpractice results

R CMD check with rcmdcheck

Test coverage with covr

Cyclocomplexity with cyclocomp

Static code analyses with lintr

Editor-in-Chief Instructions:

Package Review

Compliance with Standards

Comments

General Review

Documentation

Algorithms

Testing

Visualisation (where appropriate)

Package Design

software-review
software-review copied to clipboard

1. rOpenSci Statistical Standards (`srr` package)

4. `goodpractice` and other checks

3b. `goodpractice` results

`R CMD check` with rcmdcheck