
Presubmission Inquiry - ReliaGrowR

Open paulgovan opened this issue 3 months ago • 12 comments

Submitting Author Name: Paul Govan
Submitting Author Github Handle: @paulgovan
Other Package Authors Github handles: (comma separated, delete if none)
Repository: https://github.com/paulgovan/ReliaGrowR
Submission type: Pre-submission
Language: en


  • Paste the full DESCRIPTION file inside a code block below:
Package: ReliaGrowR
Title: Reliability Growth Analysis
Version: 0.2
Authors@R: person("Paul", "Govan", email = "[email protected]", 
  role = c("aut", "cre", "cph"), comment = c(ORCID = "0000-0002-1821-8492"))
Description: Modeling and plotting functions for Reliability Growth Analysis (RGA). Models include the Duane (1962) <doi:10.1109/TA.1964.4319640>, Non-Homogeneous Poisson Process (NHPP) by Crow (1975) <https://apps.dtic.mil/sti/citations/ADA020296>, Piecewise Weibull NHPP by Guo et al. (2010) <doi:10.1109/RAMS.2010.5448029>, and Piecewise Weibull NHPP with Change Point Detection based on the 'segmented' package by Muggeo (2024) <https://cran.r-project.org/package=segmented>.
Imports:
  stats,
  graphics,
  segmented
License: CC BY 4.0
Encoding: UTF-8
Roxygen: list (markdown = TRUE, roclets = c ("namespace", "rd", "srr::srr_stats_roclet"))
Suggests: 
    ellmer,
    knitr,
    rmarkdown,
    spelling,
    testthat (>= 3.0.0),
    vdiffr
Language: en-US
URL: https://paulgovan.github.io/ReliaGrowR/, https://github.com/paulgovan/ReliaGrowR
Config/testthat/edition: 3
VignetteBuilder: knitr
BugReports: https://github.com/paulgovan/ReliaGrowR/issues
RoxygenNote: 7.3.3
Depends: 
    R (>= 3.5)
LazyData: true

Scope

  • Please indicate which category or categories from our package fit policies or statistical package categories this package falls under. (Please check one or more appropriate boxes below):

    Data Lifecycle Packages

    • [ ] data retrieval
    • [ ] data extraction
    • [ ] data munging
    • [ ] data deposition
    • [ ] data validation and testing
    • [ ] workflow automation
    • [ ] version control
    • [ ] citation management and bibliometrics
    • [ ] scientific software wrappers
    • [ ] field and lab reproducibility tools
    • [ ] database software bindings
    • [ ] geospatial data
    • [ ] translation

    Statistical Packages

    • [ ] Bayesian and Monte Carlo Routines
    • [ ] Dimensionality Reduction, Clustering, and Unsupervised Learning
    • [ ] Machine Learning
    • [x] Regression and Supervised Learning
    • [ ] Exploratory Data Analysis (EDA) and Summary Statistics
    • [ ] Spatial Analyses
    • [ ] Time Series Analyses
    • [ ] Probability Distributions
  • Explain how and why the package falls under these categories (briefly, 1-2 sentences). Please note any areas you are unsure of:

  • ReliaGrowR provides classic reliability growth models, including the Duane, Crow-AMSAA, Piecewise NHPP, and Piecewise NHPP with Change Point Detection, fit using MLE and supported by visualization tools.

  • If submitting a statistical package, have you already incorporated documentation of standards into your code via the srr package?

  • Yes

  • Who is the target audience and what are scientific applications of this package?

  • The target audience includes reliability engineers, data analysts, researchers, and students interested in reliability growth analysis.

  • Are there other R packages that accomplish the same thing? If so, how does yours differ or meet our criteria for best-in-category?

  • To the best of my knowledge, no other R packages are specifically dedicated to reliability growth analysis (RGA). A review of CRAN and other repositories identified packages supporting NHPP modeling, but none that directly address RGA.

  • (If applicable) Does your package comply with our guidance around Ethics, Data Privacy and Human Subjects Research?

  • I do not believe this is applicable.

  • Any other questions or issues we should be aware of?:

paulgovan, Oct 13 '25 14:10

Thanks for your pre-submission to rOpenSci, our editors will reply soon.

ropensci-review-bot, Oct 13 '25 14:10

@ropensci-review-bot check srr

mpadge, Oct 13 '25 14:10

'srr' standards compliance:

  • Complied with: 71 / 116 = 61.2% (general: 44 / 68; regression: 27 / 48)
  • Not complied with: 45 / 116 = 38.8% (general: 24 / 68; regression: 21 / 48)

:heavy_check_mark: This package complies with > 50% of all standards and may be submitted.

ropensci-review-bot, Oct 13 '25 14:10

@ropensci-review-bot check package

mpadge, Oct 13 '25 14:10

Thanks, about to send the query.

ropensci-review-bot, Oct 13 '25 14:10

:rocket:

Editor check started

:wave:

ropensci-review-bot, Oct 13 '25 14:10

Checks for ReliaGrowR (v0.2)

git hash: ca872d5f

  • :heavy_check_mark: Package is already on CRAN.
  • :heavy_check_mark: has a 'codemeta.json' file.
  • :heavy_check_mark: has a 'contributing' file.
  • :heavy_check_mark: uses 'roxygen2'.
  • :heavy_check_mark: 'DESCRIPTION' has a URL field.
  • :heavy_check_mark: 'DESCRIPTION' has a BugReports field.
  • :heavy_check_mark: Package has at least one HTML vignette
  • :heavy_check_mark: All functions have examples.
  • :heavy_check_mark: Package has continuous integration checks.
  • :heavy_check_mark: Package coverage is 95.4%.
  • :heavy_check_mark: This is a statistical package which complies with all applicable standards
  • :heavy_check_mark: R CMD check found no errors.
  • :heavy_check_mark: R CMD check found no warnings.
  • :eyes: Some goodpractice linters failed.
  • :eyes: Function names are duplicated in other packages

(Checks marked with :eyes: may be optionally addressed.)

Package License: CC BY 4.0


1. rOpenSci Statistical Standards (srr package)

This package is in the following category:

  • Regression and Supervised Learning

:heavy_check_mark: All applicable standards [v0.2.0] have been documented in this package (506 complied with; 45 N/A standards)

Click to see the report of author-reported standards compliance of the package with links to associated lines of code, which can be re-generated locally by running the srr_report() function from within a local clone of the repository.


2. Package Dependencies

Details of Package Dependency Usage (click to open)

The table below tallies all function calls to all packages ('ncalls'), both internal (r-base + recommended, along with the package itself), and external (imported and suggested packages). 'NA' values indicate packages to which no identified calls to R functions could be found. Note that these results are generated by an automated code-tagging system which may not be entirely accurate.

type package ncalls
internal base 85
internal ReliaGrowR 6
internal utils 4
imports stats 26
imports segmented 8
imports graphics 3
suggests ellmer NA
suggests knitr NA
suggests rmarkdown NA
suggests spelling NA
suggests testthat NA
suggests vdiffr NA
linking_to NA NA

Click below for tallies of functions used in each package. Locations of each call within this package may be generated locally by running 's <- pkgstats::pkgstats(<path/to/repo>)', and examining the 'external_calls' table.

base

beta (10), c (10), exp (10), log (9), list (6), if (5), cumsum (4), length (4), round (4), data.frame (3), as.numeric (2), ifelse (2), is.list (2), is.matrix (2), sort (2), summary (2), ceiling (1), col (1), is.null (1), labels (1), match.arg (1), merge (1), sum (1), suppressWarnings (1)

stats

predict (6), residuals (5), BIC (4), logLik (4), AIC (3), lm (2), aggregate (1), cor (1)

segmented

intercept (3), segmented (3), slope (2)

ReliaGrowR

duane (1), FUN (1), plot.duane (1), plot.rga (1), ppplot.rga (1), print.duane (1)

utils

data (4)

graphics

lines (2), abline (1)


3. Statistical Properties

This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.

Details of statistical properties (click to open)

The package has:

  • code in R (100% in 8 files) and
  • 1 authors
  • 1 vignette
  • 1 internal data file
  • 3 imported packages
  • 11 exported functions (median 45 lines of code)
  • 19 non-exported functions in R (median 50 lines of code)

Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages The following terminology is used:

  • loc = "Lines of Code"
  • fn = "function"
  • exp/not_exp = exported / not exported

All parameters are explained as tooltips in the locally-rendered HTML version of this report generated by the checks_to_markdown() function

The final measure (fn_call_network_size) is the total number of calls between functions (in R), or more abstract relationships between code objects in other languages. Values are flagged as "noteworthy" when they lie in the upper or lower 5th percentile.

measure value percentile noteworthy
files_R 8 47.5
files_inst 5 97.4
files_vignettes 1 61.2
files_tests 8 82.7
loc_R 596 50.6
loc_inst 814 75.3
loc_vignettes 120 29.8
loc_tests 1153 85.0
num_vignettes 1 58.2
data_size_total 627 57.5
data_size_median 627 60.7
n_fns_r 30 39.4
n_fns_r_exported 11 48.6
n_fns_r_not_exported 19 38.3
n_fns_per_file_r 1 22.3
num_params_per_fn 3 29.2
loc_per_fn_r 49 89.2
loc_per_fn_r_exp 45 76.4
loc_per_fn_r_not_exp 50 90.0
rel_whitespace_R 16 48.6
rel_whitespace_inst 19 75.9
rel_whitespace_vignettes 46 39.0
rel_whitespace_tests 23 86.3
doclines_per_fn_exp 33 38.9
doclines_per_fn_not_exp 0 0.0 TRUE
fn_call_network_size 0 0.0 TRUE

3a. Network visualisation

Click to see the interactive network visualisation of calls between objects in package


4. goodpractice and other checks

Details of goodpractice checks (click to open)

3a. Continuous Integration Badges

R-CMD-check.yaml

GitHub Workflow Results

id name conclusion sha run_number date
18451911757 pages build and deployment success ce9213 75 2025-10-13
18451863126 pkgcheck success ca872d 28 2025-10-13
18451863140 pkgdown.yaml success ca872d 74 2025-10-13
18451863129 R-CMD-check.yaml success ca872d 30 2025-10-13
18451863151 test-coverage.yaml success ca872d 30 2025-10-13

3b. goodpractice results

R CMD check with rcmdcheck

R CMD check generated the following check_fail:

  1. cyclocomp

Test coverage with covr

Package coverage: 95.37

Cyclocomplexity with cyclocomp

The following functions have cyclocomplexity >= 15:

function cyclocomplexity
rdt 59
rga 49
weibull_to_rga 47
duane 30
plot.duane 22
plot.rga 21

Static code analyses with lintr

lintr found no issues with this package!


5. Other Checks

Details of other checks (click to open)

:heavy_multiplication_x: The following function name is duplicated in other packages:

    • rdt from rankdifferencetest

Package Versions

package version
pkgstats 0.2.0.68
pkgcheck 0.1.2.233
srr 0.1.4.9

Editor-in-Chief Instructions:

This package is in top shape and may be passed on to a handling editor

ropensci-review-bot, Oct 13 '25 15:10

Thanks @paulgovan for your submission, which definitely seems in scope. Before we proceed, however, a couple of notes which I'll notate to make further discussion easier:

  • MP1 I note that all of your examples, and indeed the hard-coded notation within your actual code, suggest that you envision exclusive application to time series data. Do you think it would be advantageous to adapt the package to accept time-series classed data as input? The tsbox package would allow arbitrary choice of formats, and that to me would make more sense than effectively hand-coded temporal inputs like those in your examples.

    The use of classed inputs would also allow a host of pre-processing data checks and procedures to be applied prior to your main calculations, importantly including imputation of missing values, and checking assumptions regarding regularity. I suspect complying with our standards for time series as well as your current compliance with Regression standards would greatly improve both the robustness and the flexibility of the package.

  • MP2 I wonder whether you might consider Standard G3.1 to be applicable to your package?

    G3.1 Statistical software which relies on covariance calculations should enable users to choose between different algorithms for calculating covariances, and should not rely solely on covariances from the stats::cov function.

    Many of your calculations implicitly rely on standard Pearson-type covariances through calls to stats::lm(), and could perhaps be improved through replacing those with methods using more robust covariance calculations like those listed under G3.1. Passing stats::lm() results directly to stats::logLik() seems okay in your case because your models are all univariate (time-only), but the whole pipeline of covariance assumptions may be worth thinking about there?

Note that a detailed answer to that question is likely domain-specific, and many domains may never have considered potential impacts of lack of robustness in covariance calculations. If this is the case for your package, then that's okay. But as with time-series extension above, I suspect that modification to accommodate more flexible assumptions regarding covariance structures is likely to significantly improve the package.
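To make the MP2 point concrete, here is a minimal base-R sketch of the pipeline being discussed: a log-log line fitted with stats::lm(), passed directly to stats::logLik(), alongside an association measure whose method the caller can choose. The data values and the helper name `loglog_assoc` are hypothetical and are not taken from ReliaGrowR. Rank-based correlations stand in here for the idea of offering alternatives to Pearson-type calculations; they are not the specific algorithms listed under G3.1.

```r
# Illustrative cumulative test time and cumulative MTBF (hypothetical values).
cum_time <- c(100, 300, 450, 750, 1150, 1650)
cum_mtbf <- c(25, 33.3, 37.5, 46.9, 60.5, 78.6)

# Ordinary least-squares fit on log-log axes; this implicitly relies on
# Pearson-type covariances between log(cum_time) and log(cum_mtbf).
fit <- stats::lm(log(cum_mtbf) ~ log(cum_time))

# The fitted model can be passed straight to the usual likelihood helpers.
ll <- stats::logLik(fit)
ic <- c(AIC = stats::AIC(fit), BIC = stats::BIC(fit))

# Hypothetical helper: let the caller choose how association is measured
# instead of hard-coding the Pearson default.
loglog_assoc <- function(x, y, method = c("pearson", "spearman", "kendall")) {
  method <- match.arg(method)
  stats::cor(log(x), log(y), method = method)
}

# Compare the three measures on the same data.
assoc <- vapply(
  c("pearson", "spearman", "kendall"),
  function(m) loglog_assoc(cum_time, cum_mtbf, m),
  numeric(1)
)
print(assoc)
```

A large gap between the Pearson value and the rank-based values would be one informal sign that the least-squares fit is being driven by a few points, which is the kind of sensitivity MP2 asks about.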

  • MP3 A more general comment is that I found no indication in the README about envisioned areas of application, and so was confused by Reliability Growth Analysis, which is a term unfamiliar to me. I had to read the vignette to understand what the package was trying to do. It's important that your README should contain sufficient information for anybody to understand what your package does. And I was still a bit unsure in the vignette, and found myself mostly dependent on the actual code examples to understand what ReliaGrowR actually does. I think it would help everybody - and most importantly reviewers - for you to more clearly identify how your package is intended to be applied, and what problems it is intended to solve.

Finally, and this is just a suggestion which you should feel entirely free to ignore:

  • MP4 Your input checks and assertions are fabulous and very comprehensive. But they all rely on base-R expressions, which may be slower than some alternative approaches to input assertion? Again, I'm not sure of your envisioned or typical area of application, but for high-frequency usage, I personally lean towards using checkmate, as all assertions are direct C-calls, and often faster than alternatives. Benchmark if you like, but I suspect using checkmate would likely speed up your assertions. It can also make reading code easier, as you don't need to hard-code error messages, yet they still retain the full context like your current hard-coded ones.
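As a rough illustration of the MP4 comparison, the sketch below places a hand-written base-R assertion next to a checkmate call that enforces the same conditions. The function names `check_times_base` and `check_times_checkmate` and the rule being checked (a non-empty numeric vector of finite, non-negative values) are assumptions for illustration and are not ReliaGrowR's actual checks. The checkmate branch only runs if that package is installed.

```r
# Base-R style: the condition and the error message are both written by hand.
check_times_base <- function(times) {
  if (!is.numeric(times) || length(times) == 0L || anyNA(times) ||
        any(!is.finite(times)) || any(times < 0)) {
    stop("'times' must be a non-empty numeric vector of finite, non-negative values.")
  }
  invisible(times)
}

# checkmate style: the same conditions are declared as arguments, and the
# error message is generated by checkmate.
check_times_checkmate <- function(times) {
  checkmate::assert_numeric(
    times,
    lower = 0,
    finite = TRUE,
    any.missing = FALSE,
    min.len = 1L
  )
  invisible(times)
}

# Valid input passes silently in both versions.
check_times_base(c(10, 25, 60))
if (requireNamespace("checkmate", quietly = TRUE)) {
  check_times_checkmate(c(10, 25, 60))
}

# Invalid input raises an error; try() keeps this example from stopping.
bad <- try(check_times_base(c(1, -2)), silent = TRUE)
print(inherits(bad, "try-error"))
```

Whether the checkmate version is measurably faster for a given workload is something to benchmark rather than assume.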

I only make those comments in the hope that they'll help improve your package before we proceed to review. Once I had read enough to understand, I was impressed by the package, and the code looks great!

mpadge, Oct 18 '25 13:10

Hi @mpadge, thanks for taking the time to go over the package. I appreciate the detailed notes and suggestions.

MP1: I’m not familiar with tsbox, but I’ll take some time to explore it. One consideration is that reliability data generally includes both a time component and a coupled failure component. In most use cases, I expect users to manage their data in a standard tabular format (e.g., data.frame, CSV file), so it’s not entirely clear whether handling time-series data separately from failure data would add much benefit. That said, my goal is definitely to make data entry as easy as possible. At the same time, I try to keep dependencies to a minimum for better portability (see also my response to MP4). A potential compromise could be to include some guidance on preparing or cleaning data prior to running an RGA.
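A minimal base-R sketch of the kind of data preparation mentioned above, assuming a plain data.frame that holds both the time component and the coupled failure component. The column names and values are hypothetical, and the log-log least-squares line is only a Duane-style illustration, not the package's own fitting code.

```r
# Hypothetical test log: hours on test in each interval and the number of
# failures observed in that interval (illustrative values only).
test_log <- data.frame(
  interval_hours = c(100, 200, 150, 300, 400, 500),
  failures       = c(4, 5, 3, 4, 3, 2)
)

# Basic cleaning checks before any modelling.
stopifnot(
  !anyNA(test_log),                  # no missing values
  all(test_log$interval_hours > 0),  # strictly positive test time
  all(test_log$failures >= 0)        # non-negative failure counts
)

# Cumulative test time and cumulative failures keep the two components paired.
test_log$cum_time     <- cumsum(test_log$interval_hours)
test_log$cum_failures <- cumsum(test_log$failures)

# Duane-style view: cumulative MTBF plotted against cumulative time is
# approximately linear on log-log axes, so a straight line is fitted there.
test_log$cum_mtbf <- test_log$cum_time / test_log$cum_failures
fit <- stats::lm(log(cum_mtbf) ~ log(cum_time), data = test_log)

# The slope on the log-log scale acts as the growth rate in this sketch.
growth_rate <- unname(stats::coef(fit)[2])
print(growth_rate)
```

A positive slope here means cumulative MTBF rises with accumulated test time, which is the pattern reliability growth analysis looks for.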

MP2: That’s a great point — especially if the package is ever extended to handle covariates, which is less common in practice, but still a possible use case. I’ll take another look at G3.1 and consider options for including more robust covariance calculations.

MP3: Agreed. I originally wrote the README with the reliability community in mind, but I can see how that may be unclear to new users. I’ll plan to add a short introduction explaining what RGA is, along with more context in the introductory example.

MP4: I wasn’t familiar with checkmate, but I’ll definitely look into it. As I mentioned in MP1, I try to minimize dependencies to keep the package lightweight and portable, but I’m open to adding them when the performance or readability benefits are significant. The idea of faster assertions is appealing, so I’ll experiment with it.

paulgovan, Oct 20 '25 14:10

Hi @paulgovan. Because this package falls within our statistical packages, we need to find a statistics-specific editor to make the final call. We're working on that, but at the moment our stats editors are overbooked. I just wanted to make you aware that this might take a bit longer than usual (sorry about that!), but we are trying to get someone assigned.

ldecicco-USGS, Oct 21 '25 17:10

Thanks for the heads up @ldecicco-USGS. In the meantime, I hope to address @mpadge's previous comments.

paulgovan, Oct 21 '25 18:10

Hi @mpadge, I wanted to follow up on your earlier points:

MP1 I explored using tsbox, but I’m still not convinced it would add much benefit since the package doesn’t operate strictly on time-series data. That said, I’ve added a short vignette (link) that demonstrates some common data manipulation tasks. While some examples are specific to reliability data, others are more general to help users who may have limited experience with R.

MP2 I’ve updated the documentation for G3.1 to clarify that this standard would apply if the package is extended in the future to include models with covariates.

MP3 The README now includes a brief introduction to Reliability Growth Analysis (RGA) and a revised example with more context to clarify the intended application.

MP4 I agree that checkmate is a strong option. For now, I kept the base-R assertions since I value their explicitness, but I may revisit this in the future if performance becomes a bottleneck.

Thanks again for your feedback!

paulgovan, Oct 31 '25 16:10