
rixpress: Reproducible Analytical Pipelines with Nix

Open b-rodrigues opened this issue 6 months ago • 18 comments

Submitting Author Name: Bruno Rodrigues
Submitting Author Github Handle: @b-rodrigues
Repository: https://github.com/b-rodrigues/rixpress
Version submitted: 0.2.0
Submission type: Standard
Editor: @ldecicco-USGS
Reviewers: TBD

Archive: TBD
Version accepted: TBD
Language: en


  • Paste the full DESCRIPTION file inside a code block below:
Package: rixpress
Title: Build Reproducible Analytical Pipelines With Nix
Version: 0.2.0
Authors@R:
    person("Bruno", "Rodrigues", , "[email protected]", role = c("aut", "cre"))
Description: Streamlines the creation of reproducible analytical pipelines using
  `default.nix` expressions generated via `{rix}` for reproducibility. Define
  derivations in R or Python, chain them into a composition of pure functions
  and build the resulting pipeline using `Nix` as the underlying end-to-end build
  tool. Functions to plot a DAG representation of the pipeline are included,
  as well as functions to load and inspect intermediary results for interactive
  analysis. User experience heavily inspired by the `{targets}` package.
License: GPL (>= 3)
Encoding: UTF-8
Roxygen: list(markdown = TRUE)
URL: https://github.com/b-rodrigues/rixpress/, https://b-rodrigues.github.io/rixpress/
BugReports: https://github.com/b-rodrigues/rixpress/issues
Depends:
    R (>= 4.1.0)
Imports:
    igraph,
    jsonlite,
    processx
RoxygenNote: 7.3.2
Suggests:
    dplyr,
    ggdag,
    ggplot2,
    knitr,
    mockery,
    reticulate,
    rix,
    rmarkdown,
    testthat (>= 3.0.0),
    usethis,
    visNetwork
Config/testthat/edition: 3
VignetteBuilder: knitr

Scope

  • Please indicate which category or categories from our package fit policies this package falls under: (Please check an appropriate box below. If you are unsure, we suggest you make a pre-submission inquiry.):

    • [ ] data retrieval
    • [ ] data extraction
    • [ ] data munging
    • [ ] data deposition
    • [ ] data validation and testing
    • [x] workflow automation
    • [ ] version control
    • [ ] citation management and bibliometrics
    • [ ] scientific software wrappers
    • [ ] field and lab reproducibility tools
    • [ ] database software bindings
    • [ ] geospatial data
  • Explain how and why the package falls under these categories (briefly, 1-2 sentences):

This package is intended to help users set up reproducible pipelines using the Nix programming language for enhanced reproducibility.

  • Who is the target audience and what are scientific applications of this package?

The target audience is anyone wanting to switch from "script-based workflows" to build automation. rixpress generates valid Nix expressions from simple R functions to define reproducible pipelines, and is heavily inspired by {targets}. The main difference between {targets} and this package is that the "heavy lifting" is performed by Nix, and that it works very closely with my previous package, {rix}, which allows data scientists to set up reproducible environments using Nix. Also, because the underlying engine is Nix, it is language-agnostic, so it is possible to define steps that use Python. These Python steps are not executed with {reticulate}, but instead run in a dedicated Python environment. Data transfer between Python and R is facilitated by {reticulate}, though.

The main inspiration for this package is {targets}; in combination with {rix}, one can also set up the pipeline in a reproducible environment.
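To make the "composition of pure functions" idea concrete, here is a minimal base-R sketch of a pipeline whose steps declare their dependencies and are run in dependency order. It does not use the {rixpress} API or Nix; the `step()` and `run_pipeline()` helpers and the step names are hypothetical and only illustrate the concept.

```r
# Hypothetical helper: a step is a name, the names it depends on,
# and a pure function of the results of those dependencies.
step <- function(name, deps, fn) list(name = name, deps = deps, fn = fn)

# Run steps in dependency order; each step only sees its dependencies.
run_pipeline <- function(steps) {
  names(steps) <- vapply(steps, function(s) s$name, character(1))
  results <- list()
  pending <- names(steps)
  while (length(pending) > 0) {
    # A step is ready once all of its dependencies have results.
    ready <- Filter(
      function(n) all(steps[[n]]$deps %in% names(results)),
      pending
    )
    if (length(ready) == 0) stop("Cyclic or missing dependency in pipeline")
    for (n in ready) {
      results[[n]] <- do.call(steps[[n]]$fn, unname(results[steps[[n]]$deps]))
    }
    pending <- setdiff(pending, ready)
  }
  results
}

pipeline <- list(
  step("raw", character(0), function() mtcars),
  step("manual", "raw", function(d) d[d$am == 1, ]),
  step("mean_mpg", "manual", function(d) mean(d$mpg))
)

out <- run_pipeline(pipeline)
```

Because every step is a pure function of its inputs, intermediate results such as `out$manual` can be inspected on their own, which is the kind of interactive workflow the description above refers to.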

Link to presubmission: https://github.com/ropensci/software-review/issues/699

@maurolepore

  • Explain reasons for any pkgcheck items which your package is unable to pass.

Because this package relies heavily on side effects, unit tests are quite cumbersome to write, so I set up this other repository: https://github.com/b-rodrigues/rixpress_demos, which contains many example pipelines that run on each push to {rixpress}'s repository. Thanks to LLMs, I was able to improve test coverage to 67% (see https://github.com/b-rodrigues/rixpress/actions/runs/14971090564).

Technical checks

Confirm each of the following by checking the box.

This package:

Publication options

  • [x] Do you intend for this package to go on CRAN?

  • [ ] Do you intend for this package to go on Bioconductor?

  • [ ] Do you wish to submit an Applications Article about your package to Methods in Ecology and Evolution? If so:

MEE Options
  • [ ] The package is novel and will be of interest to the broad readership of the journal.
  • [ ] The manuscript describing the package is no longer than 3000 words.
  • [ ] You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see MEE's Policy on Publishing Code)
  • (Scope: Do consider MEE's Aims and Scope for your manuscript. We make no guarantee that your manuscript will be within MEE scope.)
  • (Although not required, we strongly recommend having a full manuscript prepared when you submit here.)
  • (Please do not submit your package separately to Methods in Ecology and Evolution)

Code of conduct

  • [x] I agree to abide by rOpenSci's Code of Conduct during the review process and in maintaining my package should it be accepted.

b-rodrigues avatar May 12 '25 11:05 b-rodrigues

Thanks for submitting to rOpenSci, our editors and @ropensci-review-bot will reply soon. Type @ropensci-review-bot help for help.

ropensci-review-bot avatar May 12 '25 11:05 ropensci-review-bot

:rocket:

Editor check started

:wave:

ropensci-review-bot avatar May 12 '25 11:05 ropensci-review-bot

Checks for rixpress (v0.2.0)

git hash: dbdc68c8

  • :heavy_check_mark: Package name is available
  • :heavy_check_mark: has a 'codemeta.json' file.
  • :heavy_check_mark: has a 'contributing' file.
  • :heavy_multiplication_x: The following functions have no documented return values: [export_nix_archive, import_nix_archive, print.derivation, rxp_init]
  • :heavy_check_mark: uses 'roxygen2'.
  • :heavy_check_mark: 'DESCRIPTION' has a URL field.
  • :heavy_check_mark: 'DESCRIPTION' has a BugReports field.
  • :heavy_check_mark: Package has at least one HTML vignette
  • :heavy_multiplication_x: These functions do not have examples: [export_nix_archive, import_nix_archive, print.derivation, rxp_common_setup, rxp_file_common, rxp_inspect, rxp_list_logs, rxp_make, rxp_py_file, rxp_r_file].
  • :heavy_check_mark: Package has continuous integration checks.
  • :heavy_multiplication_x: Package coverage is 67% (should be at least 75%).
  • :heavy_check_mark: R CMD check found no errors.
  • :heavy_check_mark: R CMD check found no warnings.
  • :eyes: Function names are duplicated in other packages

Important: All failing checks above must be addressed prior to proceeding

(Checks marked with :eyes: may be optionally addressed.)

Package License: GPL (>= 3)


1. Package Dependencies

Details of Package Dependency Usage (click to open)

The table below tallies all function calls to all packages ('ncalls'), both internal (r-base + recommended, along with the package itself), and external (imported and suggested packages). 'NA' values indicate packages to which no identified calls to R functions could be found. Note that these results are generated by an automated code-tagging system which may not be entirely accurate.

| type | package | ncalls |
|---|---|---|
| internal | base | 328 |
| internal | rixpress | 53 |
| internal | stats | 8 |
| internal | graphics | 6 |
| internal | utils | 1 |
| imports | jsonlite | 5 |
| imports | igraph | 4 |
| imports | processx | 3 |
| suggests | ggplot2 | 9 |
| suggests | ggdag | 3 |
| suggests | dplyr | NA |
| suggests | knitr | NA |
| suggests | mockery | NA |
| suggests | reticulate | NA |
| suggests | rix | NA |
| suggests | rmarkdown | NA |
| suggests | testthat | NA |
| suggests | usethis | NA |
| suggests | visNetwork | NA |
| linking_to | NA | NA |

Click below for tallies of functions used in each package. Locations of each call within this package may be generated locally by running 's <- pkgstats::pkgstats(<path/to/repo>)', and examining the 'external_calls' table.

base

list (31), sapply (24), sprintf (20), paste0 (15), file.path (14), c (12), deparse1 (12), substitute (12), grep (11), lapply (10), for (9), gsub (9), list.files (8), paste (8), readLines (8), length (7), file (6), data.frame (5), match (5), regmatches (5), unlist (5), args (4), character (4), grepl (4), basename (3), Filter (3), format (3), gregexpr (3), pretty (3), seq_along (3), strsplit (3), sub (3), subset (3), unique (3), vapply (3), any (2), append (2), if (2), lengths (2), setdiff (2), stdout (2), system2 (2), tryCatch (2), which (2), as.character (1), cat (1), col (1), deparse (1), dirname (1), do.call (1), drop (1), file.info (1), getwd (1), I (1), identity (1), is.list (1), is.null (1), names (1), Negate (1), nrow (1), numeric (1), readline (1), readRDS (1), Reduce (1), regexec (1), rep (1), return (1), round (1), source (1), stop (1), Sys.time (1), system.file (1), vector (1)

rixpress

cb (3), get_need_py (3), get_need_r (3), gen_flat_pipeline (2), gen_pipeline (2), generate_configurePhase (2), load_line (2), parse_nix_envs (2), parse_packages (2), parse_rpkgs_git (2), rxp_inspect (2), rxp_list_logs (2), rxp_read_load_setup (2), unnest_all_columns (2), add_import (1), adjust_import (1), adjust_py_packages (1), confirm (1), dag_for_ci (1), export_nix_archive (1), generate_dag (1), generate_libraries_from_nix (1), generate_libraries_script (1), generate_py_libraries_from_nix (1), generate_r_libraries_from_nix (1), generate_r_or_py_libraries_from_nix (1), get_nodes_edges (1), import_formatter_py (1), import_formatter_r (1), import_nix_archive (1), print.derivation (1), rixpress (1), rxp_common_setup (1), rxp_copy (1), rxp_file_common (1), rxp_ga (1)

ggplot2

aes (7), scale_fill_manual (1), scale_shape_manual (1)

stats

df (5), var (2), line (1)

graphics

lines (6)

jsonlite

write_json (3), fromJSON (1), read_json (1)

igraph

write_graph (2), graph_from_data_frame (1), V (1)

ggdag

geom_dag_node (2), as_tidy_dagitty (1)

processx

run (3)

utils

timestamp (1)


2. Statistical Properties

This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.

Details of statistical properties (click to open)

The package has:

  • code in R (100% in 12 files) and
  • 1 authors
  • 7 vignettes
  • no internal data file
  • 3 imported packages
  • 29 exported functions (median 26 lines of code)
  • 70 non-exported functions in R (median 30 lines of code)

Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages

The following terminology is used:

  • loc = "Lines of Code"
  • fn = "function"
  • exp/not_exp = exported / not exported

All parameters are explained as tooltips in the locally-rendered HTML version of this report generated by the checks_to_markdown() function

The final measure (fn_call_network_size) is the total number of calls between functions (in R), or more abstract relationships between code objects in other languages. Values are flagged as "noteworthy" when they lie in the upper or lower 5th percentile.

| measure | value | percentile | noteworthy |
|---|---|---|---|
| files_R | 12 | 63.4 | |
| files_vignettes | 7 | 98.0 | |
| files_tests | 11 | 88.5 | |
| loc_R | 1926 | 81.4 | |
| loc_vignettes | 1257 | 93.2 | |
| loc_tests | 1119 | 85.4 | |
| num_vignettes | 7 | 98.4 | TRUE |
| n_fns_r | 99 | 74.8 | |
| n_fns_r_exported | 29 | 76.9 | |
| n_fns_r_not_exported | 70 | 74.4 | |
| n_fns_per_file_r | 5 | 68.3 | |
| num_params_per_fn | 3 | 29.3 | |
| loc_per_fn_r | 28 | 74.0 | |
| loc_per_fn_r_exp | 26 | 57.3 | |
| loc_per_fn_r_not_exp | 30 | 78.2 | |
| rel_whitespace_R | 15 | 77.2 | |
| rel_whitespace_vignettes | 25 | 91.4 | |
| rel_whitespace_tests | 16 | 80.6 | |
| doclines_per_fn_exp | 25 | 23.9 | |
| doclines_per_fn_not_exp | 0 | 0.0 | TRUE |
| fn_call_network_size | 37 | 58.8 | |

2a. Network visualisation

Click to see the interactive network visualisation of calls between objects in package


3. goodpractice and other checks

Details of goodpractice checks (click to open)

3a. Continuous Integration Badges

rhub.yaml

GitHub Workflow Results

| id | name | conclusion | sha | run_number | date |
|---|---|---|---|---|---|
| 14971213883 | anthophilic-walkingstick: linux, macos, macos-arm64, windows, ubuntu-next, ubuntu-release, gcc14 | success | dbdc68 | 367 | 2025-05-12 |
| 14968903732 | crabby-dromaeosaur: linux, macos, macos-arm64, windows, ubuntu-next, ubuntu-release, gcc14 | failure | ebdacd | 364 | 2025-05-12 |
| 14971161307 | devtools-tests-via-r-nix | success | dbdc68 | 395 | 2025-05-12 |
| 14971138662 | divinatory-neonredguppy: linux, macos, macos-arm64, windows, ubuntu-next, ubuntu-release, gcc14 | success | 1c903f | 366 | 2025-05-12 |
| 14969011823 | lousy-mice: linux, macos, macos-arm64, windows, ubuntu-next, ubuntu-release, gcc14 | success | 5d36a0 | 365 | 2025-05-12 |
| 14971227259 | pages build and deployment | success | 6f3863 | 347 | 2025-05-12 |
| 14971161309 | pkgdown.yaml | success | dbdc68 | 403 | 2025-05-12 |
| 14971161323 | run-rhub-checks | success | dbdc68 | 370 | 2025-05-12 |
| 14968514429 | skeletonlike-wombat: linux, macos, macos-arm64, windows, ubuntu-next, ubuntu-release, gcc14 | failure | 433cb9 | 363 | 2025-05-12 |
| 14971161311 | Test coverage | success | dbdc68 | 143 | 2025-05-12 |
| 14971161310 | Trigger Demo Actions | success | dbdc68 | 236 | 2025-05-12 |

3b. goodpractice results

R CMD check with rcmdcheck

rcmdcheck found no errors, warnings, or notes

Test coverage with covr

Package coverage: 67.05

The following files are not completely covered by tests:

file coverage
R/generate_dag.R 58.33%
R/plot_dag.R 36.42%
R/rxp_copy.R 27.78%
R/rxp_ga.R 66.67%
R/rxp_make.R 0%
R/rxp_read_load.R 0%

Cyclocomplexity with cyclocomp

The following functions have cyclocomplexity >= 15:

function cyclocomplexity
gen_pipeline 33
generate_dag 25
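Cyclomatic complexity grows with the number of branches in a function, so one common way to lower it is to replace long `if`/`else` chains with a lookup table. The sketch below uses hypothetical functions and is not taken from `gen_pipeline()` or `generate_dag()`; it only illustrates the general refactoring.

```r
# Branch-heavy version: every `if` adds one decision point.
reader_for_branchy <- function(ext) {
  if (ext == "csv") {
    "read.csv"
  } else if (ext == "rds") {
    "readRDS"
  } else if (ext == "tsv") {
    "read.delim"
  } else {
    stop("Unsupported extension: ", ext)
  }
}

# Lookup version: a single branch, however many formats are supported.
readers <- c(csv = "read.csv", rds = "readRDS", tsv = "read.delim")
reader_for <- function(ext) {
  if (!ext %in% names(readers)) stop("Unsupported extension: ", ext)
  unname(readers[ext])
}
```

Both versions return the same reader name for a supported extension and raise an error otherwise; adding a format to the second version means adding an entry to `readers` rather than another branch.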

Static code analyses with lintr

lintr found the following 148 potential issues:

message number of times
Avoid 1:nrow(...) expressions, use seq_len. 1
Avoid changing the working directory, or restore it in on.exit 11
Avoid library() and require() calls in packages 20
Avoid using sapply, consider vapply instead, that's type safe 24
Lines should not be more than 80 characters. This line is 101 characters. 1
Lines should not be more than 80 characters. This line is 102 characters. 1
Lines should not be more than 80 characters. This line is 104 characters. 1
Lines should not be more than 80 characters. This line is 105 characters. 2
Lines should not be more than 80 characters. This line is 106 characters. 1
Lines should not be more than 80 characters. This line is 107 characters. 1
Lines should not be more than 80 characters. This line is 109 characters. 1
Lines should not be more than 80 characters. This line is 113 characters. 4
Lines should not be more than 80 characters. This line is 117 characters. 1
Lines should not be more than 80 characters. This line is 125 characters. 2
Lines should not be more than 80 characters. This line is 138 characters. 2
Lines should not be more than 80 characters. This line is 159 characters. 1
Lines should not be more than 80 characters. This line is 169 characters. 1
Lines should not be more than 80 characters. This line is 171 characters. 2
Lines should not be more than 80 characters. This line is 173 characters. 2
Lines should not be more than 80 characters. This line is 174 characters. 1
Lines should not be more than 80 characters. This line is 193 characters. 1
Lines should not be more than 80 characters. This line is 197 characters. 3
Lines should not be more than 80 characters. This line is 203 characters. 1
Lines should not be more than 80 characters. This line is 205 characters. 1
Lines should not be more than 80 characters. This line is 281 characters. 1
Lines should not be more than 80 characters. This line is 310 characters. 2
Lines should not be more than 80 characters. This line is 357 characters. 1
Lines should not be more than 80 characters. This line is 362 characters. 1
Lines should not be more than 80 characters. This line is 373 characters. 1
Lines should not be more than 80 characters. This line is 380 characters. 1
Lines should not be more than 80 characters. This line is 399 characters. 1
Lines should not be more than 80 characters. This line is 415 characters. 1
Lines should not be more than 80 characters. This line is 426 characters. 1
Lines should not be more than 80 characters. This line is 429 characters. 1
Lines should not be more than 80 characters. This line is 450 characters. 1
Lines should not be more than 80 characters. This line is 482 characters. 1
Lines should not be more than 80 characters. This line is 526 characters. 1
Lines should not be more than 80 characters. This line is 597 characters. 1
Lines should not be more than 80 characters. This line is 81 characters. 4
Lines should not be more than 80 characters. This line is 82 characters. 5
Lines should not be more than 80 characters. This line is 83 characters. 8
Lines should not be more than 80 characters. This line is 84 characters. 1
Lines should not be more than 80 characters. This line is 85 characters. 5
Lines should not be more than 80 characters. This line is 86 characters. 4
Lines should not be more than 80 characters. This line is 87 characters. 1
Lines should not be more than 80 characters. This line is 88 characters. 3
Lines should not be more than 80 characters. This line is 92 characters. 7
Lines should not be more than 80 characters. This line is 93 characters. 2
Lines should not be more than 80 characters. This line is 94 characters. 1
Lines should not be more than 80 characters. This line is 95 characters. 1
Lines should not be more than 80 characters. This line is 96 characters. 2
Lines should not be more than 80 characters. This line is 97 characters. 1
unexpected end of input 1
unexpected symbol 1
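Two of the lintr messages above have short base-R illustrations. The sketch below shows why `seq_len()` is preferred over `1:nrow(...)` and why `vapply()` is considered type safe; the objects are made up for the example.

```r
df <- data.frame(x = numeric(0))

# On an empty data frame, 1:nrow(df) is 1:0, which counts down from 1 to 0,
# so a loop over it would run twice with invalid row indices.
bad_idx <- 1:nrow(df)

# seq_len() yields an empty index instead, so the loop body is skipped.
good_idx <- seq_len(nrow(df))

# sapply() changes its return type with the input: on empty input it
# returns a list rather than an integer vector.
empty_s <- sapply(character(0), nchar)

# vapply() returns the declared type (here integer) or raises an error.
empty_v <- vapply(character(0), nchar, integer(1))
lens <- vapply(c("rix", "rixpress"), nchar, integer(1))
```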

4. Other Checks

Details of other checks (click to open)

:heavy_multiplication_x: The following function name is duplicated in other packages:

    • get_nodes_edges from malan

Package Versions

package version
pkgstats 0.2.0.54
pkgcheck 0.1.2.126

Editor-in-Chief Instructions:

Processing may not proceed until the items marked with :heavy_multiplication_x: have been resolved.

ropensci-review-bot avatar May 12 '25 12:05 ropensci-review-bot

Thanks @b-rodrigues, can you please address the three failing checks:

✖️ The following functions have no documented return values: [export_nix_archive, import_nix_archive, print.derivation, rxp_init]
✖️ These functions do not have examples: [export_nix_archive, import_nix_archive, print.derivation, rxp_common_setup, rxp_file_common, rxp_inspect, rxp_list_logs, rxp_make, rxp_py_file, rxp_r_file].
✖️ Package coverage is 67% (should be at least 75%).

I also note that the function with a duplicated name is get_nodes_edges(), which is likely overly generic. I see you've prepended many functions with rxp_ - perhaps you could also do the same with that function? I also see you don't currently use our pkgcheck action. That might help to ensure everything is okay, or if you'd rather not, you can check locally, and then once you confirm all is ✔ , feel free to call @ropensci-review-bot check package. Thanks!
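For the documentation items above, return values and examples are typically added with the {roxygen2} `@return` and `@examples` tags. A minimal sketch, using a hypothetical helper `count_steps()` that is not part of {rixpress}:

```r
#' Count the steps in a pipeline
#'
#' @param steps A list of pipeline steps.
#' @return An integer of length one: the number of steps in `steps`.
#' @examples
#' count_steps(list("read", "clean", "model"))
#' @export
count_steps <- function(steps) {
  stopifnot(is.list(steps))
  length(steps)
}
```

For functions called mainly for their side effects, the `@return` section can simply state that, for example that the function returns `NULL` invisibly.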

mpadge avatar May 12 '25 12:05 mpadge

Ok, so I've implemented the changes, except for the unit test coverage. As explained, the package relies a lot on side effects, so increasing coverage to 75% will be quite difficult, especially because the untested functions are those that would require build artifacts in the Nix store. Mocking that would be a pain in the bottom. As a compromise, I set up this repo: https://github.com/b-rodrigues/rixpress_demos with complete pipelines that test these functions.

Would this be ok?
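To illustrate the general difficulty with side effects, here is a base-R sketch of a common pattern: the pure part (building the text) is separated from the side effect (writing it), and tests point the side effect at a temporary directory. The names `build_expr()` and `write_expr()` are hypothetical and do not correspond to {rixpress} internals, and the pattern does not remove the need for real build artifacts in the Nix store.

```r
# Pure part: returns a string, so it is easy to unit test.
build_expr <- function(name, command) {
  sprintf("%s: %s", name, command)
}

# Side effect: writes the string to a file inside `dir`.
write_expr <- function(expr, dir, file = "pipeline.txt") {
  path <- file.path(dir, file)
  writeLines(expr, path)
  invisible(path)
}

# In a test, the side effect targets a throwaway directory.
tmp <- tempfile("pipeline-test-")
dir.create(tmp)
path <- write_expr(build_expr("step1", "echo hello"), tmp)
written <- readLines(path)
unlink(tmp, recursive = TRUE)
```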

> I also note that the function with a duplicated name is get_nodes_edges(), which is likely overly generic.

This function was being exported by mistake. I no longer export it, so the clash shouldn't cause any issues.

b-rodrigues avatar May 12 '25 15:05 b-rodrigues

@ropensci-review-bot check package

b-rodrigues avatar May 12 '25 15:05 b-rodrigues

Thanks, about to send the query.

ropensci-review-bot avatar May 12 '25 15:05 ropensci-review-bot

:rocket:

Editor check started

:wave:

ropensci-review-bot avatar May 12 '25 15:05 ropensci-review-bot

Checks for rixpress (v0.2.0)

git hash: 8e396034

  • :heavy_check_mark: Package name is available
  • :heavy_check_mark: has a 'codemeta.json' file.
  • :heavy_check_mark: has a 'contributing' file.
  • :heavy_check_mark: uses 'roxygen2'.
  • :heavy_check_mark: 'DESCRIPTION' has a URL field.
  • :heavy_check_mark: 'DESCRIPTION' has a BugReports field.
  • :heavy_check_mark: Package has at least one HTML vignette
  • :heavy_check_mark: All functions have examples.
  • :heavy_check_mark: Package has continuous integration checks.
  • :heavy_multiplication_x: Package coverage is 67.5% (should be at least 75%).
  • :heavy_check_mark: R CMD check found no errors.
  • :heavy_check_mark: R CMD check found no warnings.

Important: All failing checks above must be addressed prior to proceeding

Package License: GPL (>= 3)


1. Package Dependencies

Details of Package Dependency Usage (click to open)

The table below tallies all function calls to all packages ('ncalls'), both internal (r-base + recommended, along with the package itself), and external (imported and suggested packages). 'NA' values indicate packages to which no identified calls to R functions could be found. Note that these results are generated by an automated code-tagging system which may not be entirely accurate.

| type | package | ncalls |
|---|---|---|
| internal | base | 331 |
| internal | rixpress | 52 |
| internal | stats | 8 |
| internal | graphics | 6 |
| internal | utils | 1 |
| imports | jsonlite | 5 |
| imports | igraph | 4 |
| imports | processx | 3 |
| suggests | ggplot2 | 9 |
| suggests | ggdag | 3 |
| suggests | dplyr | NA |
| suggests | knitr | NA |
| suggests | mockery | NA |
| suggests | reticulate | NA |
| suggests | rix | NA |
| suggests | rmarkdown | NA |
| suggests | testthat | NA |
| suggests | usethis | NA |
| suggests | visNetwork | NA |
| linking_to | NA | NA |

Click below for tallies of functions used in each package. Locations of each call within this package may be generated locally by running 's <- pkgstats::pkgstats(<path/to/repo>)', and examining the 'external_calls' table.

base

list (31), sprintf (20), paste0 (15), file.path (14), vapply (13), c (12), deparse1 (12), substitute (12), grep (11), sapply (11), lapply (10), character (9), for (9), gsub (9), list.files (8), paste (8), readLines (8), length (7), file (6), data.frame (5), match (5), regmatches (5), unlist (5), args (4), grepl (4), basename (3), Filter (3), format (3), gregexpr (3), pretty (3), seq_along (3), strsplit (3), sub (3), subset (3), unique (3), any (2), append (2), if (2), lengths (2), setdiff (2), stdout (2), system2 (2), tryCatch (2), which (2), as.character (1), cat (1), col (1), deparse (1), dirname (1), do.call (1), drop (1), file.info (1), getwd (1), I (1), identity (1), is.list (1), is.null (1), logical (1), names (1), Negate (1), nrow (1), numeric (1), readline (1), readRDS (1), Reduce (1), regexec (1), rep (1), return (1), round (1), source (1), stop (1), Sys.time (1), system.file (1), vector (1)

rixpress

cb (3), get_need_py (3), get_need_r (3), gen_flat_pipeline (2), gen_pipeline (2), generate_configurePhase (2), parse_nix_envs (2), parse_packages (2), parse_rpkgs_git (2), rxp_inspect (2), rxp_list_logs (2), rxp_read_load_setup (2), unnest_all_columns (2), add_import (1), adjust_import (1), adjust_py_packages (1), confirm (1), dag_for_ci (1), export_nix_archive (1), generate_dag (1), generate_libraries_from_nix (1), generate_libraries_script (1), generate_py_libraries_from_nix (1), generate_r_libraries_from_nix (1), generate_r_or_py_libraries_from_nix (1), get_nodes_edges (1), import_formatter_py (1), import_formatter_r (1), import_nix_archive (1), load_line (1), print.derivation (1), rixpress (1), rxp_common_setup (1), rxp_copy (1), rxp_file_common (1), rxp_ga (1)

ggplot2

aes (7), scale_fill_manual (1), scale_shape_manual (1)

stats

df (5), var (2), line (1)

graphics

lines (6)

jsonlite

write_json (3), fromJSON (1), read_json (1)

igraph

write_graph (2), graph_from_data_frame (1), V (1)

ggdag

geom_dag_node (2), as_tidy_dagitty (1)

processx

run (3)

utils

timestamp (1)


2. Statistical Properties

This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.

Details of statistical properties (click to open)

The package has:

  • code in R (100% in 12 files) and
  • 1 authors
  • 7 vignettes
  • no internal data file
  • 3 imported packages
  • 29 exported functions (median 26 lines of code)
  • 70 non-exported functions in R (median 30 lines of code)

Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages

The following terminology is used:

  • loc = "Lines of Code"
  • fn = "function"
  • exp/not_exp = exported / not exported

All parameters are explained as tooltips in the locally-rendered HTML version of this report generated by the checks_to_markdown() function

The final measure (fn_call_network_size) is the total number of calls between functions (in R), or more abstract relationships between code objects in other languages. Values are flagged as "noteworthy" when they lie in the upper or lower 5th percentile.

| measure | value | percentile | noteworthy |
|---|---|---|---|
| files_R | 12 | 63.4 | |
| files_vignettes | 7 | 98.0 | |
| files_tests | 11 | 88.5 | |
| loc_R | 1948 | 81.6 | |
| loc_vignettes | 1257 | 93.2 | |
| loc_tests | 1119 | 85.4 | |
| num_vignettes | 7 | 98.4 | TRUE |
| n_fns_r | 99 | 74.8 | |
| n_fns_r_exported | 29 | 76.9 | |
| n_fns_r_not_exported | 70 | 74.4 | |
| n_fns_per_file_r | 5 | 68.3 | |
| num_params_per_fn | 3 | 29.3 | |
| loc_per_fn_r | 28 | 74.0 | |
| loc_per_fn_r_exp | 26 | 57.3 | |
| loc_per_fn_r_not_exp | 30 | 78.2 | |
| rel_whitespace_R | 15 | 77.2 | |
| rel_whitespace_vignettes | 25 | 91.4 | |
| rel_whitespace_tests | 16 | 80.6 | |
| doclines_per_fn_exp | 29 | 31.4 | |
| doclines_per_fn_not_exp | 0 | 0.0 | TRUE |
| fn_call_network_size | 37 | 58.8 | |

2a. Network visualisation

Click to see the interactive network visualisation of calls between objects in package


3. goodpractice and other checks

Details of goodpractice checks (click to open)

3a. Continuous Integration Badges

rhub.yaml

GitHub Workflow Results

| id | name | conclusion | sha | run_number | date |
|---|---|---|---|---|---|
| 14976041988 | acidophilic-americancreamdraft: linux, macos, macos-arm64, windows, ubuntu-next, ubuntu-release, gcc14 | success | 0c0291 | 375 | 2025-05-12 |
| 14976311675 | devtools-tests-via-r-nix | success | 8e3960 | 406 | 2025-05-12 |
| 14976376869 | pages build and deployment | success | ced3fb | 358 | 2025-05-12 |
| 14976311672 | pkgcheck | NA | 8e3960 | 8 | 2025-05-12 |
| 14976311670 | pkgdown.yaml | success | 8e3960 | 414 | 2025-05-12 |
| 14976311683 | run-rhub-checks | success | 8e3960 | 381 | 2025-05-12 |
| 14976363240 | serpentine-xoloitzcuintli: linux, macos, macos-arm64, windows, ubuntu-next, ubuntu-release, gcc14 | NA | 8e3960 | 378 | 2025-05-12 |
| 14976311680 | Test coverage | success | 8e3960 | 154 | 2025-05-12 |
| 14976108488 | timeconsuming-limpkin: linux, macos, macos-arm64, windows, ubuntu-next, ubuntu-release, gcc14 | success | 932bbf | 376 | 2025-05-12 |
| 14976197952 | transcendentalistic-lowchen: linux, macos, macos-arm64, windows, ubuntu-next, ubuntu-release, gcc14 | success | 932bbf | 377 | 2025-05-12 |
| 14976311678 | Trigger Demo Actions | success | 8e3960 | 247 | 2025-05-12 |

3b. goodpractice results

R CMD check with rcmdcheck

rcmdcheck found no errors, warnings, or notes

Test coverage with covr

Package coverage: 67.5

The following files are not completely covered by tests:

file coverage
R/generate_dag.R 58.33%
R/plot_dag.R 36.42%
R/rxp_copy.R 27.78%
R/rxp_ga.R 66.67%
R/rxp_make.R 0%
R/rxp_read_load.R 0%

Cyclocomplexity with cyclocomp

The following functions have cyclocomplexity >= 15:

function cyclocomplexity
gen_pipeline 33
generate_dag 25

Static code analyses with lintr

lintr found the following 134 potential issues:

message number of times
Avoid 1:nrow(...) expressions, use seq_len. 1
Avoid changing the working directory, or restore it in on.exit 11
Avoid library() and require() calls in packages 20
Avoid using sapply, consider vapply instead, that's type safe 10
Lines should not be more than 80 characters. This line is 101 characters. 1
Lines should not be more than 80 characters. This line is 102 characters. 1
Lines should not be more than 80 characters. This line is 104 characters. 1
Lines should not be more than 80 characters. This line is 105 characters. 2
Lines should not be more than 80 characters. This line is 106 characters. 1
Lines should not be more than 80 characters. This line is 107 characters. 1
Lines should not be more than 80 characters. This line is 109 characters. 1
Lines should not be more than 80 characters. This line is 113 characters. 4
Lines should not be more than 80 characters. This line is 117 characters. 1
Lines should not be more than 80 characters. This line is 125 characters. 2
Lines should not be more than 80 characters. This line is 138 characters. 2
Lines should not be more than 80 characters. This line is 159 characters. 1
Lines should not be more than 80 characters. This line is 169 characters. 1
Lines should not be more than 80 characters. This line is 171 characters. 2
Lines should not be more than 80 characters. This line is 173 characters. 2
Lines should not be more than 80 characters. This line is 174 characters. 1
Lines should not be more than 80 characters. This line is 193 characters. 1
Lines should not be more than 80 characters. This line is 197 characters. 3
Lines should not be more than 80 characters. This line is 203 characters. 1
Lines should not be more than 80 characters. This line is 205 characters. 1
Lines should not be more than 80 characters. This line is 281 characters. 1
Lines should not be more than 80 characters. This line is 310 characters. 2
Lines should not be more than 80 characters. This line is 357 characters. 1
Lines should not be more than 80 characters. This line is 362 characters. 1
Lines should not be more than 80 characters. This line is 373 characters. 1
Lines should not be more than 80 characters. This line is 380 characters. 1
Lines should not be more than 80 characters. This line is 399 characters. 1
Lines should not be more than 80 characters. This line is 415 characters. 1
Lines should not be more than 80 characters. This line is 426 characters. 1
Lines should not be more than 80 characters. This line is 429 characters. 1
Lines should not be more than 80 characters. This line is 450 characters. 1
Lines should not be more than 80 characters. This line is 482 characters. 1
Lines should not be more than 80 characters. This line is 526 characters. 1
Lines should not be more than 80 characters. This line is 597 characters. 1
Lines should not be more than 80 characters. This line is 81 characters. 4
Lines should not be more than 80 characters. This line is 82 characters. 5
Lines should not be more than 80 characters. This line is 83 characters. 8
Lines should not be more than 80 characters. This line is 84 characters. 1
Lines should not be more than 80 characters. This line is 85 characters. 3
Lines should not be more than 80 characters. This line is 86 characters. 4
Lines should not be more than 80 characters. This line is 87 characters. 3
Lines should not be more than 80 characters. This line is 88 characters. 3
Lines should not be more than 80 characters. This line is 92 characters. 7
Lines should not be more than 80 characters. This line is 93 characters. 2
Lines should not be more than 80 characters. This line is 94 characters. 1
Lines should not be more than 80 characters. This line is 95 characters. 1
Lines should not be more than 80 characters. This line is 96 characters. 2
Lines should not be more than 80 characters. This line is 97 characters. 1
unexpected end of input 1
unexpected symbol 1
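The working-directory message can be illustrated with a small base-R sketch: `setwd()` returns the previous directory, and `on.exit()` restores it when the function exits, including on error. The helper `list_in_dir()` is hypothetical.

```r
# Hypothetical helper: list files in `dir` after temporarily switching to it.
list_in_dir <- function(dir) {
  old_wd <- setwd(dir)               # setwd() returns the previous directory
  on.exit(setwd(old_wd), add = TRUE) # restored even if an error occurs below
  list.files()
}

tmp <- tempfile("wd-demo-")
dir.create(tmp)
file.create(file.path(tmp, "a.txt"))

before <- getwd()
files <- list_in_dir(tmp)
after <- getwd()
unlink(tmp, recursive = TRUE)
```

After the call, the caller's working directory is unchanged, which is what the lintr message asks for.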

Package Versions

package version
pkgstats 0.2.0.54
pkgcheck 0.1.2.126

Editor-in-Chief Instructions:

Processing may not proceed until the items marked with :heavy_multiplication_x: have been resolved.

ropensci-review-bot avatar May 12 '25 15:05 ropensci-review-bot

Preliminary Editor checks:

  • [ ] Documentation: The package has sufficient documentation available online (README, pkgdown docs) to allow for an assessment of functionality and scope without installing the package. In particular,
    • [x] Is the case for the package well made?
    • [ ] Is the reference index page clear (grouped by topic if necessary)?
    • [x] Are vignettes readable, sufficiently detailed and not just perfunctory?
  • [x] Fit: The package meets criteria for fit and overlap.
  • [x] Installation instructions: Are installation instructions clear enough for human users?
  • [x] Tests: If the package has some interactivity / HTTP / plot production etc. are the tests using state-of-the-art tooling?
  • [ ] Contributing information: Is the documentation for contribution clear enough e.g. tokens for tests, playgrounds?
  • [x] License: The package has a CRAN or OSI accepted license.
  • [x] Project management: Are the issue and PR trackers in a good shape, e.g. are there outstanding bugs, is it clear when feature requests are meant to be tackled?

Editor comments

Thanks for your submission @b-rodrigues, which looks like a very useful extension of {rix}. I expect we'll proceed soon, but note first a couple of very minor issues from the checks above:

  • The package reference page has all functions together. Could you please structure the reference index by adding {roxygen2} @family tags, as described in this section of our Dev Guide?
  • Your extended checks repository in https://github.com/b-rodrigues/rixpress_demos is a great solution to testing, and definitely satisfactory for us. In order to satisfy the second missing item in the checklist above, could you please:
    • Add a bit more detail to your current CONTRIBUTING.md, especially including description of how {rixpress-demo} is used in tests; and
    • Explicitly reference CONTRIBUTING.md somewhere in your readme, with brief instructions on how to contribute.
    • Not necessary now, but good to keep in mind: Issue templates would provide a great way to ensure all who wanted to contribute were aware of {rixpress-demo}, and understood the relationship between the two repos.

Let us know when those points have been addressed, and we'll proceed from there. Thanks :+1:

mpadge avatar May 14 '25 08:05 mpadge

hi @mpadge thanks for your feedback! I've addressed your suggestions.

b-rodrigues avatar May 14 '25 18:05 b-rodrigues

@b-rodrigues Sorry for slight delay here, we're still trying to find and assign an editor to handle this. Should be assigned soon.

mpadge avatar May 20 '25 08:05 mpadge

no worries :)

b-rodrigues avatar May 20 '25 09:05 b-rodrigues

@ropensci-review-bot assign @ldecicco-USGS as editor

ldecicco-USGS avatar May 27 '25 13:05 ldecicco-USGS

Assigned! @ldecicco-USGS is now the editor

ropensci-review-bot avatar May 27 '25 13:05 ropensci-review-bot

Editor checks:

  • [x] Documentation: The package has sufficient documentation available online (README, pkgdown docs) to allow for an assessment of functionality and scope without installing the package. In particular,
    • [x] Is the case for the package well made?
    • [x] Is the reference index page clear (grouped by topic if necessary)?
    • [x] Are vignettes readable, sufficiently detailed and not just perfunctory?
  • [x] Fit: The package meets criteria for fit and overlap.
  • [x] Installation instructions: Are installation instructions clear enough for human users?
  • [x] Tests: If the package has some interactivity / HTTP / plot production etc. are the tests using state-of-the-art tooling?
  • [x] Contributing information: Is the documentation for contribution clear enough e.g. tokens for tests, playgrounds?
  • [x] License: The package has a CRAN or OSI accepted license.
  • [x] Project management: Are the issue and PR trackers in a good shape, e.g. are there outstanding bugs, is it clear when feature requests are meant to be tackled?

Editor comments

Looks great as usual.

ldecicco-USGS avatar Jun 10 '25 21:06 ldecicco-USGS

@ropensci-review-bot seeking reviewers

ldecicco-USGS avatar Jun 10 '25 21:06 ldecicco-USGS

Please add this badge to the README of your package repository:

[![Status at rOpenSci Software Peer Review](https://badges.ropensci.org/706_status.svg)](https://github.com/ropensci/software-review/issues/706)

Furthermore, if your package does not have a NEWS.md file yet, please create one to capture the changes made during the review process. See https://devguide.ropensci.org/releasing.html#news

ropensci-review-bot avatar Jun 10 '25 21:06 ropensci-review-bot

@ropensci-review-bot assign @wlandau as reviewer

ldecicco-USGS avatar Jul 07 '25 12:07 ldecicco-USGS

@wlandau added to the reviewers list. Review due date is 2025-07-28. Thanks @wlandau for accepting to review! Please refer to our reviewer guide.

rOpenSci’s community is our best asset. We aim for reviews to be open, non-adversarial, and focused on improving software quality. Be respectful and kind! See our reviewers guide and code of conduct for more.

ropensci-review-bot avatar Jul 07 '25 12:07 ropensci-review-bot

@wlandau: If you haven't done so, please fill this form for us to update our reviewers records.

ropensci-review-bot avatar Jul 07 '25 12:07 ropensci-review-bot

@ropensci-review-bot assign @amart90 as reviewer

ldecicco-USGS avatar Jul 07 '25 14:07 ldecicco-USGS

@amart90 added to the reviewers list. Review due date is 2025-07-28. Thanks @amart90 for accepting to review! Please refer to our reviewer guide.

rOpenSci’s community is our best asset. We aim for reviews to be open, non-adversarial, and focused on improving software quality. Be respectful and kind! See our reviewers guide and code of conduct for more.

ropensci-review-bot avatar Jul 07 '25 14:07 ropensci-review-bot

@amart90: If you haven't done so, please fill this form for us to update our reviewers records.

ropensci-review-bot avatar Jul 07 '25 14:07 ropensci-review-bot

Package Review

  • Briefly describe any working relationship you have (had) with the package authors.

Bruno and I follow each other's work as members of the R community. We have not yet worked together directly on a project.

  • [x] As the reviewer I confirm that there are no conflicts of interest for me to review this work (if you are unsure whether you are in conflict, please speak to your editor before starting your review).

As the author of targets, I took a careful look at the coi guidelines:

The potential editor or reviewer has a conflict of interest if:...The potential reviewer/editor has significantly contributed to a competitor project.

There is obvious overlap, but I would not say rixpress is a competitor. rixpress has a niche outside the scope of targets:

I checked with @ldecicco-USGS, who agreed.

Documentation

The package includes all the following forms of documentation:

  • [x] A statement of need: clearly stating problems the software is designed to solve and its target audience in README
  • [x] Installation instructions: for the development version of package and any non-standard dependencies in README
  • [x] Vignette(s): demonstrating major functionality that runs successfully locally
  • [x] Function Documentation: for all exported functions
  • [x] Examples: (that run successfully locally) for all exported functions
  • [x] Community guidelines: including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).

Functionality

  • [x] Installation: Installation succeeds as documented.
  • [x] Functionality: Any functional claims of the software have been confirmed.
  • [x] Performance: Any performance claims of the software have been confirmed.
  • [x] Automated tests: Unit tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.
  • [x] Packaging guidelines: The package conforms to the rOpenSci packaging guidelines.

Estimated hours spent reviewing: 4

  • [x] Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer ("rev" role) in the package DESCRIPTION file.

Review Comments

Overview

rixpress is an excellent prospective addition to rOpenSci. It fills a valuable niche in reproducible computation, the engineering is fantastic, and the documentation is comprehensive. Because the quality is already so high, I did not need to spend much time checking package development minutia. I spent most of my review time on high-level issues and my own experience as a new user.

Scope of this review

I reviewed:

  • The documentation at https://b-rodrigues.github.io/rixpress/.
  • Examples basic_r, r_multi_envs, and r_py_xgboost from https://github.com/b-rodrigues/rixpress_demos.
  • The rixpress source code and test suite.

Scope of rixpress

Arguably the most essential but most difficult part of developing any tool is establishing a clear and crisp set of requirements. Explicit pre-specified boundaries help prevent scope creep and ensure a package's priorities succeed long-term. For pipeline tools, scope is even more essential and even more challenging than usual, both because of the many different opinions about what a pipeline tool should do, and because of the huge variety of pipelines users routinely create.

I developed targets as a highly opinionated tool with R-focused research-oriented scenarios in mind. This vision was somewhat implicit, and I did not have enough experience then to completely spell it out. I regularly hear from people who use it for cases I did not consider: simple ETL operations on big data, database query workflows, daily pipelines where historical runs matter, etc. Some users even approach targets as an Airflow-like tool rather than a Make-like one, and they are looking for a feature set closer to what maestro provides.

You might have the same experience with rixpress. For example, users who switch from targets to rixpress may ask you to support branching, alternative DSLs, interactive debugging, fancy progress monitoring, alternative storage options, computing on clusters, alternative DAG visualizations, etc.

For rixpress, I would like to understand what the package may cover in the future, and what it definitely will not support. I think a dedicated section on scope in the documentation (possibly linked from the issue templates) will help set expectations for users who request features, and it will help you maintain rixpress for years to come.

At this early stage, the main areas of focus seem to be as follows (please correct me if I am wrong):

  1. Bringing pipeline functionality from nix-store to R.
  2. Interactive read-only inspection: visualization, reading from the data store, and historical runs.
  3. R/Python/Julia interoperability.
  4. Portability: through Nix itself, and continuous integration.

(1) seems like a promising direction because it invests in the qualities that make rixpress most unique.

An aside: if you intend to expand on (1), e.g. alternative store types, you might also consider writing a low-level Nix client to facilitate the implementation, kind of like gert for Git, cmdstanr for CmdStan, or paws for AWS. This might even help you maintain rix.

Visualization

DAG visualizations greatly improve the user experience, but they are also a Pandora's Box of scope creep. rixpress already supports 3 backends for graphs: visNetwork, ggdag, and GraphViz (DOT; for CI). And each one is a magnet for feature requests.

To simplify the visualization feature set, what about using mermaid.js instead of ggdag or GraphViz? Mermaid graphs are just text, and they are very easy to generate without any additional R packages. For CI, you could use https://github.com/AlexanderGrooff/mermaid-ascii, which I think would produce graphs that are more readable and visually appealing than GraphViz can render (e.g. https://github.com/b-rodrigues/rixpress_demos/actions/runs/16252270236/job/45883546684#step:9:11).

visNetwork might only be necessary if you expect enormous pipelines whose graphs can only be explored interactively. If you do decide to keep rixpress::rxp_visnetwork(), I suggest keeping the feature set simple and tightly scoped. (Maybe it would also be a good idea to disable physics to improve rendering performance for large graphs.) visNetwork is great at zooming in and out of graphs of pretty much any size, but from experience developing targets::tar_visnetwork(), I have found it does not excel at creating nice-looking polished graphs. I think mermaid.js is much better at feature-rich pretty graphs.
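
To illustrate the point that Mermaid graphs are just text, here is a minimal sketch in base R. It assumes the pipeline's dependencies are available as a data frame with `from` and `to` columns; the helper name `edges_to_mermaid()` and the example derivation names are hypothetical and not part of rixpress.

```r
# Hypothetical helper: turn an edge list into Mermaid flowchart text.
# `edges` is assumed to be a data frame with character columns `from` and `to`.
edges_to_mermaid <- function(edges, direction = "LR") {
  stopifnot(is.data.frame(edges), all(c("from", "to") %in% names(edges)))
  # Mermaid node ids should avoid spaces and punctuation, so sanitise them.
  clean <- function(x) gsub("[^A-Za-z0-9_]", "_", x)
  lines <- sprintf("  %s --> %s", clean(edges$from), clean(edges$to))
  paste(c(paste("graph", direction), lines), collapse = "\n")
}

# Illustrative edge list; the derivation names are made up.
edges <- data.frame(
  from = c("mtcars", "mtcars_am", "mtcars_head"),
  to = c("mtcars_am", "mtcars_head", "page"),
  stringsAsFactors = FALSE
)
cat(edges_to_mermaid(edges), "\n")
```

The resulting string can be pasted into any Markdown renderer that understands Mermaid, or piped to a terminal renderer in CI.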

Workflow functions

From the examples, I see two patterns for setting up and running pipelines. In simple cases:

list(
  rxp_r(...)
) |>
  rixpress()

but for Python projects such as r_py_xgboost:

list(
  rxp_py(...)
) |>
  rixpress(build = FALSE)
  
adjust_import(...)

rxp_make()

I think it would be clearer and more consistent to create a separate function (maybe rxp_populate()) which runs the equivalent of rixpress(build = FALSE). Then, if rixpress() itself is still needed, it could serve as the equivalent of rxp_populate() + rxp_make().

add_import() and adjust_import() feel a bit awkward as separate steps. You might instead consider an interface like:

list(
  rxp_py(...)
) |>
rxp_populate(
  derivations,
  py_imports = c(
    numpy = "from numpy import array, loadtxt",
    xgboost = "from xgboost import XGBClassifier"
  )
)

Names for functions and classes

I have minor suggestions to make the names of functions more internally consistent:

  • export_nix_archive() => rxp_export_nix_archive()
  • import_nix_archive() => rxp_import_nix_archive()
  • generate_dag() => rxp_generate_dag() or rxp_write_dag() or rxp_save_dag()

(You may not need dag_for_ci(), add_import(), or adjust_import() if you agree with my suggestions from earlier.)

In addition, functions like rxp_r() produce an object of class "derivation". I suggest renaming it to something like "rxp_derivation" so it does not conflict with e.g. mathematical packages with their own kinds of "derivations".
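
A small sketch of what a prefixed S3 class could look like. The constructor name and the fields are hypothetical and do not reflect how rixpress stores derivations internally.

```r
# Hypothetical constructor; the fields are illustrative, not rixpress internals.
new_rxp_derivation <- function(name, snippet) {
  structure(
    list(name = name, snippet = snippet),
    class = "rxp_derivation"
  )
}

# Methods dispatch on the prefixed class only, so an unrelated
# "derivation" class from another package cannot collide with it.
print.rxp_derivation <- function(x, ...) {
  cat("<rxp_derivation> ", x$name, "\n", sep = "")
  invisible(x)
}

d <- new_rxp_derivation("mtcars_am", "filter(mtcars, am == 1)")
print(d)
inherits(d, "derivation")
```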

Installation experience

I am new to Nix, rix, and rixpress, and I began by installing the toolchain from scratch on an M2 Macbook Pro with OS 15.5. This is my work computer, so it has more security restrictions than a regular personal computer.

I followed the rix setup guide for macOS, which was clear and comprehensive. The curl command successfully downloaded the Determinate Systems installer, but the installer itself failed. First I realized I needed to run it with sudo, but even that failed. Nix installed successfully when I navigated a browser to https://docs.determinate.systems/, manually downloaded the installer, and double-clicked it to run it. Maybe consider updating the vignette to mention that the point-and-click route is possible?

Afterwards, I installed cachix and ran cachix use rstats-on-nix. library(rix) initially showed these warning messages, but rix::setup_cachix() silenced them. The next library(rix) gave me a warning about an incomplete final line in ~/.config/nix/nix.conf, which I solved by manually opening the text file and adding a line break. I suggest ensuring rix::setup_cachix() leaves a terminating newline character in ~/.config/nix/nix.conf.
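
A minimal sketch of the terminating-newline check in base R. The helper `ensure_trailing_newline()` is hypothetical and is not how rix::setup_cachix() is implemented; the example writes to a temporary file rather than the real ~/.config/nix/nix.conf.

```r
# Hypothetical helper: append a newline only if the file does not end with one.
ensure_trailing_newline <- function(path) {
  # Config files are small, so reading all bytes is acceptable here.
  bytes <- readBin(path, what = "raw", n = file.size(path))
  needs_newline <- length(bytes) > 0 &&
    bytes[length(bytes)] != charToRaw("\n")
  if (needs_newline) cat("\n", file = path, append = TRUE)
  invisible(needs_newline)
}

# Demonstrate on a temporary file that lacks a final line break.
conf <- tempfile(fileext = ".conf")
writeBin(charToRaw("experimental-features = nix-command flakes"), conf)
ensure_trailing_newline(conf)
readLines(conf)  # no "incomplete final line" warning after the fix
```

Calling the helper a second time is a no-op, so it is safe to run after every write to the file.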

Storage

I really like the build logs feature you describe in https://b-rodrigues.github.io/rixpress/articles/g-logs.html. Over multiple pipeline runs, however, storage may accumulate, especially because Nix uses content-addressable storage (by hash). It may help to describe in that vignette how users can leverage the garbage collection features of Nix to clear out the data that is no longer needed.
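
As a rough sketch of what such guidance could wrap, the following base R code assembles a call to `nix-store --gc`. The function names `nix_gc_args()` and `nix_gc()` are hypothetical and not part of rixpress. The `--print-dead` sub-operation only lists store paths that are no longer reachable, so a dry run deletes nothing; paths still referenced by a garbage collection root are kept either way.

```r
# Hypothetical helper: build the arguments for `nix-store --gc`.
nix_gc_args <- function(dry_run = TRUE) {
  # `--print-dead` lists unreachable store paths without deleting them.
  c("--gc", if (dry_run) "--print-dead")
}

# Hypothetical runner: calls nix-store only if it is on the PATH.
nix_gc <- function(dry_run = TRUE) {
  if (!nzchar(Sys.which("nix-store"))) {
    stop("nix-store was not found on the PATH.")
  }
  system2("nix-store", nix_gc_args(dry_run), stdout = TRUE)
}

nix_gc_args(dry_run = TRUE)
nix_gc_args(dry_run = FALSE)
# dead_paths <- nix_gc(dry_run = TRUE)  # requires a working Nix installation
```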

Multi-line expressions

The following pipeline succeeds:

library(rixpress)
list(
  rxp_r(
    name = derivation,
    expr = 1 + 1
  )
) |>
  rixpress()

But a similar one fails:

library(rixpress)
list(
  rxp_r(
    name = derivation,
    expr = {
      message("Running derivation")
      1 + 1
    }
  )
) |>
  rixpress()

with the error message:

Error: unexpected numeric constant in "  derivation <- {     message('Running derivation')     1

The same happens if I add a semicolon after the message() statement. I expect this means rxp_r() etc. do not support multi-line expressions. I would suggest either adding this support or requiring expressions to be pure function calls.
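
One plausible cause, judging only from the run of spaces in the error message, is that the deparsed expression is collapsed onto a single line. The sketch below reproduces that failure mode in base R and shows that collapsing with newlines parses fine. It is an assumption about the mechanism, not a reading of the rixpress source.

```r
expr <- quote({
  message("Running derivation")
  1 + 1
})

# deparse() returns one string per line of the braced expression.
code_lines <- deparse(expr)

# Collapsing with spaces puts both statements on one line, which R cannot parse.
flat <- paste(code_lines, collapse = " ")
flat_ok <- !inherits(try(parse(text = flat), silent = TRUE), "try-error")

# Collapsing with newlines keeps one statement per line.
multi <- paste(code_lines, collapse = "\n")
result <- eval(parse(text = multi))

flat_ok
result
```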

Testing

I recommend including skip_if_not_installed() statements in tests where Suggests: packages are used (such as mockery and reticulate). In addition, when I ran the tests locally, one test threw a warning:

Warning (test-generate_libraries_from_nix.R:42:3): generate_py_libraries_from_nix: generate Py script by parsing default.nix
Python packages have been requested, but 'reticulate' is not in your list of R packages. If you want to handle Python objects from your R session, consider adding 'reticulate' to the list of R packages.
Backtrace
    ▆
 1. └─rix::rix(...) at test-generate_libraries_from_nix.R:42:3

Adding r_pkgs = reticulate to rix::rix() in https://github.com/b-rodrigues/rixpress/blob/ea052f4a024bb47705bf186541380d5febac279e/tests/testthat/test-generate_libraries_from_nix.R#L42 should remove it.

Test coverage from covr is lower than I normally see in packages, but I really like your approach to offload to https://github.com/b-rodrigues/rixpress_demos. If the number of projects in that repo grows unmanageable at some point, you might consider creating a new GitHub org for them like https://github.com/nf-core does for Nextflow.

Checks

When I ran devtools::check() locally, I saw the note:

✔  checking for non-standard things in the check directory
N  checking for detritus in the temp directory
   Found the following files/directories:
     ‘RtmptcpOyn_repo_hash_url_jnlhe’

I have been flagged for this before when trying to submit packages to CRAN.
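
A common way to avoid this kind of note is to delete scratch files as soon as the code that created them exits. The sketch below uses only base R; `with_scratch_dir()` is a hypothetical helper, and I have not traced which call in the package or its dependencies leaves the directory behind.

```r
# Hypothetical helper: run `code` with a scratch directory that is always removed.
with_scratch_dir <- function(code) {
  scratch <- tempfile(pattern = "scratch_")
  dir.create(scratch)
  # Remove the directory even if `code` throws an error.
  on.exit(unlink(scratch, recursive = TRUE, force = TRUE), add = TRUE)
  code(scratch)
}

leftover <- with_scratch_dir(function(dir) {
  writeLines("hash", file.path(dir, "repo_hash.txt"))
  dir
})
dir.exists(leftover)
```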

Lints

On my local machine, devtools::lint() shows many lints, including:

data-raw/gen_pipeline.R:6:3: style: [quotes_linter] Only use double-quotes.
  'mtcars.csv',
  ^~~~~~~~~~~~
data-raw/gen_pipeline.R:43:2: style: [commented_code_linter] Remove commented code.
#rxp_make()
 ^~~~~~~~~~
data-raw/jl_example/functions.R:1:34: style: [brace_linter] There should be a space before an opening curly brace.
prepare_data <- function(laplace){
                                 ^

I have never used the Air formatter, and I do see you have https://github.com/b-rodrigues/rixpress/blob/main/.github/workflows/style-with-air.yaml, so please disregard if there is an inherent conflict between Air and lintr.

devtools::spell_check() has many findings, including:

> devtools::spell_check()
DESCRIPTION does not contain 'Language' field. Defaulting to 'en-US'.
  WORD                FOUND IN
’s                README.md:18
al                  a-intro-concepts.Rmd:124
Analysing           d-polyglot.Rmd:34
autoplay            b-core-functions.Rmd:272
buildInputs         make_derivation_snippet.Rd:20
cachix              d-polyglot.Rmd:46
Cachix              d-polyglot.Rmd:62
cancelled           rxp_init.Rd:17
ci                  dag_for_ci.Rd:36,40
                    generate_dag.Rd:29,33
                    rxp_ga.Rd:29,33
cmdstanr            f-cmdstanr.Rmd:2
configurePhase      make_derivation_snippet.Rd:20
cryptographic       a-intro-concepts.Rmd:164,171,214,230
CTRL                c-tutorial.Rmd:105
                    d-polyglot.Rmd:107
                    d2-polyglot-julia.Rmd:82
deriv               rixpress.Rd:18
...

You can exclude specific false positives in inst/WORDLIST.

On my machine, urlchecker::url_check() shows:

✖ Error: vignettes/a-intro-concepts.Rmd:32:21 
403: Forbidden
spectrum/continuum](https://www.researchgate.net/figure/Reproducibility-spectrum-as-Peng-2011-stated_fig1_354765302),
                    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This could be because of the extra network security in my work environment, or it could be because ResearchGate has checks for bots. In the latter case, CRAN might flag the URL.

Miscellaneous suggestions

  • In rxp_copy(), I see Sys.chmod(all_files, mode = "777"), which could be risky on shared file systems. Is there a more restricted permission set that would still work?
  • For rxp_r_file(), the implicit roxygen2 @title tag is "rxp_r_file". I suggest a more descriptive name. Same for rxp_py_file().
  • Please consider changing the name of the default branch from "master" to "main" in https://github.com/b-rodrigues/rixpress_demos.
  • nix-store --realize has many options for --verbose. I suggest making the verbose argument of rxp_make() an integer to support this existing functionality. There are many more features you could consider for helping users monitor pipelines, some of which are more feasible than others, and this one seems like the lowest-hanging fruit.
  • In rixpress_demos/r_multi_envs, I recommend a more formal/safe choice for the meme image.
  • Instead of prefixes to control the order vignettes are listed, you could consider relying on pkgdown yaml for this, e.g. https://github.com/wlandau/crew/blob/728c45536d58faf1794e2a16c469fdce4a815176/_pkgdown.yml#L6-L19.
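
For the verbosity suggestion above, a minimal sketch of how an integer argument could map onto repeated `--verbose` flags, which Nix accepts more than once to raise the verbosity level. The helper name and the `.drv` path are placeholders, not part of rixpress.

```r
# Hypothetical helper: map an integer verbosity level to repeated flags.
nix_verbosity_args <- function(verbose = 0L) {
  stopifnot(is.numeric(verbose), length(verbose) == 1, verbose >= 0)
  rep("--verbose", as.integer(verbose))
}

# Assemble arguments for `nix-store --realise`; the .drv path is a placeholder.
args <- c(
  "--realise",
  nix_verbosity_args(2L),
  "/nix/store/example-pipeline.drv"
)
args
```

Keeping `verbose = 0L` as the default would preserve the current quiet behaviour while letting users opt in to more detail.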

wlandau avatar Jul 21 '25 18:07 wlandau

Many thanks @wlandau for your review!

I'm currently on holidays without access to a computer so I'll only be able to address your comments in 2 weeks time. Just wanted to let you know 😁

b-rodrigues avatar Jul 23 '25 08:07 b-rodrigues

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

  • Briefly describe any working relationship you have (had) with the package authors.
    • While I have followed the author's work and read his book, I have not worked with the package author.
  • [x] As the reviewer I confirm that there are no conflicts of interest for me to review this work (if you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

  • [x] A statement of need: clearly stating problems the software is designed to solve and its target audience in README
  • [x] Installation instructions: for the development version of package and any non-standard dependencies in README
    • There are instructions for installing rixpress in the README. This package is somewhat unique in that, while its installation is straightforward, using it as intended requires completing a fairly involved installation process. There is a link to these instructions, which are included as part of the rix package.
  • [x] Vignette(s): demonstrating major functionality that runs successfully locally
  • [x] Function Documentation: for all exported functions
  • [x] Examples: (that run successfully locally) for all exported functions
  • [x] Community guidelines: including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).

Functionality

  • [x] Installation: Installation succeeds as documented.
    • While I was able to successfully install rixpress on my native OS (Windows) and within the Nix shell, I was unsuccessful getting some aspects of the rix installation completed, including IDE integration. Because of the security requirements of my work computer, I think this is not a rix problem but my own IT issue. I wanted to include that information to provide context for what I reviewed; however, I think it is out of the scope of the review of rixpress.
  • [x] Functionality: Any functional claims of the software have been confirmed.
  • [x] Performance: Any performance claims of the software have been confirmed.
  • [x] Automated tests: Unit tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.
  • [x] Packaging guidelines: The package conforms to the rOpenSci packaging guidelines.

Estimated hours spent reviewing: 14

  • [x] Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer ("rev" role) in the package DESCRIPTION file.

Review Comments

I, similar to Will, am familiar with the quality of your work. I did a relatively quick look through the source code but I focused my effort on the usability, particularly as a user that is new to rixpress, rix, and Nix. I will try to ensure I don't have comments that overlap with Will's.

I had some difficulty getting rix installed through WSL on my machine and I was never able to successfully use an IDE (either through a native installation or a Nix-managed installation). These issues are likely due, at least partially, to the enhanced security requirements of my work computer. I won't go into any further details about this here, however, because they are related to Nix, WSL, and other things outside of the scope of this review.

General impressions

While I am a regular user of reproducible pipelines, my experience comes as a user of targets. rixpress is different, not just in the implementation, but in the paradigm as well. While targets offers some gains in reproducibility (primarily in the isolation of runtime environments), rixpress offers several additional layers of reproducibility through the tracking of not just the input files and code, but the full environment: R and R packages, system-level software dependencies, environment variables, etc. Additionally, it supports a truly polyglot pipeline that isn't reliant on reticulate to execute Python code, which can be brittle. This is sure to be an important contribution to the R community.

This package is conceptualized and implemented thoughtfully. As usual, your documentation is thorough and well written, which allows for the success of this package despite the complex concepts and details required to implement this. You have done a good job tucking away the heavy technical details from users who want this to "just work," while providing the details in the many vignettes for advanced users who want or need this information.

However, this does create a lot of opportunities for users to get lost when considering "multiple points of entry." There are many cases where information that would be helpful for the user to know is described in a vignette, but not in the function documentation. While I understand that a balance must be struck between completeness of documentation and brevity, there are opportunities for improving the completeness of the documentation, even if it is at the cost of repeating information. I include some examples in the Documentation section below, but a thoughtful review of all function documentation would greatly benefit the ability for a new user to begin using rixpress.

Packaging

  • I built rixpress locally and did not receive any errors, warnings or notes from R CMD check.
  • All expected components of the package are present.

Code and testing

  • I appreciate the automatic code formatting with Air. It makes scanning the code predictable and easy.
  • I executed all the tests I could on my native system (without having to execute them in a Nix environment) and they all passed.
  • While test coverage is reported as being low (covr::package_coverage() reports 46.64% coverage overall), I found testing to be reasonably comprehensive. I don't know all the specifics for how covr works, but there must be some disconnect between the way your tests are written and the way it assesses coverage. For example, it evaluated R/generate_dag.R as having 0.0% test coverage. However, I found tests/testthat/test-generate_dag.R to reasonably cover generate_dag(). One can always add more tests (and even superficial tests to inflate test coverage), but the current coverage seems appropriate to me despite low coverage reports.
  • testthat::skip_on_cran() is used on the test "rxp_init creates expected files" and I am not sure that this needs to be skipped. There are other tests that create files (e.g., test-rxp_copy.R) where files are created but the test can be completed on CRAN. Evaluate if this is necessary.

User interface

  • I had a similar suggestion to Will about the rixpress() function. It doesn't follow the pattern used throughout most of the function names, which have an rxp_ prefix. At the very least, I think it could benefit from a more intuitive name that begins with rxp_ and is followed by a verb (as discussed in Tidy design principles). Even better might be removing rixpress() altogether and having one function that builds the pipeline plan and another to execute the pipeline. This is largely what Will said, but I include it because I think it is an important improvement to the interface.
  • I found error messages to be helpful and adequately descriptive in my testing.

Documentation

  • There is a missing word in the description field in the documentation for rixpress(). I think the word built needs to be added to the sentence "By default, the pipeline is also immediately built after being generated...".
  • The documentation for rxp_r/py_file() includes three methods to read data in. In the Examples, you demonstrate methods 1 and 3. It might also be worthwhile to include method 2. While there has to be a balance between completeness and brevity in the Examples, I think it is a worthwhile addition.
  • In vignettes/b-core-functions.Rmd, I think the third paragraph would make more sense if it started with "read_function requires an R function with a single argument..." since rxp_r_file() has three required arguments, only one of which requires an R function.
  • At the end of the Generating the pipeline section of b-core-functions.Rmd, you provide a bulleted list of actions that rixpress() performs. I found this quite helpful and I think the function documentation for rixpress() would benefit from these bullets (or something similarly concise and explicit).
  • In Vignette C: Tutorial, in regards to rxp_inspect(), you mention "... and an object you didn’t define called all-derivations. This last object is mostly for internal rixpress use, and you can safely ignore it." I found this helpful context that I wish I had seen earlier in my use of rxp_inspect(). Consider adding a note about all-derivations in the function documentation for rxp_inspect().
  • In Vignette D: Polyglot pipelines you give an example where serialize and unserialize functions are passed as characters rather than expressions for both rxp_py() and rxp_r(). I think Vignette F: Cmdstanr suggests that custom functions defined in functions.R should be passed as characters, but in either case, it is not clear to me when a character should be passed rather than a function for rxp_r(). It would be helpful to update the documentation for rxp_r() to describe all the acceptable input types and when they should be used.
  • In Vignette D: Polyglot pipelines, you mention "In the future, other languages could be added to rixpress, notably Julia." However, it appears that Julia is already supported. Consider clarifying this.
  • Vignette G: cached artifacts is helpful. Consider linking to it in the function documentation for

Functionality

  • I am curious about the case where there are multiple different inputs to an rxp_r/py/jl() call that need different unserialization functions. For example, if I have rxp_r(out_df, custom_fn(model = keras_model, data = data_frame)) where keras_model should be unserialized with keras::load_model_hdf5() and data_frame should be unserialized with readRDS. Can the unserialize_function take multiple functions? If so, it might be helpful to make that explicit in the documentation. If not, is there a plan to support derivations that have multiple inputs that require different unserialization functions? I don't want to contrive a bunch of edge cases, but to me this seems like it would be somewhat common (at least in my workflows).
  • Similarly, when I tried to use a custom function defined in "functions.R" (with additional_files = "functions.R" specified) as a named function in serialization_function, the custom function was not found. It might be nice to be able to write custom (un)serialization functions. If that is not simple to implement, it is probably worth making it explicit in the documentation that this must be a namespaced function or an anonymous function.
  • It seems likely to me that after many repeated builds of the same pipeline, especially one that requires large datasets or many intermediate derivations, storage space could become an issue with all previous artifacts stored (as described in Vignette G: cached artifacts). Currently, the guidance for clearing the store is to call nix-store --gc in the terminal. It might be nice to provide an R wrapper for this. Better yet, provide a bit more control besides just clearing all build artifacts; for example, giving the user the option to delete stored artifacts from before a given date. This is a soft suggestion.
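
To make the question about per-input unserialization concrete, here is a minimal sketch of one possible shape: a named list of reader functions keyed by input name, with readRDS as the fallback. The helper `read_inputs()` and its arguments are hypothetical; this is not how rixpress currently handles unserialize_function.

```r
# Hypothetical helper: read each upstream artifact with its own function.
# `readers` is a named list keyed by input name; `default` is the fallback.
read_inputs <- function(paths, readers = list(), default = readRDS) {
  stopifnot(!is.null(names(paths)))
  out <- lapply(names(paths), function(nm) {
    reader <- if (is.null(readers[[nm]])) default else readers[[nm]]
    reader(paths[[nm]])
  })
  stats::setNames(out, names(paths))
}

# Two artifacts stored in different formats.
rds_path <- tempfile(fileext = ".rds")
csv_path <- tempfile(fileext = ".csv")
saveRDS(head(mtcars, 3), rds_path)
write.csv(head(iris, 3), csv_path, row.names = FALSE)

inputs <- read_inputs(
  paths = c(data_frame = rds_path, lookup = csv_path),
  readers = list(lookup = read.csv)
)
str(inputs, max.level = 1)
```

In the keras example above, the same idea would pair keras_model with keras::load_model_hdf5 and leave data_frame on the readRDS default.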

amart90 avatar Jul 23 '25 20:07 amart90

:calendar: @wlandau you have 2 days left before the due date for your review (2025-07-28).

ropensci-review-bot avatar Jul 26 '25 12:07 ropensci-review-bot

:calendar: @amart90 you have 2 days left before the due date for your review (2025-07-28).

ropensci-review-bot avatar Jul 26 '25 14:07 ropensci-review-bot

@ropensci-review-bot submit review https://github.com/ropensci/software-review/issues/706#issuecomment-3109964972 time 6

ldecicco-USGS avatar Jul 28 '25 13:07 ldecicco-USGS