Contributions icon indicating copy to clipboard operation
Contributions copied to clipboard

Submission of scDiagnostics to Bioconductor

Open AnthonyChristidis opened this issue 1 year ago • 23 comments
trafficstars

Update the following URL to point to the GitHub repository of the package you wish to submit to Bioconductor

  • Repository: https://github.com/ccb-hms/scDiagnostics

Confirm the following by editing each check box to '[x]'

  • [x ] I understand that by submitting my package to Bioconductor, the package source and all review commentary are visible to the general public.

  • [x ] I have read the Bioconductor Package Submission instructions. My package is consistent with the Bioconductor Package Guidelines.

  • [x ] I understand Bioconductor Package Naming Policy and acknowledge Bioconductor may retain use of package name.

  • [x ] I understand that a minimum requirement for package acceptance is to pass R CMD check and R CMD BiocCheck with no ERROR or WARNINGS. Passing these checks does not result in automatic acceptance. The package will then undergo a formal review and recommendations for acceptance regarding other Bioconductor standards will be addressed.

  • [x ] My package addresses statistical or bioinformatic issues related to the analysis and comprehension of high throughput genomic data.

  • [x ] I am committed to the long-term maintenance of my package. This includes monitoring the support site for issues that users may have, subscribing to the bioc-devel mailing list to stay aware of developments in the Bioconductor community, responding promptly to requests for updates from the Core team in response to changes in R or underlying software.

  • [x ] I am familiar with the Bioconductor code of conduct and agree to abide by it.

I am familiar with the essential aspects of Bioconductor software management, including:

  • [ x] The 'devel' branch for new packages and features.
  • [ x] The stable 'release' branch, made available every six months, for bug fixes.
  • [x ] Bioconductor version control using Git (optionally via GitHub).

For questions/help about the submission process, including questions about the output of the automatic reports generated by the SPB (Single Package Builder), please use the #package-submission channel of our Community Slack. Follow the link on the home page of the Bioconductor website to sign up.

AnthonyChristidis avatar Jun 04 '24 03:06 AnthonyChristidis

Hi @AnthonyChristidis

Thanks for submitting your package. We are taking a quick look at it and you will hear back from us soon.

The DESCRIPTION file for this package is:

Type: Package
Package: scDiagnostics
Title: Cell type annotation diagnostics
Version: 0.99.0
Authors@R: c(
    person("Anthony", "Christidis", role = c("aut", "cre"), 
  email = "[email protected]"),
    person("Andrew", "Ghazi", role = "aut"),
    person("Smriti", "Chawla", role = "aut"),
    person("Nitesh", "Turaga", role = "ctb"),
    person("Ludwig", "Geistlinger", role = "aut"),
    person("Robert", "Gentleman", role = "aut")
  )
Description: The scDiagnostics package provides diagnostic plots to
    assess the quality of cell type assignments from single cell gene
    expression profiles. The implemented functionality allows to
    assess the reliability of cell type annotations, investigate gene
    expression patterns, and explore relationships between different
    cell types in query and reference datasets allowing users to
    detect potential misalignments between reference and query
    datasets. The package also provides visualization capabilities for
    diagnositics purposes.    
License: Artistic-2.0
URL: https://github.com/ccb-hms/scDiagnostics
BugReports: https://github.com/ccb-hms/scDiagnostics/issues
Depends:
    R (>= 4.4.0)
Imports:
    SingleCellExperiment,
    isotree,
    methods,
    ggplot2,
    RColorBrewer,
    gridExtra,
    SummarizedExperiment,
    stats,
    utils,
    ranger,
    BiocNeighbors,
    Hotelling,
    rlang
Suggests:
    AUCell,
    BiocStyle,
    corrplot,
    knitr,
    Matrix,
    rmarkdown,
    scran,
    scRNAseq,
    SingleR,
    celldex,
    ComplexHeatmap,
    scuttle,
    scater,
    testthat (>= 3.0.0)
VignetteBuilder: 
    knitr
biocViews:
    Annotation,
    Classification,
    Clustering,
    GeneExpression,
    RNASeq,
    SingleCell,
    Software,
    Transcriptomics
Encoding: UTF-8
LazyData: true
RoxygenNote: 7.3.1
Config/testthat/edition: 3

bioc-issue-bot avatar Jun 04 '24 03:06 bioc-issue-bot

Hi we would like to request that this submission is assigned to @HelenaLC as reviewer, if possible, given that she is an expert in single-cell analysis and has written diagnostic functionality for CyTOF data that seems highly relevant here.

lgeistlinger avatar Jun 04 '24 09:06 lgeistlinger

Also: this package has been accepted for a package demo at the upcoming Bioc2024 conference. We understand that there are several constraints to the review process, but it would be great if we would be able to proceed with the review process in a timely manner so that the package would be available from Bioconductor in time for the package demo at the conference.

lgeistlinger avatar Jun 04 '24 10:06 lgeistlinger

I'm getting an ERROR trying to R CMD build ...

 4: do.call(FUN, c(list(x = x, k = k, nu = nu, nv = nv, center = center,     scale = scale, BPPARAM = BPPARAM, ...), ARGS(BSPARAM)))
 5: runSVD(x, k = k, nu = nu, nv = nv, center = center, scale = scale,     BPPARAM = BPPARAM, ..., BSPARAM = BSPARAMFUN(deferred = bsdeferred(BSPARAM),         fold = bsfold(BSPARAM)))
 6: runSVD(x, k = k, nu = nu, nv = nv, center = center, scale = scale,     BPPARAM = BPPARAM, ..., BSPARAM = BSPARAMFUN(deferred = bsdeferred(BSPARAM),         fold = bsfold(BSPARAM)))
 7: runSVD(x, k = rank, nu = ifelse(get.pcs, rank, 0), nv = ifelse(get.rotation,     rank, 0), center = center, scale = scale, ...)
 8: runSVD(x, k = rank, nu = ifelse(get.pcs, rank, 0), nv = ifelse(get.rotation,     rank, 0), center = center, scale = scale, ...)
 9: .local(x, ...)
10: runPCA(x, rank = ncomponents, BSPARAM = BSPARAM, BPPARAM = BPPARAM)
11: runPCA(x, rank = ncomponents, BSPARAM = BSPARAM, BPPARAM = BPPARAM)
12: .calculate_pca(mat, transposed = !is.null(dimred), ...)
13: .local(x, ...)
14: calculatePCA(y, ...)
15: calculatePCA(y, ...)
16: .local(x, ...)
17: runPCA(ref_data)
18: runPCA(ref_data)
19: eval(expr, envir, enclos)
20: eval(expr, envir, enclos)
21: eval_with_user_handlers(expr, envir, enclos, user_handlers)
22: withVisible(eval_with_user_handlers(expr, envir, enclos, user_handlers))
23: withCallingHandlers(withVisible(eval_with_user_handlers(expr,     envir, enclos, user_handlers)), warning = wHandler, error = eHandler,     message = mHandler)
24: handle(ev <- withCallingHandlers(withVisible(eval_with_user_handlers(expr,     envir, enclos, user_handlers)), warning = wHandler, error = eHandler,     message = mHandler))
25: timing_fn(handle(ev <- withCallingHandlers(withVisible(eval_with_user_handlers(expr,     envir, enclos, user_handlers)), warning = wHandler, error = eHandler,     message = mHandler)))
26: evaluate_call(expr, parsed$src[[i]], envir = envir, enclos = enclos,     debug = debug, last = i == length(out), use_try = stop_on_error !=         2L, keep_warning = keep_warning, keep_message = keep_message,     log_echo = log_echo, log_warning = log_warning, output_handler = output_handler,     include_timing = include_timing)
27: evaluate::evaluate(...)
28: evaluate(code, envir = env, new_device = FALSE, keep_warning = if (is.numeric(options$warning)) TRUE else options$warning,     keep_message = if (is.numeric(options$message)) TRUE else options$message,     stop_on_error = if (is.numeric(options$error)) options$error else {        if (options$error && options$include)             0L        else 2L    }, output_handler = knit_handlers(options$render, options))
29: in_dir(input_dir(), expr)
30: in_input_dir(evaluate(code, envir = env, new_device = FALSE,     keep_warning = if (is.numeric(options$warning)) TRUE else options$warning,     keep_message = if (is.numeric(options$message)) TRUE else options$message,     stop_on_error = if (is.numeric(options$error)) options$error else {        if (options$error && options$include)             0L        else 2L    }, output_handler = knit_handlers(options$render, options)))
31: eng_r(options)
32: block_exec(params)
33: call_block(x)
34: process_group(group)
35: withCallingHandlers(if (tangle) process_tangle(group) else process_group(group),     error = function(e) if (xfun::pkg_available("rlang", "1.0.0")) rlang::entrace(e))
36: xfun:::handle_error(withCallingHandlers(if (tangle) process_tangle(group) else process_group(group),     error = function(e) if (xfun::pkg_available("rlang", "1.0.0")) rlang::entrace(e)),     function(e, loc) {        setwd(wd)        write_utf8(res, output %n% stdout())        paste0("\nQuitting from lines ", loc)    }, if (labels[i] != "") sprintf(" [%s]", labels[i]), get_loc)
37: process_file(text, output)
38: knitr::knit(knit_input, knit_output, envir = envir, quiet = quiet)
39: rmarkdown::render(file, encoding = encoding, quiet = quiet, envir = globalenv(),     output_dir = getwd(), ...)
40: vweave_rmarkdown(...)
41: engine$weave(file, quiet = quiet, encoding = enc)
42: doTryCatch(return(expr), name, parentenv, handler)
43: tryCatchOne(expr, names, parentenv, handlers[[1L]])
44: tryCatchList(expr, classes, parentenv, handlers)
45: tryCatch({    engine$weave(file, quiet = quiet, encoding = enc)    setwd(startdir)    output <- find_vignette_product(name, by = "weave", engine = engine)    if (!have.makefile && vignette_is_tex(output)) {        texi2pdf(file = output, clean = FALSE, quiet = quiet)        output <- find_vignette_product(name, by = "texi2pdf",             engine = engine)    }    outputs <- c(outputs, output)}, error = function(e) {    thisOK <<- FALSE    fails <<- c(fails, file)    message(gettextf("Error: processing vignette '%s' failed with diagnostics:\n%s",         file, conditionMessage(e)))})
46: tools::buildVignettes(dir = ".", tangle = TRUE)
An irrecoverable exception occurred. R is aborting now ...
Segmentation fault (core dumped)

lshep avatar Jun 07 '24 14:06 lshep

I was able to build with R 4.4, Bioc 3.19. Will try with 3.20 container.

vjcitn avatar Jun 12 '24 02:06 vjcitn

Your package has been added to git.bioconductor.org to continue the pre-review process. A build report will be posted shortly. Please fix any ERROR and WARNING in the build report before a reviewer is assigned or provide a justification on why you feel the ERROR or WARNING should be granted an exception.

IMPORTANT: Please read this documentation for setting up remotes to push to git.bioconductor.org. All changes should be pushed to git.bioconductor.org moving forward. It is required to push a version bump to git.bioconductor.org to trigger a new build report.

Bioconductor utilized your github ssh-keys for git.bioconductor.org access. To manage keys and future access you may want to active your Bioconductor Git Credentials Account

bioc-issue-bot avatar Jun 14 '24 13:06 bioc-issue-bot

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on the Bioconductor Single Package Builder.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

The following are build products from R CMD build on the Single Package Builder: Linux (Ubuntu 22.04.3 LTS): scDiagnostics_0.99.0.tar.gz

Links above active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to [email protected]:packages/scDiagnostics to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

bioc-issue-bot avatar Jun 14 '24 14:06 bioc-issue-bot

Received a valid push on git.bioconductor.org; starting a build for commit id: c812f48e7a068bfc6d0ac6653578cf18d4d7d4ff

bioc-issue-bot avatar Jun 22 '24 23:06 bioc-issue-bot

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on the Bioconductor Single Package Builder.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

The following are build products from R CMD build on the Single Package Builder: macOS 12.7.1 Monterey: scDiagnostics_0.99.1.tar.gz Linux (Ubuntu 22.04.3 LTS): scDiagnostics_0.99.1.tar.gz

Links above active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to [email protected]:packages/scDiagnostics to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

bioc-issue-bot avatar Jun 22 '24 23:06 bioc-issue-bot

Received a valid push on git.bioconductor.org; starting a build for commit id: 2455219c38148c30adae28bdf11ab2982d08a5b4

bioc-issue-bot avatar Jun 23 '24 00:06 bioc-issue-bot

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on the Bioconductor Single Package Builder.

On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

The following are build products from R CMD build on the Single Package Builder: macOS 12.7.1 Monterey: scDiagnostics_0.99.2.tar.gz Linux (Ubuntu 22.04.3 LTS): scDiagnostics_0.99.2.tar.gz

Links above active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to [email protected]:packages/scDiagnostics to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

bioc-issue-bot avatar Jun 23 '24 00:06 bioc-issue-bot

Received a valid push on git.bioconductor.org; starting a build for commit id: e4ddfd6b9de05731c56f89479292851fbbabd45d

bioc-issue-bot avatar Jun 23 '24 02:06 bioc-issue-bot

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on the Bioconductor Single Package Builder.

On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

The following are build products from R CMD build on the Single Package Builder: Linux (Ubuntu 22.04.3 LTS): scDiagnostics_0.99.3.tar.gz macOS 12.7.1 Monterey: scDiagnostics_0.99.3.tar.gz

Links above active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to [email protected]:packages/scDiagnostics to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

bioc-issue-bot avatar Jun 23 '24 02:06 bioc-issue-bot

Received a valid push on git.bioconductor.org; starting a build for commit id: 81ce49044abd984190b8eb5381e3734f3fff55e1

bioc-issue-bot avatar Jun 23 '24 03:06 bioc-issue-bot

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on the Bioconductor Single Package Builder.

On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

The following are build products from R CMD build on the Single Package Builder: Linux (Ubuntu 22.04.3 LTS): scDiagnostics_0.99.4.tar.gz macOS 12.7.1 Monterey: scDiagnostics_0.99.4.tar.gz

Links above active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to [email protected]:packages/scDiagnostics to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

bioc-issue-bot avatar Jun 23 '24 03:06 bioc-issue-bot

Received a valid push on git.bioconductor.org; starting a build for commit id: 5280d5240b366d6a5174f506e951cd7c0200c3d9

bioc-issue-bot avatar Jun 23 '24 19:06 bioc-issue-bot

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on the Bioconductor Single Package Builder.

On one or more platforms, the build results were: "WARNINGS, skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

The following are build products from R CMD build on the Single Package Builder: Linux (Ubuntu 22.04.3 LTS): scDiagnostics_0.99.5.tar.gz

Links above active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to [email protected]:packages/scDiagnostics to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

bioc-issue-bot avatar Jun 23 '24 19:06 bioc-issue-bot

Received a valid push on git.bioconductor.org; starting a build for commit id: 516db8afee63ef5a91fabf4e22dc7987d2626bc0

bioc-issue-bot avatar Jun 23 '24 19:06 bioc-issue-bot

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on the Bioconductor Single Package Builder.

On one or more platforms, the build results were: "skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

The following are build products from R CMD build on the Single Package Builder: macOS 12.7.1 Monterey: scDiagnostics_0.99.6.tar.gz

Links above active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to [email protected]:packages/scDiagnostics to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

bioc-issue-bot avatar Jun 23 '24 20:06 bioc-issue-bot

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on the Bioconductor Single Package Builder.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details.

The following are build products from R CMD build on the Single Package Builder: macOS 12.7.1 Monterey: scDiagnostics_0.99.6.tar.gz Linux (Ubuntu 22.04.3 LTS): scDiagnostics_0.99.6.tar.gz

Links above active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to [email protected]:packages/scDiagnostics to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

bioc-issue-bot avatar Jun 24 '24 17:06 bioc-issue-bot

A reviewer has been assigned to your package for an indepth review. Please respond accordingly to any further comments from the reviewer.

bioc-issue-bot avatar Jun 25 '24 13:06 bioc-issue-bot

Hi there, glad to see this coming through; here my first impressions. While these are mostly stylistic comments and should be simple enough to address, I feel the "crucial" bit is simplifying examples (see below). I think these should let the user have a quick peak at how to execute the function, what it returns etc., but they currently take quite some time to execute, and require installing most "suggested" packages... the vignette can be used for a more lengthly (biologically meaningful) demo, which it does nicely already. Please report back with what has/not been address how/why not, and any questions you might have; thanks!

misc

  • [ ] please consider implementing unit test; we strongly encourage them!

  • [ ] consider adding a package help page (accessible via ?scDiagnostics) with an overview of key functionality and links to main functions

  • [ ] there are some inconsistencies in vertical code alignment vs. 4-space tab indentation; this is generally fine, however, to ease readability, it'd be nice to stick to an 80-character limit in examples especially, specifically when variable & function names are long (e.g., the calculateAveragePairwiseCorrelation example doesn't show with my standard RStudio panel sizing)

  • [ ] it would be nice to start out with a clean-ish CHECK/BiocCheck report (tends to only get worse over time...); in most cases, R CMD CHECK NOTES stating "no visible binding" can be addressed by using aes(.data$x, ...) (instead of aes(x)) and adding @importFrom ggplot2 .data

code

  • [ ] many examples have quite long runtimes (>5s & up to 1min), largely attributable to loading the data (from online) and some heavy-ish computations; perhaps worth considering adding example data under inst/extdata, alongside a piece of documentation and a script under inst/scripts that recapitulates how that was retrieved/subsetted/processed? I personally find it quite tedious to have to download something and wait many seconds to see how a function works/what it returns (unless this is really necessary)... e.g., stashing a reference & query scRNA-seq data (subset) and corresponding SingleR results would suffice for most examples, I think? And then, perhaps, only showing the full workflow (normalization, transfer, selection, reduction, ...) for selected examples but not, say, every visualization? Importantly, this would also help to not require users to install various Suggests: to run examples (c.f. scuttle/scater examples which rely on toy data mostly)

  • [ ] avoid 1:...; use seq_len/along() instead

  • [ ] not sure where all this applies, but when matching an argument against a predefined set of character strings (e.g., plot.plotGeneSetScores.R, projectPCA.R etc.), code can be simplified using match.arg() and (optionally) switch(), e.g.:

foo <- \(se_object, method=c("PCA", "TSNE", "UMAP")) {
    arg <- match.arg(arg) # aleady checks validity
    mtx <- switch(arg, 
        PCA={...}, # do something special
        reducedDim(obj, arg) # otherwise just this
    })
    ...
}
  • [ ] not sure where/if this would break any function, but maybe worth adding a more validity checks on "simple" function arguments everywhere (e.g., is scalar, is logical/character/numeric etc.); e.g., passing 2 values to xxx_col in plotQCvsAnnotation() would break line 114 and the plotting due to renaming at line 117

  • [ ] fwiw...

# plotQCvsAnnotation.R:
qc_stats <- colData(query_data)[, qc_col]
cell_type_scores <- colData(query_data)[, score_col]
cell_labels <- colData(query_data)[[label_col]]
data <- data.frame(
    QCStats = qc_stats,
    Scores = cell_type_scores, 
    CellType = cell_labels)

# can be written as...
qc_stats <- query_data[[qc_col]]
cell_type_scores <- query_data[[score_col]] 
cell_labels <- query_data[[label_col]] 
data <- ...

# or... 
cols <- c(qc_col, score_col, label_col)
data <- colData(query_data)[, cols]
names(data) <- c("QCStats", "Scores", "CellType")
  • preallocate-and-fill should try and use v/m/lapply() rather than a for-loop; e.g., calculateHotellingPValue():
# Create a list to store p-values for each cell type
p_values <- rep(NA, length(unique_cell_types))
names(p_values) <- unique_cell_types
    
for (cell_type in unique_cell_types) {
    # Subset principal component scores for current cell type
    ref_subset_scores <- pca_output$ref[which(cell_type == reference_data[[ref_cell_type_col]]), pc_subset]
    query_subset_scores <- pca_output$query[which(cell_type == query_data[[query_cell_type_col]]), pc_subset]
    # Calculate the p-value
    hotelling_output <- Hotelling::hotelling.test(x = ref_subset_scores, y = query_subset_scores)
    # Store the result
    p_values[cell_type] <- hotelling_output$pval
}

# can be written as... (mock code)

names(unique_cell_types) <- unique_cell_types
idx_ref <- split(ncol(ref), ref[[ref_typ_col]])
idx_que <- split(ncol(que), que[[que_typ_col]])

p_values <- vapply(unique_cell_types, \(.) {
    ref_sco <- pca_out$ref[idx_ref[[.]], pc_subset]
    que_sco <- pca_out$que[idx_que[[.]], pc_subset]
    hotelling_output <- Hotelling::hotelling.test(x=ref_sco, y=que_sco)
    return(hotelling_output$pval)
}, numeric(1))

docs

  • [ ] repeating affiliations can be implemented as:
author:
- name: Peter Pan
  affiliation:
  - &foo Neverland
- name: Captain Hook
  affiliation:
  - *foo
  • [ ] please replace GitHub by normal Bioc installation instructions, i.e., BiocManager::install("ccb-hms/scDiagnostics") > "scDiagnostics" (also in the README!)

  • [ ] code chunk at lines 320+ is largely commented out code; also line 842? Commented out code should be removed for release

The following are mostly stylistic comments, which I'm aware are not strictly required; however, I feel that proper usage of text-style options and code formatting aren't negligible for legibility...

  • [ ] you can hyperlink external packages using BiocStyle functions, e.g., "[remotes](https://...)" > "`r BiocStyle::CRANpkg("remotes")`" and "AUCell" > "`r BiocStyle::Biocpkg("AUCell")`"

  • [ ] "scDiagnostics" (the package) is sometimes in code-style and sometimes in plain text; be consistent

  • [ ] when introducing example datasets (lines 102+ and 165+), perhaps worth hyperlinking to the source and/or DOI and/or adding proper citations (@AuthorYear), to make things easier for the reader?

  • [ ] to ease readability, I'd suggests using code-style (and maybe parentheses) to distinguish variable/class/slot/function/argument etc. names from plain text; e.g., "colData of" > colData "plotGeneExpressionDimred" > plotGeneExpressionDimred()

  • [ ] for readability, especially in the vignette, I'd strongly recommend sticking to the recommended 80-character limit and refraining from vertical code alignment (using 4-space tab indentation instead), as the latter introduces a lot of superfluous white space, especially when variable and function names are lengthy; note that this will mostly be displayed online (in the browser), so it's nice to not require horizontal scrolling (e.g., code chunk at lines 434+ is hard to parse)

HelenaLC avatar Jun 26 '24 10:06 HelenaLC

Thank you so much, @HelenaLC, for your prompt and thorough review - these comments are very useful and addressing them will only make the package better.

I was already aware of some of the issues you brought up and had been working on them for a while, so I hope to be able to submit the update shortly with all these points addressed.

I will let you know if there is anything I would like to consult with you before submitting the updated package for your follow-up review.

AnthonyChristidis avatar Jun 26 '24 18:06 AnthonyChristidis

Any update on this? Thanks.

HelenaLC avatar Aug 08 '24 09:08 HelenaLC

Hi @HelenaLC, thanks for reaching out. I recently came back from the BioC2024 conference in Grand Rapids (MI). Ludwig and I will meet shortly to discuss some final touch-ups based on the feedback we received before re-submitting the package for final review.

From my end, I incorporated all of your feedback and improved the functionality of the package further. I am also in the process of finalizing the multiple vignettes for the package (one "Get Started" and a few specialized vignettes that bunches functions with a similar purpose together).

I expect that I will submit the revised version of the package next week by Friday latest. I will keep in touch and notify you when I submit to provide some specifications/details of the revised submission.

AnthonyChristidis avatar Aug 08 '24 15:08 AnthonyChristidis

Hi @HelenaLC. Just letting you know I will be submitting next week, by Friday latest. Sorry for the delay and thanks for your patience on this!

AnthonyChristidis avatar Aug 15 '24 18:08 AnthonyChristidis

Received a valid push on git.bioconductor.org; starting a build for commit id: dbfc2e5cf1656506243ce4d969c2b611cf166c87

bioc-issue-bot avatar Aug 28 '24 05:08 bioc-issue-bot

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on the Bioconductor Single Package Builder.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details.

The following are build products from R CMD build on the Single Package Builder: Linux (Ubuntu 22.04.3 LTS): scDiagnostics_0.99.7.tar.gz

Links above active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to [email protected]:packages/scDiagnostics to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

bioc-issue-bot avatar Aug 28 '24 05:08 bioc-issue-bot

Hi @HelenaLC , I'm glad to finally announce that I submitted the revised version of scDiagnostics, in which I addressed all of your comments to the best of my knowledge.

misc:

  • I have now added datasets to the package that have been curated/processed and the user can now use them to test the functions in a much more accessible manner (they already contain all the data/metadata to run the examples, i.e. the reducedDims data, the colData, etc). So this should also have simplified the examples for each function by reducing the preamble code of the examples.
  • I have implemented unit tests for every single exported function in the package, including the S3 plot methods.
  • I have added a ?scDiagnostics help page which now contains an overview of key functionality and also some links to main functions.
  • I have fixed most of the 4-spac tab indentation issues, although some remain.
  • I fixed all "no visible binding" issues through the use of the "rlang" package.

code:

  • I have fixed the code examples such that all are executed below 10 seconds on any reasonable machine.
  • I have removed the use of 1:... whenever possible. Some remain where the code would be odd if seq_len()/along() would be used.
  • I have used match.arg() whenever possible to match an argument against a predefined set of character string.
  • On top of adding more validity checks, I have created an internal argumentCheck() function that is recycled for almost every function in the package to check every single argument passed by the user.

docs:

  • Please see the pkgdown website for an overview of the package docs.
  • For now, since scDiagnostics is not on Bioconductor yet, I still recommend to install the package from the ccb-hms GitHub repository, but as soon as you let me know I can change it to Bioconductor, I will do so.
  • Note that there are now seven vignettes, including one "Getting Started with scDiagnostics". Each of the other vignettes bunch the functions together around a common theme. In the pkgdown website I also organized the functions in the Reference tab by common purpose/theme.
  • I added hyperlinks whenever possible throughout the documentation.
  • I added citations for the documentation of the datasets, also also references/citations for the key statistical methods/implementations used throughout the package.
  • I incorporated BioC-style code as much as possible, including style to distinguish variables, classes, functions, slots, etc.
  • As mentioned I tried to fix the 80-character limit for each line as much as possible. There are still some improvements that can be made, for for the sake of not making some of the functions too long, I kept some of these long lines as is (but let me know if you would still prefer to prioritize line length over function length).

Thanks again for your thorough review and all your help on this. I look forward to your feedback and eventual publication of this work on Bioconductor.

AnthonyChristidis avatar Aug 28 '24 05:08 AnthonyChristidis

Thanks Anthony, it's come along nicely -- build report is much cleaner, very much appreciate putting validity checks & unit tests in place, and examples run a whole lot better (with relevant data in place) -- couple minor points below ... also, not sure I'm missing something, but I don't see any of my previous comments under docs having been addressed? E.g., link to datasets, BiocStyle, affiliations -- all still looks much the same to me?

  • [ ] data-raw scripts should go under inst/script/ according to guidelines

  • [ ] re seq_len/along() -- I see your point with 1:10 (fixed number), but something like 1:n_top_vars should use seq_len(); please double check for such cases (package-wide search for "1:[a-z]" should do the trick)

  • [ ] re match.arg() -- you can avoid if (!x %in% c(...)), this is already checked for (e.g., plotGeneSetScores() lines 51+) ... also here, for dimension reductions, might be worth adding one check that it's in reducedDimNames() upfront rather than 3 independent checks & stop messages...

  • (this isn't a real review comment, rather perhaps helping with better coding practice in the future :) to given one example, in calculateHotellingPValue(), when you repeat the same (long) sequence of characters > 2 times, code is a lot less cluttered using an intermediate variable... e.g.,

observed_t2 <- hotellingT2(
  as.matrix(cell_list[[cell_type]][ref_ind, paste0("PC", pc_subset)]),
  as.matrix(cell_list[[cell_type]][!ref_ind, paste0("PC", pc_subset)]))
...
perm_t2[perm_id] <- hotellingT2(
  as.matrix(cell_list[[cell_type]][ref_sample_id, paste0("PC", pc_subset)]),
  as.matrix(cell_list[[cell_type]][-ref_sample_id, paste0("PC", pc_subset)]))

# could be:
x <- as.matrix(cell_list[[cell_type]][, paste0("PC", pc_subset)])
observed_t2 <- hotellingT2(x[ref_ind, ], x[!ref_ind, ])
...
perm_t2[perm_id] <- hotellingT2(x[ref_sample_id, ], x[-ref_sample_id, ])

HelenaLC avatar Aug 29 '24 04:08 HelenaLC