Contributions
Contributions copied to clipboard
crupR submission
Update the following URL to point to the GitHub repository of the package you wish to submit to Bioconductor
- Repository: https://github.com/akbariomgba/crupR
Confirm the following by editing each check box to '[x]'
-
[x] I understand that by submitting my package to Bioconductor, the package source and all review commentary are visible to the general public.
-
[x] I have read the Bioconductor Package Submission instructions. My package is consistent with the Bioconductor Package Guidelines.
-
[x] I understand Bioconductor Package Naming Policy and acknowledge Bioconductor may retain use of package name.
-
[x] I understand that a minimum requirement for package acceptance is to pass R CMD check and R CMD BiocCheck with no ERROR or WARNINGS. Passing these checks does not result in automatic acceptance. The package will then undergo a formal review and recommendations for acceptance regarding other Bioconductor standards will be addressed.
-
[x] My package addresses statistical or bioinformatic issues related to the analysis and comprehension of high throughput genomic data.
-
[x] I am committed to the long-term maintenance of my package. This includes monitoring the support site for issues that users may have, subscribing to the bioc-devel mailing list to stay aware of developments in the Bioconductor community, responding promptly to requests for updates from the Core team in response to changes in R or underlying software.
-
[x] I am familiar with the Bioconductor code of conduct and agree to abide by it.
I am familiar with the essential aspects of Bioconductor software management, including:
- [x] The 'devel' branch for new packages and features.
- [x] The stable 'release' branch, made available every six months, for bug fixes.
- [x] Bioconductor version control using Git (optionally via GitHub).
For questions/help about the submission process, including questions about the output of the automatic reports generated by the SPB (Single Package Builder), please use the #package-submission channel of our Community Slack. Follow the link on the home page of the Bioconductor website to sign up.
Hi @akbariomgba
Thanks for submitting your package. We are taking a quick look at it and you will hear back from us soon.
The DESCRIPTION file for this package is:
Package: crupR
Type: Package
Title: an R package to predict condition-specific enhancers from ChIP-seq data
Version: 0.99.0
Authors@R: c(
person("Persia", "Akbari Omgba", email = "[email protected]", role = c("cre")),
person("Verena", "Laupert", email = "[email protected]", role = c("aut")),
person("Martin", "Vingron", email = "[email protected]", role = c("aut")))
Description: An R package offers a workflow to predict condition-specific enhancers from ChIP-seq data.
The prediction of regulatory units is done in four main Steps:
Step 1 is the normalization of the ChIP-seq counts.
Step 2 is the actual prediction of active enhancers bin-wise on the whole genome.
Step 3 is the condition-specific clustering of the putative active enhancers.
Step 4 is the search for possible target genes of the condition-specific clusters using RNA-seq counts.
License: GPL-3
Encoding: UTF-8
LazyData: false
Imports:
BSgenome.Mmusculus.UCSC.mm10,
BSgenome.Mmusculus.UCSC.mm9,
BSgenome.Hsapiens.UCSC.hg19,
BSgenome.Hsapiens.UCSC.hg38,
bamsignals,
Rsamtools,
GenomicRanges,
preprocessCore,
randomForest,
rtracklayer,
GenomeInfoDb,
S4Vectors,
parallel,
ggplot2,
matrixStats,
dplyr,
IRanges,
GenomicAlignments,
GenomicFeatures,
TxDb.Mmusculus.UCSC.mm10.knownGene,
TxDb.Mmusculus.UCSC.mm9.knownGene,
TxDb.Hsapiens.UCSC.hg19.knownGene,
TxDb.Hsapiens.UCSC.hg38.knownGene,
BSgenome,
reshape2,
magrittr,
stats,
utils,
grDevices
Depends:
R (>= 4.2.0)
Suggests: testthat, BiocStyle, knitr, rmarkdown
biocViews:
DifferentialPeakCalling,
GeneTarget,
FunctionalPrediction,
HistoneModification,
PeakDetection
BiocType: Software
URL: https://github.com/akbariomgba/crupR
RoxygenNote: 7.2.3
VignetteBuilder: knitr
I currently cannot build the package
R CMD build crupR
* checking for file 'crupR/DESCRIPTION' ... OK
* preparing 'crupR':
* checking DESCRIPTION meta-information ... OK
* installing the package to build vignettes
* creating vignettes ... ERROR
--- re-building ‘crupR-vignette.Rmd’ using rmarkdown
Quitting from lines at lines 167-168 [unnamed-chunk-11] (crupR-vignette.Rmd)
Error: processing vignette 'crupR-vignette.Rmd' failed with diagnostics:
object 'C_pKS2' not found
--- failed re-building ‘crupR-vignette.Rmd’
SUMMARY: processing the following file failed:
‘crupR-vignette.Rmd’
Error: Vignette re-building failed.
Execution halted
Hello, When I am trying to build the package it runs w/o any errors, so I have trouble finding the bug. Could you please provide your session info or other information about your environment such that I can replicate it?
R CMD build crupR
* checking for file 'crupR/DESCRIPTION' ... OK
* preparing 'crupR':
* checking DESCRIPTION meta-information ... OK
* installing the package to build vignettes
* creating vignettes ... ERROR
--- re-building ‘crupR-vignette.Rmd’ using rmarkdown
Quitting from lines 167-168 [unnamed-chunk-11] (crupR-vignette.Rmd)
Error: processing vignette 'crupR-vignette.Rmd' failed with diagnostics:
object 'C_pKS2' not found
--- failed re-building ‘crupR-vignette.Rmd’
SUMMARY: processing the following file failed:
‘crupR-vignette.Rmd’
Error: Vignette re-building failed.
Execution halted
My sessionInfo()
Bioconductor version 3.20 (BiocManager 1.30.23), R 4.4.0 Patched (2024-04-30
r86503)
> sessionInfo()
R version 4.4.0 Patched (2024-04-30 r86503)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 22.04.4 LTS
Matrix products: default
BLAS: /home/lorikern/R-Installs/bin/R-4-4-branch/lib/libRblas.so
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
time zone: America/New_York
tzcode source: system (glibc)
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] praise_1.0.0 BiocManager_1.30.23
loaded via a namespace (and not attached):
[1] compiler_4.4.0
Thank you for the session info. I was able to reproduce the bug and fixed it. The repo has been updated.
I apologize for the inconvenience.
Your package has been added to git.bioconductor.org to continue the pre-review process. A build report will be posted shortly. Please fix any ERROR and WARNING in the build report before a reviewer is assigned or provide a justification on why you feel the ERROR or WARNING should be granted an exception.
IMPORTANT: Please read this documentation for setting up remotes to push to git.bioconductor.org. All changes should be pushed to git.bioconductor.org moving forward. It is required to push a version bump to git.bioconductor.org to trigger a new build report.
Bioconductor utilized your github ssh-keys for git.bioconductor.org access. To manage keys and future access you may want to active your Bioconductor Git Credentials Account
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on the Bioconductor Single Package Builder.
Congratulations! The package built without errors or warnings on all platforms.
Please see the build report for more details.
The following are build products from R CMD build on the Single Package Builder: Linux (Ubuntu 22.04.3 LTS): crupR_0.99.0.tar.gz
Links above active for 21 days.
Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
[email protected]:packages/crupR to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.
A reviewer has been assigned to your package for an indepth review. Please respond accordingly to any further comments from the reviewer.
Hi @akbariomgba,
I tried building crupR locally (MacBook Air M1 with 16 GB RAM, full session info below) and hit the following error when building the vignette:
R_ENVIRON_USER=~/.Renviron.bioc R CMD build .
* checking for file ‘./DESCRIPTION’ ... OK
* preparing ‘crupR’:
* checking DESCRIPTION meta-information ... OK
* installing the package to build vignettes
* creating vignettes ... ERROR
--- re-building ‘crupR-vignette.Rmd’ using rmarkdown
Quitting from lines 123-125 [run-getEnhancers] (crupR-vignette.Rmd)
Error: processing vignette 'crupR-vignette.Rmd' failed with diagnostics:
vector memory limit of 16.0 Gb reached, see mem.maxVSize()
--- failed re-building ‘crupR-vignette.Rmd’
SUMMARY: processing the following file failed:
‘crupR-vignette.Rmd’
Error: Vignette re-building failed.
Execution halted
The same error occurs if I install the package (without vignettes) and then step through the vignette code.
What systems have you built and tested the package on?
Session info
> sessionInfo()
R version 4.4.0 (2024-04-24)
Platform: aarch64-apple-darwin20
Running under: macOS Sonoma 14.5
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: Australia/Melbourne
tzcode source: internal
attached base packages:
[1] stats graphics grDevices datasets utils methods base
other attached packages:
[1] crupR_0.99.0
loaded via a namespace (and not attached):
[1] tidyselect_1.2.1 BSgenome.Mmusculus.UCSC.mm9_1.4.0
[3] dplyr_1.1.4 blob_1.2.4
[5] Biostrings_2.73.1 bitops_1.0-7
[7] fastmap_1.2.0 RCurl_1.98-1.14
[9] GenomicAlignments_1.41.0 bamsignals_1.37.0
[11] XML_3.99-0.16.1 lifecycle_1.0.4
[13] KEGGREST_1.45.0 TxDb.Hsapiens.UCSC.hg38.knownGene_3.18.0
[15] RSQLite_2.3.7 magrittr_2.0.3
[17] compiler_4.4.0 rlang_1.1.4
[19] tools_4.4.0 utf8_1.2.4
[21] yaml_2.3.8 BSgenome.Hsapiens.UCSC.hg38_1.4.5
[23] rtracklayer_1.65.0 S4Arrays_1.5.1
[25] bit_4.0.5 curl_5.2.1
[27] DelayedArray_0.31.1 plyr_1.8.9
[29] abind_1.4-5 BiocParallel_1.39.0
[31] BiocGenerics_0.51.0 grid_4.4.0
[33] stats4_4.4.0 preprocessCore_1.67.0
[35] fansi_1.0.6 colorspace_2.1-0
[37] ggplot2_3.5.1 scales_1.3.0
[39] SummarizedExperiment_1.35.0 cli_3.6.2
[41] crayon_1.5.2 generics_0.1.3
[43] httr_1.4.7 reshape2_1.4.4
[45] rjson_0.2.21 DBI_1.2.3
[47] cachem_1.1.0 stringr_1.5.1
[49] zlibbioc_1.51.1 parallel_4.4.0
[51] AnnotationDbi_1.67.0 BiocManager_1.30.23
[53] XVector_0.45.0 restfulr_0.0.15
[55] matrixStats_1.3.0 vctrs_0.6.5
[57] Matrix_1.7-0 jsonlite_1.8.8
[59] IRanges_2.39.0 S4Vectors_0.43.0
[61] bit64_4.0.5 GenomicFeatures_1.57.0
[63] glue_1.7.0 codetools_0.2-20
[65] BSgenome.Mmusculus.UCSC.mm10_1.4.3 stringi_1.8.4
[67] TxDb.Mmusculus.UCSC.mm10.knownGene_3.10.0 gtable_0.3.5
[69] GenomeInfoDb_1.41.1 BiocIO_1.15.0
[71] GenomicRanges_1.57.0 UCSC.utils_1.1.0
[73] munsell_0.5.1 tibble_3.2.1
[75] pillar_1.9.0 randomForest_4.7-1.1
[77] BSgenome.Hsapiens.UCSC.hg19_1.4.3 GenomeInfoDbData_1.2.12
[79] BSgenome_1.73.0 R6_2.5.1
[81] lattice_0.22-6 Biobase_2.65.0
[83] png_0.1-8 Rsamtools_2.21.0
[85] memoise_2.0.1 TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
[87] TxDb.Mmusculus.UCSC.mm9.knownGene_3.2.2 renv_1.0.7
[89] Rcpp_1.0.12 SparseArray_1.5.7
[91] MatrixGenerics_1.17.0 pkgconfig_2.0.3
Hi @PeteHaitch ,
I have built and tested the package on linux. I'll try to use a VM to get to the root of the issue.
FYI I've narrowed it down to the line https://github.com/akbariomgba/crupR/blob/a1d30b9825cc552b2ec951fe7f513ffb3335c83d/R/enhancerPrediction.R#L82.
I can try to help further debug the issue on my mac. Any idea what memory usage is required by this function (or others that might use a bit of RAM)?
Any updates on this @akbariomgba? I was able to build the package on a linux VM with 32 GB RAM and 6 cores but not when running with 16 GB RAM and 2 cores; session info below.
R version 4.4.0 (2024-04-24)
Platform: x86_64-pc-linux-gnu
Running under: CentOS Linux 7 (Core)
Matrix products: default
BLAS: /stornext/System/data/apps/R/R-4.4.0/lib64/R/lib/libRblas.so
LAPACK: /stornext/System/data/apps/R/R-4.4.0/lib64/R/lib/libRlapack.so; LAPACK version 3.12.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8
[4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
time zone: UTC
tzcode source: system (glibc)
Here's the output for the failed build
% R_ENVIRON_USER=~/.Renviron.bioc R CMD build .
* checking for file ‘./DESCRIPTION’ ... OK
* preparing ‘crupR’:
* checking DESCRIPTION meta-information ... OK
* installing the package to build vignettes
* creating vignettes ... ERROR
--- re-building ‘crupR-vignette.Rmd’ using rmarkdown
Quitting from lines 85-89 [run-normalize] (crupR-vignette.Rmd)
Error: processing vignette 'crupR-vignette.Rmd' failed with diagnostics:
'names' attribute [4] must be the same length as the vector [2]
--- failed re-building ‘crupR-vignette.Rmd’
SUMMARY: processing the following file failed:
‘crupR-vignette.Rmd’
Error: Vignette re-building failed.
Execution halted
This issue is being closed because there has been no progress for an extended period of time. You may reopen the issue when you have the time to actively participate in the review / submission process. Please also keep in mind that a package accepted to Bioconductor requires a commitment on your part to ongoing maintenance.
Thank you for your interest in Bioconductor.
@PeteHaitch Sorry for the long break. Thank you for your help in debugging the code! I looked into what you have sent and found the reason for the space issues. I fixed it and tested in on macOS with 16gb ram and it finally ran. I pushed the changes to the crupR repo and also the crupR repo on the Bioconductor devel branch. I hope everything worked out and the issue is resolved now.
Dear @akbariomgba ,
We have reopened the issue to continue the review process. Please remember to push a version bump to git.bioconductor.org to trigger a new build.
A reviewer has been assigned to your package for an indepth review. Please respond accordingly to any further comments from the reviewer.
Hi @akbariomgba,
Please remember to push a version bump to git.bioconductor.org to trigger a new build.
Thanks, Pete
Received a valid push on git.bioconductor.org; starting a build for commit id: 4478270222798bc568a937d46ad94f9020333418
Hello @PeteHaitch ,
Sorry, I pushed it now.
Best, Persia
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on the Bioconductor Single Package Builder.
Congratulations! The package built without errors or warnings on all platforms.
Please see the build report for more details.
The following are build products from R CMD build on the Single Package Builder: Linux (Ubuntu 24.04.1 LTS): crupR_0.99.1.tar.gz
Links above active for 21 days.
Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
[email protected]:packages/crupR to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.
Hello @PeteHaitch , That is weird, because it worked for me. I will check again and see if I can get the objects to be smaller.
Hi @akbariomgba, sorry please ignore that earlier message.
I subsequently deleted it because I realised I had been checking an old version of crupR (v0.99.0).
I've been able to successfully build the current version (v0.99.1), although R CMD check failed; I'm still looking into why that failed.
The R CMD check errors look to be related to downloads not being available.
In case this is an intermittent thing, I'll try again later, but package's functions, examples, and tests should fail gracefully when resources are not available.
$ R_ENVIRON_USER=~/.Renviron.bioc R CMD check crupR_0.99.1.tar.gz
* using log directory ‘/Users/peter/GitHub/crupR/crupR.Rcheck’
* using R Under development (unstable) (2024-11-10 r87316)
* using platform: aarch64-apple-darwin20
* R was compiled by
Apple clang version 14.0.0 (clang-1400.0.29.202)
GNU Fortran (GCC) 12.2.0
* running under: macOS Sequoia 15.1.1
* using session charset: UTF-8
* checking for file ‘crupR/DESCRIPTION’ ... OK
* checking extension type ... Package
* this is package ‘crupR’ version ‘0.99.1’
* package encoding: UTF-8
* checking package namespace information ... OK
* checking package dependencies ... OK
* checking if this is a source package ... OK
* checking if there is a namespace ... OK
* checking for hidden files and directories ... OK
* checking for portable file names ... OK
* checking for sufficient/correct file permissions ... OK
* checking whether package ‘crupR’ can be installed ... OK
* checking installed package size ... INFO
installed size is 13.0Mb
sub-directories of 1Mb or more:
extdata 12.1Mb
* checking package directory ... OK
* checking ‘build’ directory ... OK
* checking DESCRIPTION meta-information ... OK
* checking top-level files ... OK
* checking for left-over files ... OK
* checking index information ... OK
* checking package subdirectories ... OK
* checking code files for non-ASCII characters ... OK
* checking R files for syntax errors ... OK
* checking whether the package can be loaded ... OK
* checking whether the package can be loaded with stated dependencies ... OK
* checking whether the package can be unloaded cleanly ... OK
* checking whether the namespace can be loaded with stated dependencies ... OK
* checking whether the namespace can be unloaded cleanly ... OK
* checking dependencies in R code ... OK
* checking S3 generic/method consistency ... OK
* checking replacement functions ... OK
* checking foreign function calls ... OK
* checking R code for possible problems ... NOTE
getDynamics: no visible binding for global variable ‘condition’
getTargets: no visible binding for global variable ‘condition’
normalize: no visible global function definition for ‘is’
plotSummary: no visible binding for global variable ‘condition’
plotSummary: no visible binding for global variable ‘value’
Undefined global functions or variables:
condition is value
Consider adding
importFrom("methods", "is")
to your NAMESPACE file (and ensure that your DESCRIPTION Imports field
contains 'methods').
* checking Rd files ... OK
* checking Rd metadata ... OK
* checking Rd cross-references ... OK
* checking for missing documentation entries ... OK
* checking for code/documentation mismatches ... OK
* checking Rd \usage sections ... OK
* checking Rd contents ... OK
* checking for unstated dependencies in examples ... OK
* checking installed files from ‘inst/doc’ ... OK
* checking files in ‘vignettes’ ... OK
* checking examples ... ERROR
Running examples in ‘crupR-Ex.R’ failed
The error most likely occurred in:
> ### Name: normalize
> ### Title: Normalization of ChIPseq counts.
> ### Aliases: normalize
>
> ### ** Examples
>
>
> files <- c(system.file("extdata", "Condition1.H3K4me1.bam", package="crupR"),
+ system.file("extdata", "Condition1.H3K4me3.bam", package="crupR"),
+ system.file("extdata", "Condition1.H3K27ac.bam", package="crupR"),
+ system.file("extdata", "Condition2.H3K4me1.bam", package="crupR"),
+ system.file("extdata", "Condition2.H3K4me3.bam", package="crupR"),
+ system.file("extdata", "Condition2.H3K27ac.bam", package="crupR"))
>
> inputs <- c(rep(system.file("extdata", "Condition1.Input.bam", package="crupR"), 3),
+ rep(system.file("extdata", "Condition2.Input.bam", package="crupR"),3))
>
> metaData <- data.frame(HM = rep(c("H3K4me1","H3K4me3","H3K27ac"),2),
+ condition = c(1,1,1,2,2,2), replicate = c(1,1,1,1,1,1),
+ bamFile = files, inputFile = inputs)
>
> normalize(metaData = metaData, condition = 1, replicate = 1,
+ genome = "mm10", sequencing = "paired", C = 2)
Prepare the binned genome ...
Warning in download.file(url, destfile, method, quiet = TRUE) :
cannot open URL 'ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/635/': FTP status was '421 Service not available, closing control connection'
Error in download.file(url, destfile, method, quiet = TRUE) :
cannot open URL 'ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/635/'
Calls: normalize ... .form_assembly_report_url -> find_NCBI_assembly_ftp_dir
Execution halted
* checking for unstated dependencies in ‘tests’ ... OK
* checking tests ...
Running ‘testthat.R’
ERROR
Running the tests in ‘tests/testthat.R’ failed.
Last 13 lines of output:
18. └─GenomeInfoDb:::.map_UCSC_seqlevels_to_NCBI_or_RefSeq(...)
19. └─GenomeInfoDb::getChromInfoFromUCSC(genome, map.NCBI = TRUE)
20. └─GenomeInfoDb:::.get_chrom_info_for_registered_UCSC_genome(...)
21. ├─BiocGenerics::do.call(...)
22. ├─base::do.call(...)
23. └─GenomeInfoDb (local) `<fn>`(`<df[,4]>`, assembly_accession = "GCF_000001635.26")
24. └─GenomeInfoDb::getChromInfoFromNCBI(assembly_accession, assembly.units = AssemblyUnits)
25. └─GenomeInfoDb:::.get_NCBI_chrom_info_from_accession(...)
26. └─GenomeInfoDb::fetch_assembly_report(accession, assembly_name = assembly_name)
27. └─GenomeInfoDb:::.form_assembly_report_url(...)
28. └─GenomeInfoDb::find_NCBI_assembly_ftp_dir(...)
[ FAIL 1 | WARN 1 | SKIP 0 | PASS 46 ]
Error: Test failures
Execution halted
* checking for unstated dependencies in vignettes ... OK
* checking package vignettes ... OK
* checking re-building of vignette outputs ... ERROR
Error(s) in re-building vignettes:
...
--- re-building ‘crupR-vignette.Rmd’ using rmarkdown
Quitting from lines 85-89 [run-normalize] (crupR-vignette.Rmd)
Error: processing vignette 'crupR-vignette.Rmd' failed with diagnostics:
cannot open URL 'ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/635/'
--- failed re-building ‘crupR-vignette.Rmd’
SUMMARY: processing the following file failed:
‘crupR-vignette.Rmd’
Error: Vignette re-building failed.
Execution halted
* checking PDF version of manual ... OK
* DONE
Status: 3 ERRORs, 1 NOTE
See
‘/Users/peter/GitHub/crupR/crupR.Rcheck/00check.log’
for details.
The above seems to have been an intermittent issue with the resource being accessed. I will post my review shortly.
Hi @akbariomgba ,
Thank you for submitting crupR Bioconductor. Also, once again, thank you for your patience with this review.
I've completed my checklist review of crupR. Overall, the package is in good shape, but there are several important points that must be addressed before it can be accepted.
In my checklist review below I have separated the issues into Required and Recommended points that I would ask you to address before the package can be accepted. Would you please provide line-by-line comments to my initial review so that I know what changes I'm looking for in my re-review.
Cheers, Pete
Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R Under development (unstable) (2024-11-10 r87316)
#> os macOS Sequoia 15.1.1
#> system aarch64, darwin20
#> ui X11
#> language (EN)
#> collate en_US.UTF-8
#> ctype en_US.UTF-8
#> tz Australia/Melbourne
#> date 2024-11-25
#> pandoc 3.2 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/aarch64/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date (UTC) lib source
#> abind 1.4-8 2024-09-12 [1] CRAN (R 4.5.0)
#> AnnotationDbi 1.69.0 2024-10-29 [1] Bioconductor 3.21 (R 4.5.0)
#> bamsignals 1.39.0 2024-10-29 [1] Bioconductor 3.21 (R 4.5.0)
#> Biobase * 2.67.0 2024-10-29 [1] Bioconductor 3.21 (R 4.5.0)
#> BiocGenerics * 0.53.3 2024-11-14 [1] Bioconductor 3.21 (R 4.5.0)
#> BiocIO 1.17.1 2024-11-21 [1] Bioconductor 3.21 (R 4.5.0)
#> BiocParallel 1.41.0 2024-10-29 [1] Bioconductor 3.21 (R 4.5.0)
#> Biostrings 2.75.1 2024-11-07 [1] Bioconductor 3.21 (R 4.5.0)
#> bit 4.5.0 2024-09-20 [1] CRAN (R 4.5.0)
#> bit64 4.5.2 2024-09-22 [1] CRAN (R 4.5.0)
#> bitops 1.0-9 2024-10-03 [1] CRAN (R 4.5.0)
#> blob 1.2.4 2023-03-17 [1] CRAN (R 4.5.0)
#> BSgenome 1.75.0 2024-10-29 [1] Bioconductor 3.21 (R 4.5.0)
#> BSgenome.Hsapiens.UCSC.hg19 1.4.3 2024-11-19 [1] Bioconductor
#> BSgenome.Hsapiens.UCSC.hg38 1.4.5 2024-11-19 [1] Bioconductor
#> BSgenome.Mmusculus.UCSC.mm10 1.4.3 2024-11-19 [1] Bioconductor
#> BSgenome.Mmusculus.UCSC.mm9 1.4.0 2024-11-19 [1] Bioconductor
#> cachem 1.1.0 2024-05-16 [1] CRAN (R 4.5.0)
#> cli 3.6.3 2024-06-21 [1] CRAN (R 4.5.0)
#> codetools 0.2-20 2024-03-31 [1] CRAN (R 4.5.0)
#> colorspace 2.1-1 2024-07-26 [1] CRAN (R 4.5.0)
#> crayon 1.5.3 2024-06-20 [1] CRAN (R 4.5.0)
#> crupR * 0.99.1 2024-11-25 [1] Bioconductor
#> curl 6.0.1 2024-11-14 [1] CRAN (R 4.5.0)
#> DBI 1.2.3 2024-06-02 [1] CRAN (R 4.5.0)
#> DelayedArray 0.33.2 2024-11-15 [1] Bioconductor 3.21 (R 4.5.0)
#> digest 0.6.37 2024-08-19 [1] CRAN (R 4.5.0)
#> dplyr 1.1.4 2023-11-17 [1] CRAN (R 4.5.0)
#> evaluate 1.0.1 2024-10-10 [1] CRAN (R 4.5.0)
#> fansi 1.0.6 2023-12-08 [1] CRAN (R 4.5.0)
#> fastmap 1.2.0 2024-05-15 [1] CRAN (R 4.5.0)
#> fs 1.6.5 2024-10-30 [1] CRAN (R 4.5.0)
#> generics * 0.1.3 2022-07-05 [1] CRAN (R 4.5.0)
#> GenomeInfoDb * 1.43.1 2024-11-15 [1] Bioconductor 3.21 (R 4.5.0)
#> GenomeInfoDbData 1.2.13 2024-11-13 [1] Bioconductor
#> GenomicAlignments 1.43.0 2024-10-29 [1] Bioconductor 3.21 (R 4.5.0)
#> GenomicFeatures 1.59.1 2024-11-07 [1] Bioconductor 3.21 (R 4.5.0)
#> GenomicRanges * 1.59.1 2024-11-15 [1] Bioconductor 3.21 (R 4.5.0)
#> ggplot2 3.5.1 2024-04-23 [1] CRAN (R 4.5.0)
#> glue 1.8.0 2024-09-30 [1] CRAN (R 4.5.0)
#> gtable 0.3.6 2024-10-25 [1] CRAN (R 4.5.0)
#> htmltools 0.5.8.1 2024-04-04 [1] CRAN (R 4.5.0)
#> httr 1.4.7 2023-08-15 [1] CRAN (R 4.5.0)
#> IRanges * 2.41.1 2024-11-15 [1] Bioconductor 3.21 (R 4.5.0)
#> jsonlite 1.8.9 2024-09-20 [1] CRAN (R 4.5.0)
#> KEGGREST 1.47.0 2024-10-29 [1] Bioconductor 3.21 (R 4.5.0)
#> knitr 1.49 2024-11-08 [1] CRAN (R 4.5.0)
#> lattice 0.22-6 2024-03-20 [1] CRAN (R 4.5.0)
#> lifecycle 1.0.4 2023-11-07 [1] CRAN (R 4.5.0)
#> magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.5.0)
#> Matrix 1.7-1 2024-10-18 [1] CRAN (R 4.5.0)
#> MatrixGenerics * 1.19.0 2024-10-29 [1] Bioconductor 3.21 (R 4.5.0)
#> matrixStats * 1.4.1 2024-09-08 [1] CRAN (R 4.5.0)
#> memoise 2.0.1 2021-11-26 [1] CRAN (R 4.5.0)
#> munsell 0.5.1 2024-04-01 [1] CRAN (R 4.5.0)
#> pillar 1.9.0 2023-03-22 [1] CRAN (R 4.5.0)
#> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.5.0)
#> plyr 1.8.9 2023-10-02 [1] CRAN (R 4.5.0)
#> png 0.1-8 2022-11-29 [1] CRAN (R 4.5.0)
#> preprocessCore 1.69.0 2024-10-29 [1] Bioconductor 3.21 (R 4.5.0)
#> R6 2.5.1 2021-08-19 [1] CRAN (R 4.5.0)
#> randomForest 4.7-1.2 2024-09-22 [1] CRAN (R 4.5.0)
#> Rcpp 1.0.13-1 2024-11-02 [1] CRAN (R 4.5.0)
#> RCurl 1.98-1.16 2024-07-11 [1] CRAN (R 4.5.0)
#> reprex 2.1.1 2024-07-06 [1] CRAN (R 4.5.0)
#> reshape2 1.4.4 2020-04-09 [1] CRAN (R 4.5.0)
#> restfulr 0.0.15 2022-06-16 [1] CRAN (R 4.5.0)
#> rjson 0.2.23 2024-09-16 [1] CRAN (R 4.5.0)
#> rlang 1.1.4 2024-06-04 [1] CRAN (R 4.5.0)
#> rmarkdown 2.29 2024-11-04 [1] CRAN (R 4.5.0)
#> Rsamtools 2.23.0 2024-10-29 [1] Bioconductor 3.21 (R 4.5.0)
#> RSQLite 2.3.8 2024-11-17 [1] CRAN (R 4.5.0)
#> rstudioapi 0.17.1 2024-10-22 [1] CRAN (R 4.5.0)
#> rtracklayer 1.67.0 2024-10-29 [1] Bioconductor 3.21 (R 4.5.0)
#> S4Arrays 1.7.1 2024-10-30 [1] Bioconductor 3.21 (R 4.5.0)
#> S4Vectors * 0.45.2 2024-11-15 [1] Bioconductor 3.21 (R 4.5.0)
#> scales 1.3.0 2023-11-28 [1] CRAN (R 4.5.0)
#> sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.5.0)
#> SparseArray 1.7.2 2024-11-14 [1] Bioconductor 3.21 (R 4.5.0)
#> stringi 1.8.4 2024-05-06 [1] CRAN (R 4.5.0)
#> stringr 1.5.1 2023-11-14 [1] CRAN (R 4.5.0)
#> SummarizedExperiment * 1.37.0 2024-10-29 [1] Bioconductor 3.21 (R 4.5.0)
#> tibble 3.2.1 2023-03-20 [1] CRAN (R 4.5.0)
#> tidyselect 1.2.1 2024-03-11 [1] CRAN (R 4.5.0)
#> TxDb.Hsapiens.UCSC.hg19.knownGene 3.2.2 2024-11-19 [1] Bioconductor
#> TxDb.Hsapiens.UCSC.hg38.knownGene 3.20.0 2024-11-19 [1] Bioconductor
#> TxDb.Mmusculus.UCSC.mm10.knownGene 3.10.0 2024-11-19 [1] Bioconductor
#> TxDb.Mmusculus.UCSC.mm9.knownGene 3.2.2 2024-11-19 [1] Bioconductor
#> UCSC.utils 1.3.0 2024-10-29 [1] Bioconductor 3.21 (R 4.5.0)
#> utf8 1.2.4 2023-10-22 [1] CRAN (R 4.5.0)
#> vctrs 0.6.5 2023-12-01 [1] CRAN (R 4.5.0)
#> withr 3.0.2 2024-10-28 [1] CRAN (R 4.5.0)
#> xfun 0.49 2024-10-31 [1] CRAN (R 4.5.0)
#> XML 3.99-0.17 2024-06-25 [1] CRAN (R 4.5.0)
#> XVector 0.47.0 2024-10-29 [1] Bioconductor 3.21 (R 4.5.0)
#> yaml 2.3.10 2024-07-26 [1] CRAN (R 4.5.0)
#> zlibbioc 1.53.0 2024-10-29 [1] Bioconductor 3.21 (R 4.5.0)
#>
#> [1] /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/library
#>
#> ──────────────────────────────────────────────────────────────────────────────
Required
- [ ] As far as I can tell, there is no need for the various BSgenome packages be listed under
ImportsinDESCRIPTION. Removing these will greatly reduce the size of the dependencies. It looks like all you need for crupR is the corresponding SeqInfo for each genome, which can be obtained using the GenomeInfoDB package via thegetChromInfoFromUCSC()/getChromInfoFromNCBI()/getChromInfoFromEnsembl()functions withas.Seqinfo = TRUE. Please carefully read the documentation to determine the most appropriate way to use these functions in crupR. - [ ] Instead of using
cat(), usemessage(),warning(),stop()for reporting updates, warnings, and errors. - [ ] What is the format for the
classifierargument ingetEnhancers()? The man pages don't give any details. And although the vignette gives some general points, it doesn't give enough specifics for a user to be able to create this object. - [ ] More generally, I found the documentation in the man pages are a little sparse on detail. For example, I think a user would appreciate a brief description of the normalization method in
normalize(), the clustering method ingetDynamics(), the prediction method used ingetEnhancers(), how super enhancers are definited and identified bygetSE(), etc. Such information would usually be included in a 'Details' section of the man page. - [ ] Please include code (not just a text description of the code) and DOIs/URLs where applicable to documentation in
inst/script. - [ ] Please proofread and spellcheck the vignette. Some specific issues I noticed (not an exhaustive list):
- I would recommend wrapping thre code chunks at 80 characters to make them more readable.
- Check formatting of rendered text. There appear to be headers not properly rendered and lists not properly rendered.
- Typos
- [ ] Please proofread and spellcheck the man pages. E.g.,
?plotSummarysays "num_plots ... Default is 20." but the actual value is 9. - [ ] Parallelisation should use BiocParallel rather than parallel; see https://contributions.bioconductor.org/r-code.html#parallel-recommendations. Briefly, parallel does not work properly on Windows ("It relies on forking and hence is not available on Windows unless mc.cores = 1."; see
?parallel::mclapply) whereas BiocParallel allows the user to provide their preferred parallelisation scheme and chooses a parallel back-end appropriate for the OS and is supported across Unix, Mac and Windows. To make this change to crupR, the general idea would be:- Replace all calls to
parallel::mclapply()andparallell::mcmapply()with calls toBiocParallel::bplapply()andBiocParallel::bpmapply(). - Replace all the
C = 1("Number of cores to use. Default is 1") function arguments withBPPARAM = SerialParam().
- Replace all calls to
- [ ]
saveFiles()doesn't work as intended, as illustrated by the example below. This happens becausesaveFiles()does stuff likeout.bed <- paste0(outdir, "clusterEnh.bed")instead of something likeout.bed <- file.path(outdir, "clusterEnh.bed").
suppressPackageStartupMessages(library(crupR))
# Example adapted from ?crupR::saveFiles
files <- c(system.file("extdata", "Condition1.H3K4me1.bam", package="crupR"),
system.file("extdata", "Condition1.H3K4me3.bam", package="crupR"),
system.file("extdata", "Condition1.H3K27ac.bam", package="crupR"),
system.file("extdata", "Condition2.H3K4me1.bam", package="crupR"),
system.file("extdata", "Condition2.H3K4me3.bam", package="crupR"),
system.file("extdata", "Condition2.H3K27ac.bam", package="crupR"))
inputs <- rep(system.file("extdata", "Condition1.Input.bam", package="crupR"), 3)
inputs2 <- rep(system.file("extdata", "Condition2.Input.bam", package="crupR"), 3)
metaData <- data.frame(HM = rep(c("H3K4me1", "H3K4me3", "H3K27ac"),2),
condition = c(1,1,1,2,2,2), replicate = c(1,1,1,1,1,1),
bamFile = files, inputFile = c(inputs, inputs2))
clusters <- readRDS(system.file("extdata", "differential_enhancers.rds", package="crupR"))
dynamics <- list("metaData" = metaData, "sumFile" = clusters)
# Demonstrate that saveFiles() saves in the wrong location.
outdir <- file.path(tempdir(), "crupR")
dir.create(outdir)
# There's nothing in the directory before running saveFiles().
list.files(outdir)
#> character(0)
saveFiles(data = dynamics, modes = "beds", outdir = outdir)
# There's nothing in the directory **after** running saveFiles().
list.files(outdir)
#> character(0)
# Instead, the files have been written one level up.
list.files(dirname(outdir))
#> [1] "crupR" "crupRdynamicEnh__cluster_c1.bed"
- [ ]
saveFiles()example must use the temporary directory (tempdir()) and not another location on the user's system (especially not the installation directory). - [ ] Please add an Installation section to the vignette; see https://contributions.bioconductor.org/docs.html#installation and note that, once the package is accepted, these instructions should use
BiocManager::install()rather thandevtools::install_git(). - [ ] Please add a package-level man page; see https://contributions.bioconductor.org/docs.html#package-level-documentation
Recommended
- [ ] I have a couple of suggestions to simplify the returned values. You can take or leave these, but I bring them to your attention as they may help make the package more cohesive with the broader Bioconductor ecosystem.
- GRanges objects can already carry along arbitrary metadata. This means that the return values of some functions (e.g.,
normalize(),getEnhancers(), etc.) could be simplified to return a single GRanges object rather than a list containing a GRanges object along with a data.frame of metadata. This example hopefully demonstrates that point.
- GRanges objects can already carry along arbitrary metadata. This means that the return values of some functions (e.g.,
suppressPackageStartupMessages(library(GenomicRanges))
# Mock up example data (copied from crupR vignette)
files <- c(
system.file("extdata", "Condition1.H3K4me1.bam", package = "crupR"),
system.file("extdata", "Condition1.H3K4me3.bam", package = "crupR"),
system.file("extdata", "Condition1.H3K27ac.bam", package = "crupR"),
system.file("extdata", "Condition2.H3K4me1.bam", package = "crupR"),
system.file("extdata", "Condition2.H3K4me3.bam", package = "crupR"),
system.file("extdata", "Condition2.H3K27ac.bam", package = "crupR"))
inputs <- c(
rep(system.file("extdata", "Condition1.Input.bam", package = "crupR"), 3),
rep(system.file("extdata", "Condition2.Input.bam", package = "crupR"), 3))
metaData <- data.frame(
HM = rep(c("H3K4me1","H3K4me3","H3K27ac"), 2),
condition = c(1, 1, 1, 2, 2, 2),
replicate = c(1, 1, 1, 1, 1, 1),
bamFile = files,
inputFile = inputs)
# Mock up a GRanges object and add metadata to it
gr <- GRanges(
seqnames = Rle(c("chr2", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)),
ranges = IRanges(1:10, width = 10:1))
metadata(gr) <- metaData
# Metadata isn't printed by default but can be accessed using metadata().
gr
#> GRanges object with 10 ranges and 0 metadata columns:
#> seqnames ranges strand
#> <Rle> <IRanges> <Rle>
#> [1] chr2 1-10 *
#> [2] chr2 2-10 *
#> [3] chr2 3-10 *
#> [4] chr2 4-10 *
#> [5] chr1 5-10 *
#> [6] chr1 6-10 *
#> [7] chr3 7-10 *
#> [8] chr3 8-10 *
#> [9] chr3 9-10 *
#> [10] chr3 10 *
#> -------
#> seqinfo: 3 sequences from an unspecified genome; no seqlengths
metadata(gr)
#> HM condition replicate
#> 1 H3K4me1 1 1
#> 2 H3K4me3 1 1
#> 3 H3K27ac 1 1
#> 4 H3K4me1 2 1
#> 5 H3K4me3 2 1
#> 6 H3K27ac 2 1
#> bamFile
#> 1 /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/library/crupR/extdata/Condition1.H3K4me1.bam
#> 2 /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/library/crupR/extdata/Condition1.H3K4me3.bam
#> 3 /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/library/crupR/extdata/Condition1.H3K27ac.bam
#> 4 /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/library/crupR/extdata/Condition2.H3K4me1.bam
#> 5 /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/library/crupR/extdata/Condition2.H3K4me3.bam
#> 6 /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/library/crupR/extdata/Condition2.H3K27ac.bam
#> inputFile
#> 1 /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/library/crupR/extdata/Condition1.Input.bam
#> 2 /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/library/crupR/extdata/Condition1.Input.bam
#> 3 /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/library/crupR/extdata/Condition1.Input.bam
#> 4 /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/library/crupR/extdata/Condition2.Input.bam
#> 5 /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/library/crupR/extdata/Condition2.Input.bam
#> 6 /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/library/crupR/extdata/Condition2.Input.bam
- Within Bioconductor, annotated matrix-like data (such as expression data) are usually stored in a SummarizedExperiment rather than as metadata columns on the features (e.g., DESeq2 and edgeR both support SummarizedExperiment objects for this reason). Below is a more Bioconductor-like formatting of the example data.
suppressPackageStartupMessages(library(crupR))
suppressPackageStartupMessages(library(SummarizedExperiment))
# Load example expression data from crupR
expression <- readRDS(
file = system.file("extdata", "expressions.rds", package = "crupR"))
# Represent expression data using in a more Bioconductor-like way.
suppressPackageStartupMessages(library(SummarizedExperiment))
expression_se <- SummarizedExperiment(
rowRanges = granges(expression),
assays = list(
expression = as.matrix(mcols(expression)[, c("cond1_1", "cond2_1")])))
expression_se
#> class: RangedSummarizedExperiment
#> dim: 1028 2
#> metadata(0):
#> assays(1): expression
#> rownames(1028): 100037278 100038489 ... 97487 99375
#> rowData names(0):
#> colnames(2): cond1_1 cond2_1
#> colData names(0):
rowRanges(expression_se)
#> GRanges object with 1028 ranges and 0 metadata columns:
#> seqnames ranges strand
#> <Rle> <IRanges> <Rle>
#> 100037278 chr8 71597648-71607936 +
#> 100038489 chr8 65014856-65037336 -
#> 100038927 chr8 21134642-21135598 +
#> 100039801 chr8 47822143-47822805 +
#> 100040294 chr8 87460469-87525554 +
#> ... ... ... ...
#> 97440 chr8 105252638-105255153 -
#> 97476 chr8 21958714-21964303 +
#> 97484 chr8 107046289-107056689 -
#> 97487 chr8 104348191-104395807 -
#> 99375 chr8 13105621-13147940 +
#> -------
#> seqinfo: 66 sequences (1 circular) from mm10 genome
head(assay(expression_se))
#> cond1_1 cond2_1
#> 100037278 10.435862 10.451812
#> 100038489 12.214742 12.171828
#> 100038927 9.745927 9.760020
#> 100039801 9.593001 9.645844
#> 100040294 10.315618 10.298862
#> 100040599 9.717884 9.667728
- [ ] One of the things a vignette usually starts with is
library(crupR)to avoid having to namespace everything that follows. - [ ] The vignette states, "
getSE()uses the output of the last step (enhancerPrediction)". However, there is no function calledenhancerPrediction. Should this be the output ofgetEnhancers()? - [ ] The vignette states, "Here, one can see that increasing the cutoff from the default value 0.5 to 0.7 reduced the number of detected single enhancers peaks.". But I don't see that in the output (see screenshot)
- [ ] I generally recommend not repeating information from the man page documentation in the vignette and instead have the vignette focus on the bigger picture. I would leave the description of function arguments to the man pages (otherwise if you make changes you have to remember to update them in multiple places).
- [ ] "The plot shows that the enhancers of condition 2 seem to be more active than the ones of condition 1 in cluster 1, while the opposite is the case for the enhancers of the second cluster.". But I only see 1 cluster in this plot (see screenshot)
- [ ] The code formatting makes it a bit hard to read. I'd recommend bringing some consistency to it, either manually or by using a tool such as styler or formatR; see https://contributions.bioconductor.org/r-code.html#coding-style
- [ ] The value of the
TitleinDESCRIPTIONshould start with a capital letter. - [ ] Recommend adding a
BugReportsfield toDESCRIPTION; see https://contributions.bioconductor.org/description.html#description-bugreport - [ ] Good job adding unit tests. Please take a look at
covr::report()for areas that might be improved. - [ ] Please consider the
NOTESraised byR CMD checkandBiocCheck::BiocCheck()and try to reduce these where reasonable.
Hello @PeteHaitch , Thank you for your comments! I will work through them and get back to you once done.
Hello @PeteHaitch I am working through your comments and basically done with the required changes and am taking a look at the recommended ones. I wanted to ask when the deadline is for the re-submission of the revised package, so I can plan accordingly.
No real rush, @akbariomgba, but updates such as the above are welcomed and appreciated. If we haven't seen any activity on a package review after 3-4 weeks we will close it, but it can be re-opened (as we did before). The next Bioconductor release will happen in April
Hi @akbariomgba ,
I am on leave for the Christmas / New Year break where I work, so please don't expect a reply from me until after I return to work on January 2.
Cheers, Pete
This issue is being closed because there has been no progress for an extended period of time. You may reopen the issue when you have the time to actively participate in the review / submission process. Please also keep in mind that a package accepted to Bioconductor requires a commitment on your part to ongoing maintenance.
Thank you for your interest in Bioconductor.