Contributions
Contributions copied to clipboard
CustomGenome
Update the following URL to point to the GitHub repository of the package you wish to submit to Bioconductor
- Repository: https://github.com/mtekman/CustomGenome
Confirm the following by editing each check box to '[x]'
-
[x] I understand that by submitting my package to Bioconductor, the package source and all review commentary are visible to the general public.
-
[x] I have read the Bioconductor Package Submission instructions. My package is consistent with the Bioconductor Package Guidelines.
-
[x] I understand Bioconductor Package Naming Policy and acknowledge Bioconductor may retain use of package name.
-
[x] I understand that a minimum requirement for package acceptance is to pass R CMD check and R CMD BiocCheck with no ERROR or WARNINGS. Passing these checks does not result in automatic acceptance. The package will then undergo a formal review and recommendations for acceptance regarding other Bioconductor standards will be addressed.
-
[x] My package addresses statistical or bioinformatic issues related to the analysis and comprehension of high throughput genomic data.
-
[x] I am committed to the long-term maintenance of my package. This includes monitoring the support site for issues that users may have, subscribing to the bioc-devel mailing list to stay aware of developments in the Bioconductor community, responding promptly to requests for updates from the Core team in response to changes in R or underlying software.
-
[x] I am familiar with the Bioconductor code of conduct and agree to abide by it.
I am familiar with the essential aspects of Bioconductor software management, including:
- [x] The 'devel' branch for new packages and features.
- [x] The stable 'release' branch, made available every six months, for bug fixes.
- [x] Bioconductor version control using Git (optionally via GitHub).
For questions/help about the submission process, including questions about the output of the automatic reports generated by the SPB (Single Package Builder), please use the #package-submission channel of our Community Slack. Follow the link on the home page of the Bioconductor website to sign up.
Hi @mtekman
Thanks for submitting your package. We are taking a quick look at it and you will hear back from us soon.
The DESCRIPTION file for this package is:
Package: CustomGenome
Type: Package
Title: Append Custom FASTA Sequences to your Genome and Annotation Files for Downstream Mapping
Version: 0.99.0
Authors@R:
c(person(given="Mehmet", family="Tekman",
email = "[email protected]",
role = c("aut", "cre"), comment = c(ORCID = "0000-0002-4181-2676")),
person(given="Sebastian", family="Arnold",
email = "[email protected]",
role = c("fnd"), comment = c(ORCID = "0000-0002-2688-9210")))
Description: CustomGenome generates a new Ensembl reference genome
sequence and annotation files with custom FASTA sequences (e.g.
GFP, Tomato) and produces RNA-seq count matrices with them. Genome
reference files are fetched and validated directly from Ensembl,
user-sequences are then appended to both, and then the new
reference files are written to disk. Downstream analysis is then
facilitated via Rsubread function wrappers to index the new
reference, align user FASTA sequences to it, and then perform
quantification for desired features (e.g. exon, utr, etc.)
License: GPL-3
Encoding: UTF-8
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.2.3
VignetteBuilder: knitr
biocViews: Annotation, RNASeq, Alignment, MultipleSequenceAlignment, QualityControl, Preprocessing, Sequencing, DataImport, GeneExpression
Imports: Rsubread (>= 2.14.2), Biostrings (>= 2.68.1), rtracklayer (>= 1.60.0), GenomicRanges (>= 1.52.0), tools, utils
Suggests:
Rsamtools,
BiocStyle,
knitr,
rmarkdown,
testthat (>= 3.0.0)
URL: https://gitlab.com/mtekman/CustomGenome
BugReports: https://gitlab.com/mtekman/CustomGenome/issues
Config/testthat/edition: 3
Hi, I'm just providing some thoughts/comments since I was also dealing a lot with Ensembl annotations.
- you are downloading fasta and GTF sequences yourself from Ensembl - have you checked whether these would also be available in Bioconductor's
AnnotationHub? Or, would it maybe make sense to use theBiocFileCachepackage for downloading/caching these resources? IMHO that would simplify this process.
The general idea to create custom annotations or add additional sequence/gene information to reference genomes is however really nice.
Hello @jorainer, thanks for the comments!
The general idea to create custom annotations or add additional sequence/gene information to reference genomes is however really nice.
Exactly, I was surprised it wasn't already a package!
AnnotationHub
Ah that's why these reviews are golden. I wish I had heard about this before I implemented the backend for Ensembl downloads that I did.
library(AnnotationHub)
ah = AnnotationHub(cache = "/my/dest/cache")
ah[ah$species == "Mus musculus" &
ah$dataprovider == "Ensembl" &
ah$genome=="GRCm38" &
ah$sourcetype %in% c("GTF", "ensembl")]
The problem I'm seeing is how to pair a given genome with a given GTF file? With Ensembl this is pretty unambiguous, but with AnnotationHub I'm overwhelmed with choice.
I think this is something I will implement into the next version of this tool, since right now this is very much geared towards Ensembl downloads.
BiocFileCache
I had never heard of this, and it sounds pretty useful for some future analyses. I have some angst related to how it checks for data consistency:
https://bioconductor.org/packages/release/bioc/vignettes/BiocFileCache/inst/doc/BiocFileCache.html
bfcneedsupdate() is a method that will check the local copy of the data’s etag and last_modifed time to the etag and last_modified time of the remote resource as well as an expires time. The cache saves this information when the web resource is initially added. The expires time is checked against the current Sys.time to see if the local resource has expired. If so the resource will deem need to be updated; if unavailable or not expired will check the etag and last_modified_time. The etag information is used definitively if it is available, if it is not available it checks the last_modified time. If the resource does not have a last_modified tag either, it is undetermined. If the resource has not been download yet, it is TRUE
It sounds like it doesn't do any consistency checks with remote checksums, which is a main indicator for me on whether a large file downloaded isn't truncated. Have you had any issues with this in your code?
Also, how does BiocFileCache interact with AnnotionHub? I have the impression that AnnotationHub has it's own caching.
BiocFileCache is used as the backend caching mechanism in AnnotationHub.
You'll need an inst/script data that describes how the data in inst/extdata was generated, and include any relevant source and licensing information for the data there.
Your package has been added to git.bioconductor.org to continue the pre-review process. A build report will be posted shortly. Please fix any ERROR and WARNING in the build report before a reviewer is assigned or provide a justification on why you feel the ERROR or WARNING should be granted an exception.
IMPORTANT: Please read this documentation for setting up remotes to push to git.bioconductor.org. All changes should be pushed to git.bioconductor.org moving forward. It is required to push a version bump to git.bioconductor.org to trigger a new build report.
Bioconductor utilized your github ssh-keys for git.bioconductor.org access. To manage keys and future access you may want to active your Bioconductor Git Credentials Account
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on the Bioconductor Single Package Builder.
On one or more platforms, the build results were: "WARNINGS, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
The following are build products from R CMD build on the Single Package Builder: macOS 12.7.1 Monterey: CustomGenome_0.99.0.tar.gz Linux (Ubuntu 22.04.2 LTS): CustomGenome_0.99.0.tar.gz
Links above active for 21 days.
Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
[email protected]:packages/CustomGenome to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.
Please correct ERROR, Warning, Notes from the build report before a reviewer is assigned for in depth review.
@lshep Thanks, I've fixed the issue! Sorry for the long delay, it was a busy month
Please push the changes to git.bioconductor.org to trigger a new build report. see https://github.com/Bioconductor/Contributions/issues/3224#issuecomment-1819369522
Received a valid push on git.bioconductor.org; starting a build for commit id: 5b002c807b9b8c1b5591e01bd3473c47e6eefb7d
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on the Bioconductor Single Package Builder.
On one or more platforms, the build results were: "WARNINGS, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
The following are build products from R CMD build on the Single Package Builder: Linux (Ubuntu 22.04.2 LTS): CustomGenome_0.99.1.tar.gz macOS 12.7.1 Monterey: CustomGenome_0.99.1.tar.gz
Links above active for 21 days.
Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
[email protected]:packages/CustomGenome to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.
Received a valid push on git.bioconductor.org; starting a build for commit id: 17a2b45aa9cd1500f950ee794216c1253d217ad1
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on the Bioconductor Single Package Builder.
On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
The following are build products from R CMD build on the Single Package Builder: macOS 12.7.1 Monterey: CustomGenome_0.99.2.tar.gz Linux (Ubuntu 22.04.2 LTS): CustomGenome_0.99.2.tar.gz
Links above active for 21 days.
Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
[email protected]:packages/CustomGenome to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.
Received a valid push on git.bioconductor.org; starting a build for commit id: a20b98307ed8f4bd284a7a691d337a4e190a7fe5
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on the Bioconductor Single Package Builder.
On one or more platforms, the build results were: "WARNINGS, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
The following are build products from R CMD build on the Single Package Builder: Linux (Ubuntu 22.04.2 LTS): CustomGenome_0.99.3.tar.gz macOS 12.7.1 Monterey: CustomGenome_0.99.3.tar.gz
Links above active for 21 days.
Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
[email protected]:packages/CustomGenome to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.
Received a valid push on git.bioconductor.org; starting a build for commit id: 4ad4318a3cef1df2d87ea278507046de3e0a0851
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on the Bioconductor Single Package Builder.
On one or more platforms, the build results were: "TIMEOUT, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
The following are build products from R CMD build on the Single Package Builder: Linux (Ubuntu 22.04.2 LTS): CustomGenome_0.99.4.tar.gz macOS 12.7.1 Monterey: CustomGenome_0.99.4.tar.gz
Links above active for 21 days.
Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
[email protected]:packages/CustomGenome to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.
Received a valid push on git.bioconductor.org; starting a build for commit id: 41a09577a658c8b6e5783c78b3dad502377a79e2
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on the Bioconductor Single Package Builder.
On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
The following are build products from R CMD build on the Single Package Builder: macOS 12.7.1 Monterey: CustomGenome_0.99.5.tar.gz Linux (Ubuntu 22.04.2 LTS): CustomGenome_0.99.5.tar.gz
Links above active for 21 days.
Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
[email protected]:packages/CustomGenome to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.
Received a valid push on git.bioconductor.org; starting a build for commit id: 1d973511bf3aa102b4e5d390c63a4b2bf0b18ffa
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on the Bioconductor Single Package Builder.
On one or more platforms, the build results were: "TIMEOUT, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
The following are build products from R CMD build on the Single Package Builder: Linux (Ubuntu 22.04.2 LTS): CustomGenome_0.99.6.tar.gz macOS 12.7.1 Monterey: CustomGenome_0.99.6.tar.gz
Links above active for 21 days.
Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
[email protected]:packages/CustomGenome to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.
Received a valid push on git.bioconductor.org; starting a build for commit id: 036cc2b9ac273b5ec21b03509e6bafce23782da3
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on the Bioconductor Single Package Builder.
On one or more platforms, the build results were: "WARNINGS, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
The following are build products from R CMD build on the Single Package Builder: Linux (Ubuntu 22.04.2 LTS): CustomGenome_0.99.7.tar.gz macOS 12.7.1 Monterey: CustomGenome_0.99.7.tar.gz
Links above active for 21 days.
Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
[email protected]:packages/CustomGenome to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.
@lshep I'm not sure how to get around these errors. The main problem with them is that sometimes a file downloads incomplete and then the checksum fails, but sometimes this works fine (as is the case here: https://github.com/Bioconductor/Contributions/issues/3224#issuecomment-1870657807)
What do I need to do to get this to pass, other than removing more test checks that work fine on my local devices?
Do you implement any sort of caching so that if a file works it doesnt have to be redownloaded the next time? See BiocFileCache
I don't, but the code does check to see if the file has already been downloaded to the temp folder. I think the main issue is the sheer size of these files.
Using BiocFileCache wouldn't fix this problem if I understand correctly, since if I run this N times over N push/commits
path <- tempfile()
bfc <- BiocFileCache(path, ask = FALSE)
bfcadd(bfc, "genome", fpath="https://www.ensembl.com/big/file.fasta.gz")
bfcadd(bfc, "othergenome", fpath="https://www.ensembl.com/big/file2.fasta.gz")
## later:
mygenome1 <- bfcquery(bfc, "genome")
mygenome2 <- bfcquery(bfc, "othergenome")
it would still initialise N different independent caches and still have to initially download all files that the start of each test.
This doesn't change much from my initial code which downloads to a fixed folder for genomes and re-uses the resources over subsequent calls.
Is there anything else I can do?
If the size of the files are large then caching will be a necessity and likely a requirement by a reviewer. I would recommend a package specific directory that we general allow in cases like this. See UniProt.ws example:
cache <- tools::R_user_dir("UniProt.ws", "cache")
bfc <- BiocFileCache(cache, ask=FALSE)
But also be very clear to the user how much space they will need to run example data / vignettes before the start -- understanding that it can be variable once used with other data.
Received a valid push on git.bioconductor.org; starting a build for commit id: 864159a746289e7ca5b23bf12a1ea6b6c15b7115
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on the Bioconductor Single Package Builder.
On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
The following are build products from R CMD build on the Single Package Builder: Linux (Ubuntu 22.04.2 LTS): CustomGenome_0.99.8.tar.gz macOS 12.7.1 Monterey: CustomGenome_0.99.8.tar.gz
Links above active for 21 days.
Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
[email protected]:packages/CustomGenome to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.