Contributions
Contributions copied to clipboard
`rd4` initial submission
Update the following URL to point to the GitHub repository of the package you wish to submit to Bioconductor
- Repository: https://github.com/fulcrumgenomics/rd4
Confirm the following by editing each check box to '[x]'
-
[x] I understand that by submitting my package to Bioconductor, the package source and all review commentary are visible to the general public.
-
[x] I have read the Bioconductor Package Submission instructions. My package is consistent with the Bioconductor Package Guidelines.
-
[x] I understand Bioconductor Package Naming Policy and acknowledge Bioconductor may retain use of package name.
-
[x] I understand that a minimum requirement for package acceptance is to pass R CMD check and R CMD BiocCheck with no ERROR or WARNINGS. Passing these checks does not result in automatic acceptance. The package will then undergo a formal review and recommendations for acceptance regarding other Bioconductor standards will be addressed.
-
[x] My package addresses statistical or bioinformatic issues related to the analysis and comprehension of high throughput genomic data.
-
[x] I am committed to the long-term maintenance of my package. This includes monitoring the support site for issues that users may have, subscribing to the bioc-devel mailing list to stay aware of developments in the Bioconductor community, responding promptly to requests for updates from the Core team in response to changes in R or underlying software.
-
[x] I am familiar with the Bioconductor code of conduct and agree to abide by it.
I am familiar with the essential aspects of Bioconductor software management, including:
- [x] The 'devel' branch for new packages and features.
- [x] The stable 'release' branch, made available every six months, for bug fixes.
- [x] Bioconductor version control using Git (optionally via GitHub).
For questions/help about the submission process, including questions about the output of the automatic reports generated by the SPB (Single Package Builder), please use the #package-submission channel of our Community Slack. Follow the link on the home page of the Bioconductor website to sign up.
Hi @pamelarussell
Thanks for submitting your package. We are taking a quick look at it and you will hear back from us soon.
The DESCRIPTION file for this package is:
Package: rd4
Title: R Bindings for Reading and Querying Dense Depth Data Dump (D4) Files
Version: 0.99.0
Authors@R: c(
person("Pamela", "Russell", ,
"[email protected]", role = c("aut", "cre")),
person("Seth", "Stadick", , "[email protected]", role = c("aut"))
)
Description: The Dense Depth Data Dump (D4) format provides fast analysis and
compact storage of quantitative genomics datasets. rd4 provides R bindings
for reading and querying D4 files. For full details on the format, see Hou
et al. (https://doi.org/10.1038/s43588-021-00085-0).
License: MIT + file LICENSE
Encoding: UTF-8
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.2.0
Depends:
GenomicRanges,
wrapr
Suggests:
knitr,
rmarkdown,
data.table,
testthat (>= 3.0.0)
SystemRequirements: GNU make
biocViews: DataRepresentation, DataImport, Sequencing, Coverage
VignetteBuilder: knitr
Is this really necessary when using/checking the package:
* building 'rd4_0.99.0.tar.gz'
1/11 packages newly attached/loaded, see sessionInfo() for details.
* installing to library '/home/stvjc/R-4-2-bc316-dist/lib/R/library'
* installing *source* package 'rd4' ...
** using staged installation
** libs
rm -Rf rd4.so ./rust/target/release/librd4.a entrypoint.o
gcc -I"/home/stvjc/R-4-2-bc316-dist/lib/R/include" -DNDEBUG -I/usr/local/include -fpic -g -O2 -c entrypoint.c -o entrypoint.o
# In some environments, ~/.cargo/bin might not be included in PATH, so we need
# to set it here to ensure cargo can be invoked. It is appended to PATH and
# therefore is only used if cargo is absent from the user's PATH.
export PATH="/home/stvjc/perl5/bin:/home/stvjc/GCLOUD/google-cloud-sdk/bin:/home/stvjc/.local/bin:/home/stvjc/bin:/home/stvjc/perl5/bin:/home/stvjc/GCLOUD/google-cloud-sdk/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/home/stvjc/.dotnet/tools:/home/stvjc/bin:/home/stvjc/.local/bin:/home/stvjc/bin:/home/stvjc/.local/bin:/home/stvjc/.cargo/bin" && \
cargo build --lib --release --manifest-path=./rust/Cargo.toml --target-dir ./rust/target
Compiling libc v0.2.127
Compiling autocfg v1.1.0
Compiling cfg-if v1.0.0
Compiling once_cell v1.13.0
Compiling proc-macro2 v1.0.43
Compiling unicode-ident v1.0.3
Compiling quote v1.0.21
Compiling log v0.4.17
Compiling memchr v2.5.0
...
I like the rest of the dev core have no experience with rust but I would hope that one could configure the system in a reasonably persistent way for repeated use of the package. Is this not the case?
Hi @vjcitn, this comes from code that is auto-generated by rextendr and only executes during rd4 installation. This adds cargo to the PATH (if it isn't already, which is not typical) within the active shell so required crates can be fetched. The modified PATH does not persist outside that shell. Hope this answers the concern; please let me know if not.
@vjcitn I'll add rust to the linux builder and see how it goes. I'll follow up when it's done.
Rust is installed. That was pretty quick! I didn't do any special configurations, though.
That's great @jwokaty . Can we also see if rextendr from CRAN will install? I will continue working on this submission, it is not ready to go into build system but if rextendr installs ok we will be ready.
rextendr installed without issue.
The vignette needs work. See http://contributions.bioconductor.org/important-bioconductor-package-development-features.html?q=vignette#bioc-vignette
Thanks @vjcitn , I have updated the vignette per the guidelines.
Hi @pamelarussell ,
rd4 requires Rust and Cargo to be present on the system so this needs to be reported in the DESCRIPTION file in the SystemRequirements field. On Ubuntu these can easily be installed with sudo apt-get install rustc cargo so we will do this on our Linux builder.
In addition to listing the external system requirements in the SystemRequirements field, we ask package contributors to add a small INSTALL file in the top-level folder of their package where they explain how to install those system requirements for the various OS's that they wish to support (typically Debian-based and RedHat-based systems, Windows, and Mac). Ideally the file should provide simple commands that the user can copy/paste on their machine to do the job (e.g. sudo apt-get install rustc cargo on a Debian-based system).
Thanks, H.
@vjcitn @jwokaty It doesn't seem that rd4 actually depends or needs CRAN package rextendr. My understanding is that rextendr was used at some point by the developers of rd4 to generate static R code that is included in the package. A little bit in the same way that developers use Autoconf to generate a configure file. Not something that the end user needs.
@pamelarussell
The src/python/requirements.txt file in your package contains the following:
numpy==1.22.3
pyd4==0.3.6
Does this mean that rd4 also requires Python and those Python modules to be available on the user machine? If that's the case then these requirements should also be listed in SystemRequirements.
Thanks, H.
@hpages thanks, I have updated SystemRequirements and added INSTALL. I have also removed src/python which is not needed.
You are correct that rextendr was used in the development process and is not needed for package installation.
Thanks! Please make sure to remove any material that is not needed from the package.
@hpages I have gone through again and there is no other unneeded material in the repo.
A reviewer has been assigned to your package. Learn what to expect during the review process.
IMPORTANT: Please read this documentation for setting up remotes to push to git.bioconductor.org. It is required to push a version bump to git.bioconductor.org to trigger a new build.
Bioconductor utilized your github ssh-keys for git.bioconductor.org access. To manage keys and future access you may want to active your Bioconductor Git Credentials Account
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "ERROR, skipped". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details. This link will be active for 21 days.
Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
[email protected]:packages/rd4 to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.
I'm looking into this.
@pamelarussell Can I ask you to change your INSTALL such that the installation instructions use sudo apt-get install rustc cargo? I installed this way and was able to install without issue. We also typically install with apt-get, so we recommend this method instead.
# Results from nebbiolo2:
R CMD build --keep-empty-dirs --no-resave-data rd4
* checking for file ‘rd4/DESCRIPTION’ ... OK
* preparing ‘rd4’:
* checking DESCRIPTION meta-information ... OK
* cleaning src
* installing the package to build vignettes
* creating vignettes ... OK
* cleaning src
* checking for LF line-endings in source and make files and shell scripts
* checking for empty or unneeded directories
* building ‘rd4_0.99.0.tar.gz’
@pamelarussell As mentioned earlier people should preferably use sudo apt-get install rustc cargo on an Ubuntu/Debian system. Also please provide the yum command that they should use on an RPM-based Linux system. Once you're done, please bump the version of the package to trigger a new build. Thanks!
@hpages I have made these updates and bumped the version. For RPM, I have recapitulated the recommendation to use yum update -y to update the system followed by the curl-based rustup method.
Hi @pamelarussell,
-
Please provide the exact
yumcommand that people should use to install Rust and Cargo on an RPM-based Linux systemYou'll need to figure out the names of the corresponding
rpmpackages. Likeapt-geton Debian-based Linux systems,yumis the standard/uniform way to install additional software on an RPM-based Linux system. This is by far the easiest/cleanest/safest way to add or remove software so is what everybody should preferably use unless there is a good reason to use something else. Besides, what would be the point of updating the database of available software withyum update -yif yum is not going to be used to install the software? -
The same applies to GNU Make (note the proper capitalization)
Nobody should ever need to manually download/compile/install GNU Make from a source tarball obtained at https://ftp.gnu.org/gnu/make/. Instead, people should always use the version provided with their system. On Linux, you can safely assume that people using R/Bioconductor already have it (like they already have a C/C++ compiler), otherwise they wouldn't be able to install most CRAN or Bioconductor packages. Anyways, it seems that the only reason GNU Make is listed in
SystemRequirements:is to avoid anR CMD checkwarning that otherwise would be triggered because some Makefiles that get processed during installation use GNU extensions. For this particular case, you don't need to provide detailed installation instructions in yourINSTALLfile. However, if you really want to show people how to get GNU Make, then please provide the standard command (i.e.apt-get install makeoryum install make). On Mac, people should obtain themakecommand by installing Xcode or the Apple's Command Line Tools (a subset of Xcode). -
Simplify the package installation process
When I install the package I see the following output:
* installing to library ‘/home/hpages/R/R-4.2.r82318/library’ * installing *source* package ‘rd4’ ... ** using staged installation ** libs rm -Rf rd4.so ./rust/target/release/librd4.a entrypoint.o /usr/bin/gcc -I"/home/hpages/R/R-4.2.r82318/include" -DNDEBUG -I/usr/local/include -fpic -O3 -c entrypoint.c -o entrypoint.o # In some environments, ~/.cargo/bin might not be included in PATH, so we need # to set it here to ensure cargo can be invoked. It is appended to PATH and # therefore is only used if cargo is absent from the user's PATH. export PATH="/home/hpages/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/snap/bin:/home/hpages/.cargo/bin" && \ cargo build --lib --release --manifest-path=./rust/Cargo.toml --target-dir ./rust/target Updating git repository `https://github.com/38/d4-format` Updating git repository `https://github.com/extendr/extendr` Updating crates.io index Downloaded futures-task v0.3.24 Downloaded socket2 v0.4.6 Downloaded futures-util v0.3.24 Downloaded futures-channel v0.3.24 Downloaded futures-sink v0.3.24 Downloaded futures-io v0.3.24 Downloaded futures-core v0.3.24 Downloaded 7 crates (275.6 KB) in 1.13s Compiling libc v0.2.132 Compiling autocfg v1.1.0 Compiling cfg-if v1.0.0 Compiling once_cell v1.13.1 Compiling proc-macro2 v1.0.43 Compiling unicode-ident v1.0.3 Compiling quote v1.0.21 Compiling log v0.4.17 Compiling memchr v2.5.0 Compiling cc v1.0.73 Compiling pin-project-lite v0.2.9 Compiling syn v1.0.99 Compiling bytes v1.2.1 Compiling futures-core v0.3.24 Compiling itoa v1.0.3 Compiling glob v0.3.0 Compiling version_check v0.9.4 Compiling untrusted v0.7.1 Compiling spin v0.5.2 Compiling crossbeam-utils v0.8.11 Compiling futures-task v0.3.24 Compiling fnv v1.0.7 Compiling quick-error v1.2.3 Compiling unicode-width v0.1.9 Compiling regex-syntax v0.6.27 Compiling futures-util v0.3.24 Compiling serde_derive v1.0.144 Compiling serde v1.0.144 Compiling bindgen v0.55.1 Compiling strsim v0.8.0 Compiling hashbrown v0.12.3 Compiling vec_map v0.8.2 Compiling httparse v1.7.1 Compiling futures-sink v0.3.24 Compiling rustls v0.20.6 Compiling futures-io v0.3.24 Compiling libR-sys v0.3.0 Compiling ansi_term v0.12.1 Compiling futures-channel v0.3.24 Compiling matches v0.1.9 Compiling tinyvec_macros v0.1.0 Compiling pin-utils v0.1.0 Compiling termcolor v1.1.3 Compiling bitflags v1.3.2 Compiling rayon-core v1.9.3 Compiling cfg-if v0.1.10 Compiling lazy_static v1.4.0 Compiling percent-encoding v2.1.0 Compiling rustc-hash v1.1.0 Compiling lazycell v1.3.0 Compiling scopeguard v1.1.0 Compiling peeking_take_while v0.1.2 Compiling shlex v0.1.1 Compiling try-lock v0.2.3 Compiling httpdate v1.0.2 Compiling crc32fast v1.3.2 Compiling encoding_rs v0.8.31 Compiling pkg-config v0.3.25 Compiling tower-service v0.3.2 Compiling unicode-bidi v0.3.8 Compiling ryu v1.0.11 Compiling adler v1.0.2 Compiling base64 v0.13.0 Compiling ppv-lite86 v0.2.16 Compiling serde_json v1.0.85 Compiling extendr-engine v0.3.1 (https://github.com/extendr/extendr?branch=master#c6929ebe) Compiling ipnet v2.5.0 Compiling either v1.8.0 Compiling extendr-api v0.3.1 (https://github.com/extendr/extendr?branch=master#c6929ebe) Compiling mime v0.3.16 Compiling smallvec v1.9.0 Compiling paste v1.0.8 Compiling tracing-core v0.1.29 Compiling tokio v1.20.1 Compiling slab v0.4.7 Compiling memoffset v0.6.5 Compiling indexmap v1.9.1 Compiling crossbeam-epoch v0.9.10 Compiling rayon v1.5.3 Compiling num-traits v0.2.15 Compiling libloading v0.7.3 Compiling http v0.2.8 Compiling nom v5.1.2 Compiling humantime v1.3.0 Compiling textwrap v0.11.0 Compiling clang-sys v1.3.3 Compiling ring v0.16.20 Compiling tinyvec v1.6.0 Compiling form_urlencoded v1.0.1 Compiling miniz_oxide v0.5.3 Compiling rustls-pemfile v1.0.1 Compiling tracing v0.1.36 Compiling http-body v0.4.5 Compiling want v0.3.0 Compiling unicode-normalization v0.1.21 Compiling aho-corasick v0.7.18 Compiling atty v0.2.14 Compiling which v3.1.1 Compiling num_cpus v1.13.1 Compiling socket2 v0.4.6 Compiling mio v0.8.4 Compiling getrandom v0.2.7 Compiling memmap v0.7.0 Compiling crossbeam-channel v0.5.6 Compiling flate2 v1.0.24 Compiling clap v2.34.0 Compiling idna v0.2.3 Compiling regex v1.6.0 Compiling rand_core v0.6.3 Compiling d4-framefile v0.3.6 (https://github.com/38/d4-format?rev=871de45#871de457) Compiling ordered-float v2.10.0 Compiling url v2.2.2 Compiling cexpr v0.4.0 Compiling rand_chacha v0.3.1 Compiling env_logger v0.7.1 Compiling crossbeam-deque v0.8.2 Compiling tokio-util v0.7.3 Compiling rand v0.8.5 Compiling webpki v0.22.0 Compiling sct v0.7.0 Compiling h2 v0.3.14 Compiling webpki-roots v0.22.4 Compiling extendr-macros v0.3.1 (https://github.com/extendr/extendr?branch=master#c6929ebe) Compiling tokio-rustls v0.23.4 Compiling hyper v0.14.20 Compiling hyper-rustls v0.23.0 Compiling d4-hts v0.3.6 (https://github.com/38/d4-format?rev=871de45#871de457) Compiling serde_urlencoded v0.7.1 Compiling reqwest v0.11.11 Compiling d4 v0.3.6 (https://github.com/38/d4-format?rev=871de45#871de457) Compiling rd4 v0.1.0 (/home/hpages/pkgreviews/rd4.review/rd4/src/rust) Finished release [optimized] target(s) in 2m 07s /usr/bin/gcc -shared -L/home/hpages/R/R-4.2.r82318/lib -L/usr/local/lib -o rd4.so entrypoint.o -L./rust/target/release -lrd4 -L/home/hpages/R/R-4.2.r82318/lib -lR installing to /home/hpages/R/R-4.2.r82318/library/00LOCK-rd4/00new/rd4/libs ** R ** inst ** byte-compile and prepare package for lazy loading ** help *** installing help indices ** building package indices ** installing vignettes ** testing if installed package can be loaded from temporary location ** checking absolute paths in shared objects and dynamic libraries ** testing if installed package can be loaded from final location ** testing if installed package keeps a record of temporary installation path * DONE (rd4)That is a lot of stuff that gets downloaded and compiled! Are you sure that all these things are needed for normal/proper use of the package? For example AFAIK extendr was only needed during development to generate static R code so I wonder why it shows up here during a standard end-user install. I also see that core OS components like
libcandccget downloaded and compiled. What for? No package should ever need to do that.Please make sure to simplify rd4 installation process so that only what's actually needed gets downloaded/compiled.
Please address and bump the version again (not sure why your earlier version bump didn't trigger a build, make sure to push your changes to [email protected]:packages/rd4 in addition to GitHub).
Thanks, H.
Just a note to confirm that I see the same compilation process, but I did not have any downloads on my installation attempt. Perhaps a cache is checked. Apropos libc and cc, these are name collisions -- not core OS components, but see, e.g., https://crates.io/crates/cc
Received a valid push on git.bioconductor.org; starting a build for commit id: c29f6e0268bb54f9d3eeadcaab208ccbdd7e4664
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
Congratulations! The package built without errors or warnings on all platforms.
Please see the build report for more details. This link will be active for 21 days.
Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
[email protected]:packages/rd4 to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.
@hpages I have updated the yum commands to install Rust and Cargo per your instructions, and removed install instructions for GNU Make.
A couple of points regarding the installation of multiple Rust crates:
- The Rust community prefers small, tightly focused crates, hence the large number of transitive dependencies even for a relatively small package.
- Note the distinction between
rextendr, the R package that was used in development and has been removed from DESCRIPTION, vs.extendr, a Rust crate that is still needed for compilation.
Thanks for the clarifications and the changes to the INSTALL file. Looks like we finally got a successful install/build/check on our Linux builder :smiley:
Note that we strongly advice against installing Bioconductor packages from GitHub as this generally results in "version mismatches" and incompatibilities for people using the current release version of Bioconductor (most people are). This is because installing from GitHub will get them the devel version of a package which has no guarantee to be compatible with the release version of the Bioconductor ecosystem. Please remove section 2.2 from the vignette.
Here is some feedback about the functionality provided by the package:
-
A
print()method for D4Source objects would be nice. Right now, printing such object provides no valuable information about what's in it:> d4source <pointer: 0x55f05a358370> attr(,"class") [1] "D4Source" "D4SourceEnv"At a very minimum, it could print the path to the file.
-
Returning the chromosome names and lengths in a nested list is not "the R way":
> chrs <- get_chroms(d4source) > str(chrs[1:3]) List of 3 $ :List of 2 ..$ name: chr "chr1" ..$ size: int 248956422 $ :List of 2 ..$ name: chr "chr2" ..$ size: int 242193529 $ :List of 2 ..$ name: chr "chr3" ..$ size: int 198295559For this kind of data "the R way" is to use a named integer vector or a 2-column data.frame (a standard data.frame, not a data.table object). For even better integration to the Bioconductor ecosystem, maybe consider providing a
seqinfo()getter that returns the chromosome and genome information in a Seqinfo object. -
Please remove reliance on the data.table package. It is not needed.
-
The use of "0-based, half-open region coordinates" for the low-level D4Source extractors (e.g.
query(),mean.D4Source(),median.D4Source()) is very unfortunate. The Bioconductor ecosystem uses the "1-based closed intervals" convention everywhere, as established in the IRanges/GenomicRanges packages. Introducing the "0-based, half-open region coordinates" convention for those extractors will confuse the end user and will be error prone. -
The names of the
update_*()functions (update_mean,update_median, etc...) is a little bit awkward given what they do. They don't update the mean or median. Instead they compute the mean or median from the supplied D4Source object for each range in the supplied GRanges object, and store the results in a metadata column that gets added to the latter. So what actually gets updated is the GRanges object. Maybe names likeupdate_with_mean,update_with_median, etc... would help convey what these functions really do? In addition, it's generally expected that a function that updates something would take this something as its first argument. -
The
update_*()functions should store the mean/median/etc... in atomic metadata columns instead of metadata columns of type list:> granges GRanges object with 2 ranges and 5 metadata columns: seqnames ranges strand | mean median percentile_25 <Rle> <IRanges> <Rle> | <list> <list> <list> [1] chr1 17027501-17027600 + | 2.48 0 0 [2] chr1 17027701-17027800 + | 42.23 41 37 percentile_50 percentile_75 <list> <list> [1] 0 4 [2] 41 50 ------- seqinfo: 1 sequence from an unspecified genome; no seqlengths -
Finally how about providing an analog to
rtracklayer::import.bed(),rtracklayer::import.bedGraph(), orrtracklayer::import.bw(), for D4 files or D4Source objects? The ability to import the full content of a D4 file as a GRanges object would be incredibly useful as it would make it possible to use a D4 file as a drop-in replacement for bed, bedGraph, or BigWig, in any Bioconductor workflow where these types of files are currently used!
Thanks, H.
Received a valid push on git.bioconductor.org; starting a build for commit id: 96ae1f69200a4d92edf7d16d65172a14a90b0656
@hpages thanks for the useful feedback. We have addressed all of these in the latest update.
- Remove instructions to install from GitHub
print()method for D4Source class- Replace
get_chroms()withseqinfo()getter - Remove
data.tabledependency - Replace 0-based half-open coordinates with 1-based closed coordinates
- Rename
update_*()methods and move object being updated to first argument update_*()methods store the metadata in atomic columns instead of list columnsrtracklayer::import.*()analog:to_granges()function to convert D4Source data to a GRanges object
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
Congratulations! The package built without errors or warnings on all platforms.
Please see the build report for more details. This link will be active for 21 days.
Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
[email protected]:packages/rd4 to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.