Contributions icon indicating copy to clipboard operation
Contributions copied to clipboard

HiContactsData

Open js2264 opened this issue 1 year ago • 2 comments

Update the following URL to point to the GitHub repository of the package you wish to submit to Bioconductor

  • Repository: https://github.com/js2264/HiContactsData

Confirm the following by editing each check box to '[x]'

  • [x] I understand that by submitting my package to Bioconductor, the package source and all review commentary are visible to the general public.

  • [x] I have read the Bioconductor Package Submission instructions. My package is consistent with the Bioconductor Package Guidelines.

  • [x] I understand Bioconductor Package Naming Policy and acknowledge Bioconductor may retain use of package name.

  • [x] I understand that a minimum requirement for package acceptance is to pass R CMD check and R CMD BiocCheck with no ERROR or WARNINGS. Passing these checks does not result in automatic acceptance. The package will then undergo a formal review and recommendations for acceptance regarding other Bioconductor standards will be addressed.

  • [x] My package addresses statistical or bioinformatic issues related to the analysis and comprehension of high throughput genomic data.

  • [x] I am committed to the long-term maintenance of my package. This includes monitoring the support site for issues that users may have, subscribing to the bioc-devel mailing list to stay aware of developments in the Bioconductor community, responding promptly to requests for updates from the Core team in response to changes in R or underlying software.

  • [x] I am familiar with the Bioconductor code of conduct and agree to abide by it.

I am familiar with the essential aspects of Bioconductor software management, including:

  • [x] The 'devel' branch for new packages and features.
  • [x] The stable 'release' branch, made available every six months, for bug fixes.
  • [x] Bioconductor version control using Git (optionally via GitHub).

For questions/help about the submission process, including questions about the output of the automatic reports generated by the SPB (Single Package Builder), please use the #package-submission channel of our Community Slack. Follow the link on the home page of the Bioconductor website to sign up.

js2264 avatar Aug 29 '22 14:08 js2264

Hi @js2264

Thanks for submitting your package. We are taking a quick look at it and you will hear back from us soon.

The DESCRIPTION file for this package is:

Package: HiContactsData
Title: HiContacts companion data package
Version: 0.99.0
Date: 2022-08-16
Authors@R: 
    person(given = "Jacques",
 family = "Serizay",
 role = c("aut", "cre"),
 email = "[email protected]")
Description: 
    Provides a collection of Hi-C files (pairs and (m)cool). These datasets can 
    be read into R and further investigated and visualized with the HiContacts 
    package. Data includes yeast Hi-C data generated by the Koszul lab from 
    the Pasteur Institute.
License: MIT + file LICENSE
URL: https://github.com/js2264/HiContactsData
BugReports: https://github.com/js2264/HiContactsData/issues
Depends:
    ExperimentHub
Imports: 
    BiocFileCache, 
    AnnotationHub
Suggests:
    HiContacts,
    testthat, 
    methods,
    knitr, 
    rmarkdown
biocViews: 
    ExperimentHub,
    ExperimentData, 
    SequencingData
Encoding: UTF-8
VignetteBuilder: knitr
LazyData: false
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.2.1

bioc-issue-bot avatar Aug 29 '22 14:08 bioc-issue-bot

FYI, I am planning on submitting HiContacts (https://github.com/js2264/HiContacts) as an AdditionalPackage, as soon as the data package has been assigned a reviewer (according to following guidelines: https://github.com/Bioconductor/Contributions/blob/master/CONTRIBUTING.md#submitting-related-packages).

js2264 avatar Sep 01 '22 09:09 js2264

Sorry for the delay

vjcitn avatar Sep 11 '22 16:09 vjcitn

Hi @vjcitn no worries for the delay. I guess we're now waiting for a reviewer to be assigned? Is there anything else I can do to make things go smoothly?

js2264 avatar Sep 12 '22 12:09 js2264

@lshep ... @js2264 already noted

Do not submit an AdditionalPackage with the line shown in step 3 until a review in progress tag has been added to your package and your first package receives a build report

Submit additional packages to the same issue. Do this by posting a comment containing a line like:

 AdditionalPackage: https://github.com/username/repositoryname

vjcitn avatar Sep 12 '22 12:09 vjcitn

so you'll have to wait a bit until the ingestion occurs, often at week's end.

vjcitn avatar Sep 12 '22 12:09 vjcitn

A reviewer has been assigned to your package. Learn what to expect during the review process.

IMPORTANT: Please read this documentation for setting up remotes to push to git.bioconductor.org. It is required to push a version bump to git.bioconductor.org to trigger a new build.

Bioconductor utilized your github ssh-keys for git.bioconductor.org access. To manage keys and future access you may want to active your Bioconductor Git Credentials Account

bioc-issue-bot avatar Sep 12 '22 15:09 bioc-issue-bot

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to [email protected]:packages/HiContactsData to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

bioc-issue-bot avatar Sep 12 '22 15:09 bioc-issue-bot

AdditionalPackage: https://github.com/js2264/HiContacts

js2264 avatar Sep 12 '22 17:09 js2264

Hi @js2264, Thanks for submitting your additional package: https://github.com/js2264/HiContacts. We are taking a quick look at it and you will hear back from us soon.

bioc-issue-bot avatar Sep 12 '22 17:09 bioc-issue-bot

Received a valid push on git.bioconductor.org; starting a build for commit id: dfd940262c106e30b7177b95aaeeaf2073b2c402

bioc-issue-bot avatar Sep 12 '22 17:09 bioc-issue-bot

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to [email protected]:packages/HiContactsData to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

bioc-issue-bot avatar Sep 12 '22 18:09 bioc-issue-bot

Received a valid push on git.bioconductor.org; starting a build for commit id: 1c5b038a3989a64aade2cd962bc4a8abdf6b02be

bioc-issue-bot avatar Sep 14 '22 09:09 bioc-issue-bot

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to [email protected]:packages/HiContactsData to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

bioc-issue-bot avatar Sep 14 '22 09:09 bioc-issue-bot

Hi Jaques @js2264

Thank you for your submission. Please see the review below for the data package. Feel free to respond via the comments.

Best, Marcel


HiContactsData #2757

DESCRIPTION

  • Looks good.
  • Consider using BiocStyle.

NAMESPACE

  • HiContactsDataFiles may not be necessary, see below.

R

  • It would be better to encode the data from the conditional statements into a data.frame. It can contain several columns, e.g., sample, format, AHID, genome, condition and description.
  • You can then use the inputs from the function to subset the data.frame and get the AHID.
  • You can add a dry.run argument to allow the user to see the data.frame from HiContactsDataFiles when it is FALSE.
  • Please do not use rm -rf command in the code. We cannot accept a package that does this. You may not need the extension for the function that reads it in, e.g. in rhdf5.

vignettes

  • Show the library(HiContactsData) call in the vignette, you can remove the noise with message=FALSE,warning=FALSE,include=TRUE,results="hide" in the chunk options.

LiNk-NY avatar Sep 15 '22 15:09 LiNk-NY

Additional Package has been approved for building.

IMPORTANT: Please read this documentation for setting up remotes to push to git.bioconductor.org. It is required to push a version bump to git.bioconductor.org to trigger a new build.

bioc-issue-bot avatar Sep 15 '22 16:09 bioc-issue-bot

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to [email protected]:packages/HiContacts to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

bioc-issue-bot avatar Sep 15 '22 16:09 bioc-issue-bot

Hi Marcel @LiNk-NY,

Thank you for your review. I have addressed your comments in the coming series of commits. The only thing I do not quite understand is your reference to using BiocStyle for DESCRIPTION. Could you please clarify what changes you'd want me to do ? I have originally generated the DESCRIPTION from biocthis::use_bioc_description()and added useful entries to it.

DESCRIPTION

  • Looks good.
  • Consider using BiocStyle.

Thanks again, J

js2264 avatar Sep 15 '22 16:09 js2264

Hi Jaques, @js2264 I mean using BiocStyle for the vignette and adding it to the Suggests field in the DESCRIPTION. See here for more details. Best, Marcel

LiNk-NY avatar Sep 15 '22 16:09 LiNk-NY

Received a valid push on git.bioconductor.org; starting a build for commit id: 6a72314b73c7bd9bebe9a1dea548e262db6a5f2e

bioc-issue-bot avatar Sep 15 '22 17:09 bioc-issue-bot

Received a valid push on git.bioconductor.org; starting a build for commit id: ca988e900ba08d4584c81939436216352b3d80be

bioc-issue-bot avatar Sep 15 '22 17:09 bioc-issue-bot

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR, skipped". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to [email protected]:packages/HiContacts to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

bioc-issue-bot avatar Sep 15 '22 17:09 bioc-issue-bot

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to [email protected]:packages/HiContactsData to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

bioc-issue-bot avatar Sep 15 '22 17:09 bioc-issue-bot

Received a valid push on git.bioconductor.org; starting a build for commit id: bf75c79e0ed75b7d9fd9c3e8c10856cd41e8d50c

bioc-issue-bot avatar Sep 15 '22 18:09 bioc-issue-bot

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR, skipped". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to [email protected]:packages/HiContacts to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

bioc-issue-bot avatar Sep 15 '22 18:09 bioc-issue-bot

Received a valid push on git.bioconductor.org; starting a build for commit id: 7b2c4ab6b6be3b5b056fe782ed70ef8abf7f3f2e

bioc-issue-bot avatar Sep 15 '22 21:09 bioc-issue-bot

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to [email protected]:packages/HiContacts to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

bioc-issue-bot avatar Sep 15 '22 21:09 bioc-issue-bot

Hi Jacques, @js2264

Thank you for your submission. Please see the review of the software package below.

Best regards, Marcel


HiContacts #2757

You've put a lot of work into the package. It may be best to divide the package into two based on the functionality provided. One package could host all the visualization functionality and the other the computational work. This would allow users to choose the needed functionality and avoid the high number of dependencies. The design of the class is good but may need some pruning of slots and methods.

DESCRIPTION

  • The data package should be in the Suggests field.
  • Consider doing a pkgndep analysis to reduce the number of dependencies in the package.

NAMESPACE

  • Avoid exporting internal symbols such as .summary. Objects that start with a . (dot) are usually meant to be unexported.
  • You likely do not need to import HiContactsData.

vignettes

  • Looks good.
  • The subset operations seem a little unweildy but perhaps manageable for new users; consider creating a helper function to facilitate the operation.
  • There is a mix of camelCase and snake_case use. Is there a difference between uses? Otherwise, it's best to keep the function names consistently named.
  • Remove any empty sections in the vignette.

R

  • Keep a consistent representation, i.e., GenomicRanges thoughout (virtual4C). The tibble may be useful for plotting but other tools might make more use of a GRanges object.
  • Include a description in the documentation for all functions and avoid using acronyms for function names (e.g., APA).
  • Parallelized functions must use one core by default. These functions get run on the BBS and they cannot be greedy with the resource. The BBS builds multiple packages at a time.
  • Avoid the use of @ in code (e.g., APA). Create a standard interface with replacement methods.
  • There are multiple versions of APA, e.g., APA_, APA2. It is not clear what their purpose is.
  • You really don't need to use %>% <- tidyr::%>%, use the native |> pipe.
  • Where possible avoid using numeric indices, e.g., contacts_list[[1]] and use more robust named indices. Minor:
  • Consider reducing the cyclomatic complexity of R/checks.R by removing conditions that are redundant for example, grepl(' x ', focus(contacts)) already returns TRUE and FALSE in is_centered.
  • Your inputs should match the slot types in the constructor function, e.g., metadata = list(), features = SimpleList(), etc.
  • Document and export generics and methods.
  • The path generic clashes with the path generic in BiocIO consider using a different name. Avoid relying on an element in the metadata as the metadata usually does not have a structure or checks and create a more formal slot for the class.
  • scores is too similar to score in BiocGenerics, consider using the latter.
  • features usually refers to row wise annotations, to avoid confusion either use a different name or consider using something like rowData/rowRanges. The extractor returns a SimpleList but the replacement method accepts GRangesOrGInteractions? This is not consistent.
  • The data documentation belongs in the HiContactsData package.
  • Avoid function names with numbers e.g., getCounts2. Is there a getCounts1?
  • Set verbose to FALSE by default.

tests

It's good that you have tests. Consider adding tests for those with 0%.

covr::package_coverage()
HiContacts Coverage: 38.11%
R/APA.R: 0.00%
R/scalogram.R: 0.00%
R/parse.R: 4.62%
R/utils.R: 7.75%
R/contacts.R: 21.83%
R/checks.R: 40.00%
R/plotMatrix.R: 48.26%
R/ggthemes.R: 50.00%
R/Ps.R: 78.57%
R/arithmetics.R: 78.79%
R/cistrans.R: 95.65%
R/4C.R: 100.00%

LiNk-NY avatar Sep 16 '22 15:09 LiNk-NY

Hi Marcel @LiNk-NY

Thank you for your thorough review and useful comments. I have turned your comments into a check list. The items that are still unticked have further explanations below. Let me know if anything is not clear and what I should do (including additional required changes ?).

## DESCRIPTION

  • [x] The data package should be in the Suggests field.
  • [x] Consider doing a pkgndep analysis to reduce the number of dependencies in the package.

I used pkgndep and dstr to identify as many packages as possible to remove. I got rid of purrr, dropped BiocParallel (since it was only used in APA function which is not ready yet and requires significant work to finish it, so I'm currently developing it in another branch). I also rewrote a bit my code to remove zeallot. I believe all the other dependencies are definitely required.

## NAMESPACE

  • [x] Avoid exporting internal symbols such as .summary. Objects that start with a . (dot) are usually meant to be unexported.
  • [x] You likely do not need to import HiContactsData.

## Vignettes

  • [x] The subset operations seem a little unweildy but perhaps manageable for new users; consider creating a helper function to facilitate the operation.
  • [ ] There is a mix of camelCase and snakecase use. Is there a difference between uses? Otherwise, it's best to keep the function names consistently named.

I've corrected most of it, except for the internal check functions. I usually have check function names as snake_make to differentiate them from other functions. These check functions systematically return boolean values and are mostly used internally, at the beginning of functions. Is it ok to keep them in snake_make? Otherwise I can change them to camelCase, just let me know.

  • [x] Remove any empty sections in the vignette.

## R

  • [x] Keep a consistent representation, i.e., GenomicRanges thoughout (virtual4C). The tibble may be useful for plotting but other tools might make more use of a GRanges object.
  • [x] Include a description in the documentation for all functions and avoid using acronyms for function names (e.g., APA).
  • [x] Parallelized functions must use one core by default. These functions get run on the BBS and they cannot be greedy with the resource. The BBS builds multiple packages at a time.
  • [x] Avoid the use of @ in code (e.g., APA). Create a standard interface with replacement methods.
  • [x] There are multiple versions of APA, e.g., APA_, APA2. It is not clear what their purpose is.
  • [x] You really don't need to use %>% <- tidyr::%>%, use the native |> pipe.
  • [ ] Where possible avoid using numeric indices, e.g., contacts_list[[1]] and use more robust named indices.

I have made some changes when possible (e.g. in anchors(x)[[1]] -> anchors(x)[['first']]). But regarding contacts_list[[1]], I am not sure how to change that to something else. I've added a check to make sure that at least 2 elements are contained in the contacts_list argument. Do you think that is enough, or would you want additional changes?

## Minor:

  • [x] Consider reducing the cyclomatic complexity of R/checks.R by removing conditions that are redundant for example, grepl(' x ', focus(contacts)) already returns TRUE and FALSE in is_centered.
  • [x] Your inputs should match the slot types in the constructor function, e.g., metadata = list(), features = SimpleList(), etc.
  • [x] Document and export generics and methods.
  • [x] The path generic clashes with the path generic in BiocIO consider using a different name. Avoid relying on an element in the metadata as the metadata usually does not have a structure or checks and create a more formal slot for the class.
  • [ ] scores is too similar to score in BiocGenerics, consider using the latter.

I thought about using score name. However, AFAIK, score in BiocGenerics generally returns a vector. In HiContacts, scores returns a GInteractions object with, for each interaction, an associated score. All the functions in the package are built on scores() and it would be a lot of work to modify it to behave in a similar way to score(). I can make that change, of course, if you think it is a requirement, but since this was in your minor comments I thought I'd ask before starting all of this.

  • [ ] features usually refers to row wise annotations, to avoid confusion either use a different name or consider using something like rowData/rowRanges. The extractor returns a SimpleList but the replacement method accepts GRangesOrGInteractions ? This is not consistent.

I don't think changing features to rowData/rowRanges would be appropriate. In this package, features really refers to genomic features (could be chromatin loops, TSSs, compartment borders, whatever the user wants). A user might be interested in analysing a contact matrix in the light of these features and the features slot is a way to "enrich" the contacts object with these. These genomic features are not necessarily related to the interactions themselves, and there is no direct correspondance between "rows" (i.e. interactions, here) and features, as it is the case in SummarizedExperiments and co. Finally, AFAIK, I can only see 2 contexts in which "features" is frequently used in Bioconductor: either GenomicFeatures package or "feature selection" (typically in scRNAseq); these are very specific contexts and I believe people studying chromatin architecture would not be mistaken. With that said, do let me know if you think "features" still needs to be renamed and I'll happily find something else!

  • [ ] The data documentation belongs in the HiContactsData package.

The data.R documentation describes contacts objects shipped with HiContacts (see data folder). I thought I would put these objects in HiContacts rather than HiContactsData since they are contacts objects and thus require HiContacts. Should I move them to HiContactsData?

  • [x] Avoid function names with numbers e.g., getCounts2. Is there a getCounts1?
  • [x] Set verbose to FALSE by default.

js2264 avatar Sep 18 '22 11:09 js2264

Hi Marcel @LiNk-NY Just to let you know I have made some comments in the post above, there are few points that would need an input from you when you have a minute. Many thanks again!

js2264 avatar Sep 19 '22 17:09 js2264