Contributions icon indicating copy to clipboard operation
Contributions copied to clipboard

MsExperiment submission

Open lgatto opened this issue 3 years ago • 19 comments

Update the following URL to point to the GitHub repository of the package you wish to submit to Bioconductor

  • Repository: https://github.com/RforMassSpectrometry/MsExperiment/

Confirm the following by editing each check box to '[x]'

  • [x] I understand that by submitting my package to Bioconductor, the package source and all review commentary are visible to the general public.

  • [x] I have read the Bioconductor Package Submission instructions. My package is consistent with the Bioconductor Package Guidelines.

  • [x] I understand Bioconductor Package Naming Policy and acknowledge Bioconductor may retain use of package name.

  • [x] I understand that a minimum requirement for package acceptance is to pass R CMD check and R CMD BiocCheck with no ERROR or WARNINGS. Passing these checks does not result in automatic acceptance. The package will then undergo a formal review and recommendations for acceptance regarding other Bioconductor standards will be addressed.

  • [x] My package addresses statistical or bioinformatic issues related to the analysis and comprehension of high throughput genomic data.

  • [x] I am committed to the long-term maintenance of my package. This includes monitoring the support site for issues that users may have, subscribing to the bioc-devel mailing list to stay aware of developments in the Bioconductor community, responding promptly to requests for updates from the Core team in response to changes in R or underlying software.

  • [x] I am familiar with the Bioconductor code of conduct and agree to abide by it.

I am familiar with the essential aspects of Bioconductor software management, including:

  • [x] The 'devel' branch for new packages and features.
  • [x] The stable 'release' branch, made available every six months, for bug fixes.
  • [x] Bioconductor version control using Git (optionally via GitHub).

For questions/help about the submission process, including questions about the output of the automatic reports generated by the SPB (Single Package Builder), please use the #package-submission channel of our Community Slack. Follow the link on the home page of the Bioconductor website to sign up.

lgatto avatar Jul 17 '22 16:07 lgatto

Hi @lgatto

Thanks for submitting your package. We are taking a quick look at it and you will hear back from us soon.

The DESCRIPTION file for this package is:

Package: MsExperiment
Title: Infrastructure for Mass Spectrometry Experiments
Version: 0.99.0
Description: Infrastructure to store and manage all aspects related to
   a complete proteomics or metabolomics mass spectrometry (MS) experiment. The
   MsExperiment package provides light-weight and flexible containers for MS
   experiments building on the new MS infrastructure provided by the Spectra,
   QFeatures and related packages. Along with raw data representations, links
   to original data files and sample annotations, additional metadata or
   annotations can also be stored within the MsExperiment container. To
   guarantee maximum flexibility only minimal constraints are put on the type
   and content of the data within the containers.
Authors@R: c(person(given = "Laurent", family = "Gatto",
          comment = c(ORCID = "0000-0002-1520-2268"),
          email = "[email protected]",
          role = c("aut", "cre")),
   person(given = "Johannes", family = "Rainer",
          comment = c(ORCID = "0000-0002-6977-7147"),
          email = "[email protected]",
          role = "aut"),
   person(given = "Sebastian", family = "Gibb",
          comment = c(ORCID = "0000-0001-7406-4443"),
          email = "[email protected]",
          role = "aut"))
Depends:
    R (>= 4.2),
    ProtGenerics (>= 1.9.1),
Imports:
    methods,
    S4Vectors,
    IRanges,
    Spectra,
    SummarizedExperiment,
    QFeatures
Suggests:
    testthat,
    knitr (>= 1.1.0),
    roxygen2,
    BiocStyle (>= 2.5.19),
    rmarkdown,
    rpx,
    mzR,
    msdata
License: Artistic-2.0
LazyData: no
VignetteBuilder: knitr
BugReports: https://github.com/RforMassSpectrometry/MsExperiment/issues
URL: https://github.com/RforMassSpectrometry/MsExperiment
biocViews: Infrastructure, Proteomics, MassSpectrometry, Metabolomics, ExperimentalDesign, DataImport
RoxygenNote: 7.2.0
Roxygen: list(markdown=TRUE)
Collate:
    'MsExperiment-functions.R'
    'MsExperimentFiles.R'
    'MsExperiment.R'
    'existMsExperimentFiles.R'

bioc-issue-bot avatar Jul 17 '22 16:07 bioc-issue-bot

While building vignette by hand I hit

/home/stvjc/.cache/R/rpx
  does not exist, create directory? (yes/no): yes

I wonder if the vignette should fail at this point, directing the user to deal with rpx cache -- which does not seem standard for BiocFileCache, should it use BiocFileCache?

vjcitn avatar Jul 22 '22 14:07 vjcitn

That looks like a standard R caching directory -- likely there is a flag for creating this automatically when it is not manually run (ie. an interactive vs non interactive setting) -- or if there isn't there should be

lshep avatar Jul 22 '22 14:07 lshep

> R_user_dir("cache")
[1] "/home/stvjc/.local/share/R/cache"

seems more standard to me. Also

label: unnamed-chunk-23
Quitting from lines 316-319 (MsExperiment.Rmd) 
Error in .local(object, ...) : 
  Parameter 'files' is mandatory for 'MsBackendMzR'

... and named chunks are nice ... :)

vjcitn avatar Jul 22 '22 14:07 vjcitn

It's the standard location for package caching:

> tools::R_user_dir(package = "rpx", which = "cache")
[1] "/home/lgatto/.cache/R/rpx"

rpx (and its cached) is used to download files from the PRIDE mass spectrometry database.

I'm surprised it fails when building the vignette, as the function should ask only in an interactive environment:

> rpx::rpxCache
function () 
{
    cache <- tools::R_user_dir(package = "rpx", which = "cache")
    BiocFileCache::BiocFileCache(cache, ask = interactive())
}
<bytecode: 0x55d2ec9e6140>
<environment: namespace:rpx>

Sorry about the chunk names - I will fix these.

lgatto avatar Jul 22 '22 18:07 lgatto

it didn't fail, it just paused during my manual compilation of vignette. as programmed. it just doesn't happen very often.

As for cache location, ignore my comment but I expected to see .local/share ... as part of the path. My problem.

vjcitn avatar Jul 22 '22 18:07 vjcitn

It did fail however at unnamed chunk 23.

vjcitn avatar Jul 22 '22 18:07 vjcitn

I had a quick look and the error complains about an empty filenames vector, that should have been created in the previous (unnamed) chunk:

fls <- dir(system.file("sciex", package = "msdata"), full.names = TRUE)
basename(fls)

experimentFiles(lmse) <- MsExperimentFiles(
    mzML_files = fls,
    annotations = "internal_standards.txt")

producing

> fls <- dir(system.file("sciex", package = "msdata"), full.names = TRUE)
> basename(fls)
[1] "20171016_POOL_POS_1_105-134.mzML" "20171016_POOL_POS_3_105-134.mzML"

The msdata is in the Suggests - could it be possible that it isn't available on your system?

lgatto avatar Jul 22 '22 18:07 lgatto

Sorry for so many bumps. I thought

Quitting from lines 158-160 (MsExperiment.Rmd) 
Error: processing vignette 'MsExperiment.Rmd' failed with diagnostics:
error in evaluating the argument 'object' in selecting a method for function 'backendInitialize': The use of 'MsBackendMzR' requires package 'mzR'. Please install with 'BiocInstaller::install("mzR")'
--- failed re-building ‘MsExperiment.Rmd’

SUMMARY: processing the following file failed:
  ‘MsExperiment.Rmd’

was odd. I will keep pushing.

vjcitn avatar Aug 04 '22 18:08 vjcitn

A reviewer has been assigned to your package. Learn what to expect during the review process.

IMPORTANT: Please read this documentation for setting up remotes to push to git.bioconductor.org. It is required to push a version bump to git.bioconductor.org to trigger a new build.

Bioconductor utilized your github ssh-keys for git.bioconductor.org access. To manage keys and future access you may want to active your Bioconductor Git Credentials Account

bioc-issue-bot avatar Aug 08 '22 14:08 bioc-issue-bot

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR, skipped". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to [email protected]:packages/MsExperiment to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

bioc-issue-bot avatar Aug 08 '22 14:08 bioc-issue-bot

Hi @nturaga - the error in the build report is due to an URL that seemed temporarily inaccessible (it works now). The first code chunk (other than loading packages) pulls data from an online repo using the rpx package. Could you trigger a new build, or do I need to push a version bump?

lgatto avatar Aug 10 '22 11:08 lgatto

You need to push a version bump @lgatto

nturaga avatar Aug 10 '22 15:08 nturaga

Weird, I pushed a version bump (plus code chunk labels) but it didn't triggered the spb. Any idea @nturaga?

lgatto avatar Aug 11 '22 07:08 lgatto

Did you push to upstream at git.bioconductor.org? When I cloned the package the version bump was not there?

lshep avatar Aug 11 '22 12:08 lshep

Received a valid push on git.bioconductor.org; starting a build for commit id: 9e5f85252c102b0188d67ce9440274907e8bbbae

bioc-issue-bot avatar Aug 11 '22 14:08 bioc-issue-bot

Indeed, thank you @lshep, and sorry for missing the obvious reason.

lgatto avatar Aug 11 '22 14:08 lgatto

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to [email protected]:packages/MsExperiment to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

bioc-issue-bot avatar Aug 11 '22 14:08 bioc-issue-bot

Hi Laurent,

Taking a first look at this, here is some immediate feedback:

  1. Imports:

    • Importing selectively from the methods package is strongly discouraged. Note that you already import the whole methods package (import(methods)) so there's actually no need to selectively import things with importFrom(methods,"slot<-"), importFrom(methods,as), importMethodsFrom(methods,show), etc...
    • Same situation with ProtGenerics: you have import(ProtGenerics) so no need to importFrom(ProtGenerics,spectra).
    • I would also strongly suggest that you import the whole S4Vectors and IRanges, just to avoid unnecessary maintenance complications in the future.
  2. I suggest that you extend the Annotated class from S4Vectors. This will give you the metadata() getter and setter for free, and also will ensure that they behave consistently across all objects in the Bioconductor ecosystem.

  3. Given that the assay slot contains "quantification data" and that the slot is accessed via the qdata() getter and setter, why isn't the slot called qdata?

  4. Can I suggest that you use the preferred _OR_ convention for all your union classes? E.g. MsExperimentFiles_OR_NULL instead of MsExperimentFilesOrNull, and QFeatures_OR_SummarizedExperiment_OR_NULL instead of QFeaturesOrSummarizedExperimentOrNull, etc... BTW I'm not sure why you need the MsExperimentFilesOrNull class at all. Couldn't an empty (i.e. zero-length) MsExperimentFiles object be used when there are no files to store?

  5. For a MsExperiment object mse with 2 samples, I was expecting that mse[ , 1:2] would be a no-op but I get:

    > mse
    Object of class MsExperiment 
     Files: mzML_files, annotations 
     Spectra: MS1 (1862) 
     Experiment data: 2 sample(s)
     Sample data links:
      - experimentFiles.mzML_files: 2 sample(s) to 2 element(s).
      - experimentFiles.annotations: 2 sample(s) to 1 element(s).
      - spectra: 2 sample(s) to 1862 element(s).
    > identical(mse[,1:2], mse)
    [1] FALSE
    

    This makes me wonder if I really understand what subsetting does.

  6. I also wonder why enforce the use of the 2D syntax for subsetting (e.g. mse[ , 1]) instead of just using the simpler 1D syntax (e.g. mse[1]). This kind of suggests that MsExperiment objects are somehow 2D objects but they are not (dim() returns NULL). Have you considered giving these objects a 1D semantic? This would be achieved by: (a) giving them a length() (the number of samples), (b) allowing them to carry names() (the names/ids of the samples), and (c) allowing subsetting with mse[i] where i is either an integer vector (index), or a logical vector of the same length as mse, or a character vector of sample names/ids.

  7. Files created with saveRDS() are RDS files and thus should have the .rds extension. Looks like you're using .rda in the vignette (and maybe in other places).

  8. Some cosmetic:

    • Make sure to use `r Biocpkg("BiocManager")` instead of bare backticks for package names in the text of the vignette.
    • Right now the show() method prints Object of class MsExperiment for an empty MsExperiment object. Would be nice if it could tell that the object is empty, like you did for QFeatures objects.

Thanks, H.

hpages avatar Aug 18 '22 21:08 hpages

Received a valid push on git.bioconductor.org; starting a build for commit id: 49ce72383208ec6379a221a5023b5b6b8bc16d37

bioc-issue-bot avatar Sep 16 '22 14:09 bioc-issue-bot

Hi @hpages

Thank you for your insightful comments and suggestions. Here's how we have addressed them:

  1. Imports:
    • Importing selectively from the methods package is strongly discouraged. Note that you already import the whole methods package (import(methods)) so there's actually no need to selectively import things with importFrom(methods,"slot<-"), importFrom(methods,as), importMethodsFrom(methods,show), etc...
    • Same situation with ProtGenerics: you have import(ProtGenerics) so no need to importFrom(ProtGenerics,spectra).
    • I would also strongly suggest that you import the whole S4Vectors and IRanges, just to avoid unnecessary maintenance complications in the future.

The global imports of methods and ProtGenerics were a mistake and have been removed. We don't mind selectively importing symbols from S4Vectors and IRanges - hope that's OK.

  1. I suggest that you extend the Annotated class from S4Vectors. This will give you the metadata() getter and setter for free, and also will ensure that they behave consistently across all objects in the Bioconductor ecosystem.

Done.

> showClass("MsExperiment")
Class "MsExperiment" [package "MsExperiment"]

Slots:
                                                
Name:                            experimentFiles
Class:                         MsExperimentFiles
                                                
Name:                                    spectra
Class:                           Spectra_OR_Null
                                                
Name:                                      qdata
Class: QFeatures_OR_SummarizedExperiment_OR_Null
                                                
Name:                                  otherData
Class:                                      List
                                                
Name:                                 sampleData
Class:                                 DataFrame
                                                
Name:                            sampleDataLinks
Class:                                      List
                                                
Name:                                   metadata
Class:                                      list

Extends: "Annotated"
  1. Given that the assay slot contains "quantification data" and that the slot is accessed via the qdata() getter and setter, why isn't the slot called qdata?

Indeed, we have changed as suggested:

> slotNames("MsExperiment")
[1] "experimentFiles" "spectra"         "qdata"           "otherData"      
[5] "sampleData"      "sampleDataLinks" "metadata"       
  1. Can I suggest that you use the preferred _OR_ convention for all your union classes? E.g. MsExperimentFiles_OR_NULL instead of MsExperimentFilesOrNull, and QFeatures_OR_SummarizedExperiment_OR_NULL instead of QFeaturesOrSummarizedExperimentOrNull, etc... BTW I'm not sure why you need the MsExperimentFilesOrNull class at all. Couldn't an empty (i.e. zero-length) MsExperimentFiles object be used when there are no files to store?

We are now following the _OR_ convention and got rid of MsExperimentFilesOrNull as it wasn't needed.

  1. For a MsExperiment object mse with 2 samples, I was expecting that mse[ , 1:2] would be a no-op but I get:
    > mse
    Object of class MsExperiment 
     Files: mzML_files, annotations 
     Spectra: MS1 (1862) 
     Experiment data: 2 sample(s)
     Sample data links:
      - experimentFiles.mzML_files: 2 sample(s) to 2 element(s).
      - experimentFiles.annotations: 2 sample(s) to 1 element(s).
      - spectra: 2 sample(s) to 1862 element(s).
    > identical(mse[,1:2], mse)
    [1] FALSE
    
    This makes me wonder if I really understand what subsetting does.

See below for the list semantic.

As for the non-identical mseand mse[1:2] objects, this is expected, as we need to copy filenames that are pointed to by more than one sample to guarantee the consistency between samples and their associated files. This has the side effect that you point out. This was mentioned/demonstratd in the vignette and now also in the man page.

  1. I also wonder why enforce the use of the 2D syntax for subsetting (e.g. mse[ , 1]) instead of just using the simpler 1D syntax (e.g. mse[1]). This kind of suggests that MsExperiment objects are somehow 2D objects but they are not (dim() returns NULL). Have you considered giving these objects a 1D semantic? This would be achieved by: (a) giving them a length() (the number of samples), (b) allowing them to carry names() (the names/ids of the samples), and (c) allowing subsetting with mse[i] where i is either an integer vector (index), or a logical vector of the same length as mse, or a character vector of sample names/ids.

You are absolutely right, and we have changed the semantic accordingly:

> mse
Object of class MsExperiment 
 Files: mzML_files, annotations 
 Experiment data: 2 sample(s)
 Sample data links:
  - experimentFiles.mzML_files: 2 sample(s) to 2 element(s).
  - experimentFiles.annotations: 2 sample(s) to 1 element(s).
> length(mse)
[1] 2
> mse[1]
Object of class MsExperiment 
 Files: mzML_files, annotations 
 Experiment data: 1 sample(s)
 Sample data links:
  - experimentFiles.mzML_files: 1 sample(s) to 1 element(s).
  - experimentFiles.annotations: 1 sample(s) to 1 element(s).
  1. Files created with saveRDS() are RDS files and thus should have the .rds extension. Looks like you're using .rda in the vignette (and maybe in other places).

This was a silly mistake - fixed now.

  1. Some cosmetic:
    • Make sure to use `r Biocpkg("BiocManager")` instead of bare backticks for package names in the text of the vignette.
    • Right now the show() method prints Object of class MsExperiment for an empty MsExperiment object. Would be nice if it could tell that the object is empty, like you did for QFeatures objects.

Both fixed.

> MsExperiment()
Empty object of class MsExperiment 

Thank you!

Laurent & Johannes

lgatto avatar Sep 16 '22 14:09 lgatto

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to [email protected]:packages/MsExperiment to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

bioc-issue-bot avatar Sep 16 '22 14:09 bioc-issue-bot

Excellent! Thanks for all the changes.

Cheers, H.

hpages avatar Sep 20 '22 20:09 hpages

Your package has been accepted. It will be added to the Bioconductor nightly builds.

Thank you for contributing to Bioconductor!

Reviewers for Bioconductor packages are volunteers from the Bioconductor community. If you are interested in becoming a Bioconductor package reviewer, please see Reviewers Expectations.

bioc-issue-bot avatar Sep 20 '22 20:09 bioc-issue-bot

The master branch of your GitHub repository has been added to Bioconductor's git repository.

To use the git.bioconductor.org repository, we need an 'ssh' key to associate with your github user name. If your GitHub account already has ssh public keys (https://github.com/lgatto.keys is not empty), then no further steps are required. Otherwise, do the following:

  1. Add an SSH key to your github account
  2. Submit your SSH key to Bioconductor

See further instructions at

https://bioconductor.org/developers/how-to/git/

for working with this repository. See especially

https://bioconductor.org/developers/how-to/git/new-package-workflow/ https://bioconductor.org/developers/how-to/git/sync-existing-repositories/

to keep your GitHub and Bioconductor repositories in sync.

Your package will be included in the next nigthly 'devel' build (check-out from git at about 6 pm Eastern; build completion around 2pm Eastern the next day) at

https://bioconductor.org/checkResults/

(Builds sometimes fail, so ensure that the date stamps on the main landing page are consistent with the addition of your package). Once the package builds successfully, you package will be available for download in the 'Devel' version of Bioconductor using BiocManager::install("MsExperiment"). The package 'landing page' will be created at

https://bioconductor.org/packages/MsExperiment

If you have any questions, please contact the bioc-devel mailing list (https://stat.ethz.ch/mailman/listinfo/bioc-devel); this issue will not be monitored further.

lshep avatar Sep 26 '22 13:09 lshep