MassBank-data
Have GitHub and Zenodo releases synchronized
Hi,
Thank you for all the effort you put into MassBank! I was trying to access its data and realized that https://github.com/MassBank/MassBank-data/releases and https://doi.org/10.5281/zenodo.3378723 are not in sync.
This can be easily done by following https://docs.github.com/en/repositories/archiving-a-github-repository/referencing-and-citing-content.
This way, each GitHub release is archived on Zenodo and gets its own DOI automatically.
Hope this makes sense!
Thank you for bringing this to our attention. An automatic procedure should be in place, but apparently it's not working at the moment. I will look into this.
I just checked and didn't find any differences. Could you please explain your finding in a bit more detail? What I did:
- Downloaded the zip from Zenodo: https://zenodo.org/record/8014263/files/MassBank/MassBank-data-2023.06.zip?download=1
- Downloaded the release artifact from GitHub: https://github.com/MassBank/MassBank-data/archive/refs/tags/2023.06.zip
- Unzipped both and compared them
- diff shows no differences on my system
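For anyone who wants to repeat this check without unpacking the archives, here is a minimal Python sketch. It assumes both zips have already been downloaded locally. The function names archive_digests and compare_archives are made up for illustration. The top-level folder is stripped before comparing, on the assumption that it may be named differently in the two archives.

```python
import hashlib
import zipfile
from pathlib import PurePosixPath


def archive_digests(zip_path):
    """Map each file's path (minus the top-level folder) to its SHA-256 digest."""
    digests = {}
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            parts = PurePosixPath(info.filename).parts
            # Drop the top-level folder, whose name may differ between sources.
            relative = "/".join(parts[1:]) if len(parts) > 1 else parts[0]
            digests[relative] = hashlib.sha256(archive.read(info)).hexdigest()
    return digests


def compare_archives(zip_a, zip_b):
    """Return files only in A, files only in B, and files whose content differs."""
    a, b = archive_digests(zip_a), archive_digests(zip_b)
    only_a = sorted(set(a) - set(b))
    only_b = sorted(set(b) - set(a))
    changed = sorted(name for name in set(a) & set(b) if a[name] != b[name])
    return only_a, only_b, changed
```

Calling compare_archives("zenodo.zip", "github.zip") (hypothetical local file names) returns three lists. If all three are empty, that matches diff reporting no differences.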
Wow, this is a fast reply!
I actually found the different json/sql/msp files available in releases/tag/2023.06 very convenient, and they do not seem to appear on Zenodo, but maybe I missed something?
P.S.: Is there any reason for providing an sql file and no sqlite file, which would be directly readable by MsBackendMassbank? (Or did I miss something again here?)
Yes, you are right. Zenodo only covers the txt files. That's a result of GitHub's automatic Zenodo release procedure. I don't know how to automatically attach the other release artifacts to the Zenodo release.
For your second question I have no answer at the moment. The sql file is released for the MsBackendMassbank package, but we did not put too much effort into it. It's basically a dump of our internal data structure. Maybe this sql file needs to be converted into an sqlite file? I need to do some research. Maybe @jorainer didn't want to create additional workload on our side? I found this script: https://github.com/rformassspectrometry/MsBackendMassbank/blob/main/inst/scripts/massbank-to-sqlite.R. If that's the case, we can probably modify our scripts to create the sqlite artifact instead of the sql file.
👍🏼 The different "ready-to-use" files would be a plus on Zenodo (I also don't know how to attach artifacts to Zenodo releases automatically... I will search a bit and come back if I find something). I was also using the nice script by @jorainer, and there are probably many of us out there doing so... so generating the sqlite file directly would indeed add some work on your side, but it would avoid that work being replicated many times elsewhere.
Note: my preferred way to access/use MassBank data in R is through AnnotationHub:
library(AnnotationHub)
ah <- AnnotationHub()
query(ah, "MassBank")
AnnotationHub with 3 records
# snapshotDate(): 2023-06-23
# $dataprovider: MassBank
# $species: NA
# $rdataclass: CompDb
# additional mcols(): taxonomyid, genome, description,
# coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
# rdatapath, sourceurl, sourcetype
# retrieve records with, e.g., 'object[["AH107048"]]'
title
AH107048 | MassBank CompDb for release 2021.03
AH107049 | MassBank CompDb for release 2022.06
AH111334 | MassBank CompDb for release 2022.12.1
So, as of now, these 3 releases are available through AnnotationHub. To use one of them:
mb <- ah[["AH107049"]]
mb
class: CompDb
data source: MassBank
version: 2022.06
organism: NA
compound count: 90190
MS/MS spectra count: 90190
This CompDb can be used directly with Spectra (i.e. Spectra(mb) would get you all MS2 spectra). Besides being available through AnnotationHub, the resource (sqlite file) also gets cached locally: it is downloaded on first use, and any subsequent use loads it from the local cache.
There is, however, a manual step involved, since I need to convert the MassBank data structures into a CompDb SQLite database (using this script) and then also upload and maintain these releases in Bioconductor's AnnotationHub... but I think that this should simplify usage of MassBank in R tremendously. The long-term goal is to also provide other annotation resources (as CompDb?) through AnnotationHub...