
Add code/description how to create a CompDb from MassBank

Open jorainer opened this issue 5 years ago • 7 comments

MassBank releases its database at regular intervals and shares the data under a fairly open license, which makes it an ideal candidate for annotation databases that could be distributed via Bioconductor's AnnotationHub.

For context: I'm building so-called EnsDb databases for all species for each release of Ensembl. These databases are self-contained SQLite files with gene, transcript, exon and protein annotations and can be fetched from AnnotationHub. This is very convenient for the user.

CompDb databases could be distributed in a similar fashion.

What I will try next is to define simple scripts to easily import data from the MassBank (MySQL database) into a CompDb database.

jorainer • Oct 21 '20 07:10
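
To make the idea of a versioned, self-contained SQLite file more concrete, here is a minimal sketch in Python. It is not the CompoundDb implementation and does not use the real CompDb or MassBank schema: the table names, column names, record IDs and the release string are all hypothetical, and the input rows stand in for whatever would be read from the MassBank MySQL database.

```python
import sqlite3


def build_compound_db(con, records, source, release):
    """Write compound records plus release metadata into an SQLite database.

    Table and column names are illustrative only (NOT the CompDb schema).
    """
    with con:  # commit on success, roll back on error
        con.execute("CREATE TABLE metadata (name TEXT PRIMARY KEY, value TEXT)")
        con.execute(
            "CREATE TABLE compound ("
            "compound_id TEXT PRIMARY KEY, name TEXT, inchikey TEXT, smiles TEXT)"
        )
        # Recording source and release is what "freezes" the data for a given file.
        con.executemany(
            "INSERT INTO metadata VALUES (?, ?)",
            [("source", source), ("source_version", release)],
        )
        con.executemany(
            "INSERT INTO compound VALUES (:compound_id, :name, :inchikey, :smiles)",
            records,
        )


# Hypothetical rows; in practice these would come from the MassBank database.
records = [
    {"compound_id": "REC0001", "name": "Caffeine",
     "inchikey": "RYYVLZVUVIJVGH-UHFFFAOYSA-N",
     "smiles": "CN1C=NC2=C1C(=O)N(C(=O)N2C)C"},
    {"compound_id": "REC0002", "name": "Ethanol",
     "inchikey": None,  # simulates a record without an InChIKey
     "smiles": "CCO"},
]

con = sqlite3.connect(":memory:")  # use a file path to get a shareable file
build_compound_db(con, records, source="MassBank", release="hypothetical-release")
n_compounds = con.execute("SELECT COUNT(*) FROM compound").fetchone()[0]
version = con.execute(
    "SELECT value FROM metadata WHERE name = 'source_version'"
).fetchone()[0]
```

Storing the source and release alongside the data means each distributed file identifies the exact MassBank release it was built from.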

Is there an advantage to this compared to using the SDF from MoNA?

stanstrup • Oct 21 '20 07:10

I cannot speak to the content. What I like about MassBank is that a) the license is pretty clear, so data can be (re)shared, b) MassBank makes releases, which allows the data to be "frozen" (important for reproducible research), and c) extracting the data directly from their database is easier than importing from text files (SDF and/or JSON).

jorainer • Oct 21 '20 07:10

OMG - did not expect that. So, MassBank has one compound for each spectrum. Far from being a normalized database :(

jorainer • Oct 21 '20 10:10

Yes, and the IDs differ between the different labs. The only common identifier for cross-mapping could be the InChIKey, but I have never tried that so far.

michaelwitting • Oct 21 '20 10:10

The problem is that not all compounds have an InChIKey, which makes it really tricky. Well, for now I will import the data as is.

jorainer • Oct 21 '20 11:10

Do all of them have a SMILES? Then the InChIKey could be calculated with this one: https://github.com/CDK-R/rinchi

michaelwitting • Oct 21 '20 11:10
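
The cross-mapping idea discussed above could look roughly like the following Python sketch. It assumes each record is a dictionary with hypothetical fields `record_id`, `inchikey` and `smiles`. Records that share an InChIKey collapse into one compound, and records without one stay separate. The optional `compute_key` argument is a hypothetical hook for a SMILES-to-InChIKey function (for example one backed by a cheminformatics library); no such conversion is implemented here.

```python
from collections import defaultdict


def group_by_inchikey(records, compute_key=None):
    """Group per-spectrum compound records into unique compounds by InChIKey.

    `compute_key` is an optional, caller-supplied function mapping a SMILES
    string to an InChIKey; it is only used when a record has no InChIKey.
    """
    groups = defaultdict(list)
    for rec in records:
        key = rec.get("inchikey")
        if not key and compute_key is not None and rec.get("smiles"):
            key = compute_key(rec["smiles"])
        # Records that still lack a key cannot be cross-mapped, so each one
        # stays in its own group, named after its record ID.
        group_id = key if key else "unmapped:" + rec["record_id"]
        groups[group_id].append(rec["record_id"])
    return dict(groups)


# Hypothetical records from different labs; the IDs are made up.
records = [
    {"record_id": "LAB_A_001", "inchikey": "RYYVLZVUVIJVGH-UHFFFAOYSA-N",
     "smiles": "CN1C=NC2=C1C(=O)N(C(=O)N2C)C"},
    {"record_id": "LAB_B_017", "inchikey": "RYYVLZVUVIJVGH-UHFFFAOYSA-N",
     "smiles": "CN1C=NC2=C1C(=O)N(C(=O)N2C)C"},
    {"record_id": "LAB_C_042", "inchikey": None, "smiles": "CCO"},
]

grouped = group_by_inchikey(records)
```

Here the two caffeine records from different labs end up in one group, while the record without an InChIKey is kept under its own ID, which mirrors importing the data "as is" for unmapped entries.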

Indeed - it seems that all of them have SMILES. Good point - maybe you could chime in here too: https://github.com/MassBank/MassBank-web/issues/266

jorainer • Oct 21 '20 11:10