CompoundDb
License issues
From @stanstrup on October 19, 2017 9:1
- Which databases can I include data from?
- If there are ones I cannot include, the data will need to be downloaded and the table generated by the user. Is there such a thing as an "in-package cache"?
- Which license can the package have if it includes db data?
- Is licensing a concern at all? As far as I know, data cannot be copyrighted, so is there any real issue here?
- MONA (LipidBlast) is CC BY 4.0, so it should be OK? http://mona.fiehnlab.ucdavis.edu/documentation/license
- I cannot find which license LipidMaps has.
- It seems HMDB requires explicit permission to include its data: http://www.hmdb.ca/downloads
The info I extract is: id, name, inchi, formula, and mass.
For the moment I force-removed the files until this is settled.
Copied from original issue: stanstrup/PeakABro#1
From @egonw on October 19, 2017 13:31
I don't think LipidMaps is Open Data. Wikidata is, PubChem is.
From @chasemc on October 19, 2017 22:45
"LMSD lipid structures are deposited into PubChem database (http://pubchem.ncbi.nlm.nih.gov/) periodically and a link to PubChem substance ID (SID) is also maintained within LMSD. Access to complete set of LMSD lipid structures in PubChem is available at www.ncbi.nlm.nih.gov/entrez/query.fcgi?CMD=search&DB=pcsubstance&term=LipidMAPS[sourcename])."
@chasemc thanks! That is very useful info. So I might be able to get around that one by including only PubChem and keeping the LipidMaps indicator, so that you can later filter for the LipidMaps compounds.
@chasemc It seems the source is only recorded in the SID entries, not the CIDs. However, the LipidMaps IDs have been added as names, so it is possible to filter by those prefixes.
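A rough sketch of that prefix filter, in case it helps. It assumes the compound records have already been loaded into plain dictionaries with a list of names each; the record layout, the function names, and the ID pattern ("LM", a two-letter category code, then at least eight alphanumeric characters) are illustrative assumptions, not something taken from PubChem or from this package.

```python
import re

# Assumed shape of a LipidMaps ID: "LM" + two-letter category code
# + 8 or more alphanumeric characters (e.g. "LMFA01010001").
LM_ID = re.compile(r"^LM[A-Z]{2}[0-9A-Z]{8,}$")


def lipidmaps_ids(names):
    """Return the names that look like LipidMaps IDs."""
    return [n for n in names if LM_ID.match(n)]


def filter_lipidmaps(compounds):
    """Keep only compounds that carry at least one LipidMaps-style name."""
    return [c for c in compounds if lipidmaps_ids(c["names"])]


# Hypothetical records; the "cid" values are placeholders, not real CIDs.
compounds = [
    {"cid": 1, "names": ["example lipid", "LMFA01010001"]},
    {"cid": 2, "names": ["example non-lipid"]},
]

lipids = filter_lipidmaps(compounds)
```

The pattern is deliberately loose, so it may also match unrelated names that happen to start with "LM"; it would need checking against the actual ID list before relying on it.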
From @egonw on October 22, 2017 9:20
@chasemc also note that PubChem is not formally Open Data: it mixes their own public domain data with copyrighted upstream material. Legally, this is quite hard to untangle.
Generally, just contact LipidMaps and ask if it is OK to index their structures in the table as you want to do, and if you are allowed to make that available under terms compatible with the license of the R package.
For LipidMaps, a subset of about 1400 lipids is available under CCZero from Wikidata: http://tinyurl.com/ycbm9gfq
Thanks! I already contacted LipidMaps. Waiting for an answer.
@egonw what do you mean by upstream material? All the calculated properties? So if I only use basic info such as name and InChI, it should be OK?