MassBank-data
Curation of entries related to mixtures
Some "compounds" we measure are not single compounds but mixtures of isomers or similar compounds. Often, however, the record reports the mixture, for example Nystatin, which contains Nystatin A1, A2 and A3.
An example is https://massbank.eu/MassBank/jsp/RecordDisplay.jsp?id=EQ314001&dsn=Eawag
The name is Nystatin (the mixture), but the compound shown is Nystatin A1 (https://comptox.epa.gov/dashboard/dsstoxdb/results?search=DTXSID80872323), which is related to the mixture (see https://comptox.epa.gov/dashboard/dsstoxdb/results?search=DTXSID80872323#related-substances).
Proxy compounds are also used for the measurement, for example for surfactant mixtures with homologues, or for nonylphenol.
However, from a purely data-science / machine point of view, this relation is wrong without additional information stating that the given compound is a proxy. PubChem also gives only the proxy; DTX is better in this respect, of course.
Therefore, we should implement a structure to handle this situation:
- Add a mixture tag that includes a link to an external source (PubChem / DTX)
- Curate records of mixtures so that they represent the compound actually used to generate the mass spectrum (e.g. Nystatin A1 rather than Nystatin)
- Insert the mixture tag into those records
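A minimal sketch of what inserting such a tag could look like, in Python. Everything here is hypothetical: the tag name `CH$MIXTURE`, the `MixtureLink` class and the `add_mixture_tag` function are illustrative only and are not part of the MassBank record format or its tooling. The sketch assumes a record is plain text with one `FIELD: value` entry per line, and it places the new tag after the last `CH$` (chemical compound) line so the mixture link sits next to the compound's other database links.

```python
from dataclasses import dataclass

# Hypothetical tag name; NOT part of the current MassBank record format.
MIXTURE_TAG = "CH$MIXTURE"


@dataclass
class MixtureLink:
    """Hypothetical link from a record's compound to the mixture it belongs to."""
    database: str    # external source, e.g. "COMPTOX" or "PUBCHEM"
    identifier: str  # the mixture's identifier in that database

    def to_line(self) -> str:
        return f"{MIXTURE_TAG}: {self.database} {self.identifier}"


def add_mixture_tag(record_text: str, link: MixtureLink) -> str:
    """Insert the mixture tag after the last CH$ line of a record.

    Records that already carry a mixture tag are returned unchanged,
    so re-running a curation pass does not duplicate the tag.
    """
    lines = record_text.splitlines()
    if any(line.startswith(MIXTURE_TAG + ":") for line in lines):
        return record_text

    # Find the last line of the chemical-compound (CH$) block.
    anchor = -1
    for i, line in enumerate(lines):
        if line.startswith("CH$"):
            anchor = i
    if anchor < 0:
        raise ValueError("record has no CH$ block")

    lines.insert(anchor + 1, link.to_line())
    # Preserve a trailing newline if the input had one.
    trailing = "\n" if record_text.endswith("\n") else ""
    return "\n".join(lines) + trailing
```

The function is deliberately idempotent: a record that already has the (hypothetical) tag is left alone, so the curation step can be applied repeatedly across the repository. The identifier of the mixture itself would come from the external source (PubChem / DTX) and is not filled in here.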