Curation of entries related to mixtures

Open tsufz opened this issue 6 years ago • 2 comments

Some "compounds" we measure are not single compounds but mixtures of isomers or similar compounds. Often, however, it is the mixture that is reported, for example Nystatin, which contains Nystatin A1, A2 and A3.

An example is https://massbank.eu/MassBank/jsp/RecordDisplay.jsp?id=EQ314001&dsn=Eawag. The name is Nystatin (the mixture), but the compound shown is Nystatin A1 `https://comptox.epa.gov/dashboard/dsstoxdb/results?search=DTXSID80872323`, related to `https://comptox.epa.gov/dashboard/dsstoxdb/results?search=DTXSID80872323#related-substances`.

Proxy compounds are also used for the measurement (for example, for surfactant mixtures with homologues, or for nonylphenol).

However, from a pure data science / machine point of view, this relation is wrong without the additional information that the given compound is a proxy. PubChem also gives only the proxy; DTX is better, of course.

Therefore, we should implement a structure to handle this situation:

  1. Add a mixture tag which includes a link to an external source (PubChem / DTX)
  2. Curate records with mixtures so that they represent the correct compound used for the mass spectra generation (e.g. not Nystatin, but Nystatin A1)
  3. Insert the mixture tag into those records
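
To make the three steps more concrete, here is a minimal Python sketch of what curating one record could look like. Everything in it is an assumption for discussion only: the `COMMENT: MIXTURE ...` line is a hypothetical tag (no such tag exists in the record format yet), `MixtureTag` and `curate_record` are made-up names, and `PLACEHOLDER_MIXTURE_ID` stands in for the real identifier of the mixture in the external source.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MixtureTag:
    """Hypothetical mixture tag: mixture name plus a link to an external source."""
    mixture_name: str  # name of the reported mixture, e.g. "Nystatin"
    source: str        # external source, e.g. "DTX" or "PUBCHEM"
    identifier: str    # identifier of the mixture in that source (placeholder here)

    def to_line(self) -> str:
        # "COMMENT: MIXTURE" is an assumed syntax, not an existing record tag.
        return f"COMMENT: MIXTURE {self.mixture_name}; {self.source} {self.identifier}"


def curate_record(lines: List[str], component: str, tag: MixtureTag) -> List[str]:
    """Return a copy of the record in which the first CH$NAME line names the
    measured component and the mixture tag is inserted directly before it."""
    out: List[str] = []
    renamed = False
    for line in lines:
        if line.startswith("CH$NAME:") and not renamed:
            out.append(tag.to_line())             # steps 1 and 3: add the mixture tag
            out.append(f"CH$NAME: {component}")   # step 2: name the measured compound
            renamed = True
        else:
            out.append(line)
    return out


# Shortened, illustrative record; a real record has many more lines.
record = [
    "ACCESSION: EQ314001",
    "CH$NAME: Nystatin",
]
curated = curate_record(
    record,
    "Nystatin A1",
    MixtureTag("Nystatin", "DTX", "PLACEHOLDER_MIXTURE_ID"),
)
print("\n".join(curated))
```

With a tag like this, a machine reading the record could tell that Nystatin A1 is the measured compound (or proxy) while the reported substance is the mixture. Where the tag should live and how it should be spelled is open for discussion.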

tsufz, Dec 14 '18 09:12