New term proposal : isotopolog

Open mslarae13 opened this issue 2 years ago • 14 comments

New term details

For us to assess a new term request we require the following details:

Term name - isotopolog
Structured comment name - isotopolog
Definition - Isotopologue (isotope source/substrate/molecule) added to the biological sample. List the PubChem Compound Identification (CID) number, or, if an undefined mixture, a short description. If more than one isotopologue was used in this sample, use a pipe to delimit each isotopologue, and ensure all isotopologue-describing fields describe all isotopologues in this order.
Expected value - text or CID
Value syntax - {termLabel} {[termID]}|{text}
Example - toluene [pubchem.compound:1140] or toluene [pubchem.compound:1140] | water [pubchem.compound:962] or root exudates
Preferred unit - NA
Package(s) - new checklist term applicable to all packages / extensions
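
For illustration only, here is a minimal sketch of how the value syntax and examples above could be read by a script. It is not part of any MIxS tooling; the function name `parse_isotopolog` and the regular expression are assumptions made for this example. Each pipe-delimited entry is treated either as `label [prefix:id]` or as free text.

```python
import re

# Hypothetical pattern for one entry of the form "label [prefix:local_id]".
# Assumption: prefixes contain only letters, digits, dots, and underscores.
ENTRY_RE = re.compile(
    r"^(?P<label>.+?)\s*\[(?P<prefix>[A-Za-z0-9_.]+):(?P<local_id>[^\]\s]+)\]$"
)


def parse_isotopolog(value):
    """Split a raw field value on pipes and classify each entry.

    Entries matching "label [prefix:id]" get a CURIE; anything else is
    kept as free text with curie=None.
    """
    entries = []
    for part in value.split("|"):
        part = part.strip()
        match = ENTRY_RE.match(part)
        if match:
            curie = f"{match['prefix']}:{match['local_id']}"
            entries.append({"label": match["label"], "curie": curie})
        else:
            entries.append({"label": part, "curie": None})
    return entries


if __name__ == "__main__":
    print(parse_isotopolog(
        "toluene [pubchem.compound:1140] | water [pubchem.compound:962]"
    ))
    print(parse_isotopolog("root exudates"))
```

Keeping the order of the returned list matters here, because the definition asks that all isotopologue-describing fields list their values in the same order.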

Additional context

Add any other context about the new term here.

mslarae13 avatar Jun 02 '23 19:06 mslarae13

Term name - isotopologue
Structured comment name - isotopologue
Definition - Isotopologue (isotope source/substrate/molecule) added to the biological sample. List the PubChem Compound Identification (CID) number, or, if an undefined mixture, a short description. If more than one isotopologue was used in this sample, use a pipe to delimit each isotopologue, and ensure all isotopologue-describing fields describe all isotopologues in this order.
Expected value - text or CID
Value syntax - {termLabel} {[termID]}|{text}
Example - 1140 or cellulose or 1140 | cellulose
Preferred unit - NA

simpso91 avatar Jun 04 '23 18:06 simpso91

@simpso91 is 1140 a different compound? Or is it the CID for cellulose?

mslarae13 avatar Jun 05 '23 22:06 mslarae13

Yes, 1140 is a different compound (toluene)

We could use 962 (water) instead of cellulose so it would look like: (three different examples) 1140 or 1140 | 962 or root exudates

simpso91 avatar Jun 06 '23 21:06 simpso91

@only1chunts, @ramonawalls and I are working hard to get the various components of a term definition to be fully compatible with one another. Can we work through that for this term, and possibly other terms in your new package?

The Value syntax is '{termLabel} {[termID]}|{text}' but the examples are

  • 1140
  • cellulose
  • 1140 | cellulose

All of those are valid as text, but free text is obviously the worst choice if you want the data you and others submit to be FAIR.

Pubchem CIDs are also mentioned, presumably as a more controlled namespace. That's great! Is that what '1140 | cellulose' is supposed to be an example of? A {termLabel} {[termID]} example could be 'cellulose [pubchem.compound:1140]' (as redirected by identifiers.org and bioregistry)

turbomam avatar Jun 06 '23 21:06 turbomam

I really think we should update the issue template for new term requests to clarify the interrelatedness of the various term attributes.

turbomam avatar Jun 06 '23 21:06 turbomam

Thank you @turbomam ! In that case, the examples should be:

  • toluene [pubchem.compound:1140]
  • toluene [pubchem.compound:1140] | water [pubchem.compound:962]
  • root exudates

Also, @mslarae13 in the manuscript (because I had to shorten all the structured comment names anyway), we are moving towards using the American spelling "isotopolog" instead of "isotopologue". Could this entry be changed to reflect this?

simpso91 avatar Jun 06 '23 21:06 simpso91

@turbomam - I have added a line to the issue ticket templates requesting addition of relationship to other terms. I had hoped that sort of detail would be included under the "additional context" section, but I guess it's better to be explicit.

only1chunts avatar Jun 07 '23 07:06 only1chunts

In the interest of simplifying this field, Roli proposed having this field accept only numerical values:

  • 1140
  • 962
  • 962 | 1140

Chemicals that do not have a PubChem ID would be entered as "0", with a text description in an additional column. Would this be acceptable? We are worried about people misspelling "[pubchem.compound: ]" or otherwise incorrectly inputting this term.

simpso91 avatar Jun 26 '23 13:06 simpso91

@simpso91 what is the additional column? We can't add columns depending on another column. That's the same reason we can't have isotopolog_1, isotopolog_2, etc., depending on how many there are. You have to pipe them.

So we would have to make this column the numerical value... and have another column for the name. Making the name column optional.

The misspelling when a pubchem ID doesn't exist is a very valid concern.

mslarae13 avatar Jul 07 '23 19:07 mslarae13

Discussed: if a compound is not in PubChem, it's hard to work with bioinformatically. Add a link for how to go to PubChem and fill out your ID.

mslarae13 avatar Aug 03 '23 20:08 mslarae13

Update from the publication: use "0" for a compound that is not registered.

mslarae13 avatar Jun 23 '25 19:06 mslarae13

If we omit a prefix it's pretty much guaranteed that people will enter numbers for other kinds of PubChem entities, e.g. https://pubchem.ncbi.nlm.nih.gov/substance/1140

or even chembl IDs or chebi IDs...

cmungall avatar Jun 23 '25 20:06 cmungall

@cmungall your comment is unclear. Are you saying that even though this requires pubchem we should include the prefix? Or are you saying that we should allow multiple prefixes?

I understand your point that people are likely to complete the field incorrectly, but I don't see from this comment what you're suggesting as an alternative approach.

mslarae13 avatar Jul 11 '25 18:07 mslarae13

Sorry, I should have been more clear. Yes, my proposal is to mandate a prefix (i.e. pubchem.compound:). And to avoid sentinel values (e.g. special meaning for 0)

cmungall avatar Jul 24 '25 21:07 cmungall
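
As a rough illustration of the last proposal in this thread (mandate the `pubchem.compound:` prefix and avoid sentinel values such as "0"), a check along the following lines might work. This is only a sketch: the function name `validate_isotopolog_ids` and the error wording are made up for this example, and it assumes every entry must be a prefixed CID written as a positive integer with no leading zeros. How unregistered compounds should be recorded is left open here, since the thread does not settle it.

```python
import re

# Assumption for this sketch: a valid entry is exactly
# "pubchem.compound:<positive integer>", so bare numbers, "0",
# and other prefixes (e.g. chebi:) are all rejected.
CURIE_RE = re.compile(r"^pubchem\.compound:[1-9][0-9]*$")


def validate_isotopolog_ids(value):
    """Return a list of error messages; an empty list means every entry passed."""
    errors = []
    for part in value.split("|"):
        part = part.strip()
        if not CURIE_RE.match(part):
            errors.append(f"{part!r}: expected 'pubchem.compound:<CID>'")
    return errors


if __name__ == "__main__":
    # Prefixed CIDs pass.
    print(validate_isotopolog_ids("pubchem.compound:1140 | pubchem.compound:962"))
    # A bare number is ambiguous (compound? substance?) and is rejected.
    print(validate_isotopolog_ids("1140"))
    # The sentinel "0" is rejected as well.
    print(validate_isotopolog_ids("0"))
```

Rejecting bare numbers addresses the concern that a value like 1140 could just as easily be read as a PubChem substance ID, or as an ID from another database.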