bigg_models
Call for feedback: New lipid naming convention
There is a nice proposal from @michaelwitting at WormJam for a systematic naming convention for lipids. It's described here:
https://github.com/JakeHattwell/wormjam/issues/11
This convention would generate nice-looking BiGG IDs. A couple of examples:
Phosphatidylcholines:
1,2-diacylglycerophosphocholine --> 1ac2acg3pc (old pchol)
1-acylglycerophosphocholine --> 1acg3pc
2-acylglycerophosphocholine --> 2acg3pc
1-alkyl-2-acylglycerophosphocholine --> 1alk2acg3pc
1-alkylglycerophosphocholine --> 1alkg3pc
1-alkenyl-2-acylglycerophosphocholine --> 1alken2acg3pc
1-alkenylglycerophosphocholine --> 1alkeng3pc
If we adopt this, we will probably do it through the standard BiGG process: we will not remove old IDs (so pchol will stay), and new IDs will come in as we add models to BiGG. However, we can provide an extra level of checking to help users adopt these new IDs.
Tagging some BiGG watchers who might have feedback on this: @nel3 @neemajamshidi @matthiaskoenig @draeger @rmtfleming @phantomas1234 @willigott @cdanielmachado @smoretti @djinnome @cnorsig @jmcconn @jtyurkovich
Sounds like a good plan to me.
It seems reasonable. A question remains: Will old and new IDs coexist in the future? Would BiGG list the old IDs then as legacy identifiers and store them in a separate table?
@draeger In this case, we will not replace any old IDs. Just add new ones.
I don't like that these are not valid SBML identifiers (they start with a digit). It should also be made clear that these are all triglycerols, i.e. they use a glycerol backbone with up to three possible connections. There are other backbones that allow a wide range of connections; by being too specific here, those other things cannot be encoded in a uniform manner.
I would highly recommend prefixing these to make the IDs clearer, valid (SBML), and usable:
1,2-diacylglycerophosphocholine --> tg1ac2acg3pc (old pchol)
1-acylglycerophosphocholine --> tg1acg3pc
2-acylglycerophosphocholine --> tg2acg3pc
1-alkyl-2-acylglycerophosphocholine --> tg1alk2acg3pc
1-alkylglycerophosphocholine --> tg1alkg3pc
1-alkenyl-2-acylglycerophosphocholine --> tg1alken2acg3pc
1-alkenylglycerophosphocholine --> tg1alkeng3pc
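To make the SBML point concrete, here is a minimal sketch of the syntax check. The pattern follows the SBML SId rule (a letter or underscore first, then letters, digits, or underscores); the name `is_valid_sid` is only for illustration and is not part of any BiGG tooling.

```python
import re

# SBML SId syntax: a letter or underscore, followed by letters, digits, or underscores.
SID_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_sid(identifier: str) -> bool:
    """Return True if the string is a syntactically valid SBML SId."""
    return SID_PATTERN.fullmatch(identifier) is not None


for candidate in ["1ac2acg3pc", "tg1ac2acg3pc", "pchol"]:
    print(candidate, is_valid_sid(candidate))
# 1ac2acg3pc False
# tg1ac2acg3pc True
# pchol True
```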
I think adding the triglycerol prefix here causes confusion with triacylglycerols. The name already contains g3pc, which is the ID for glycero-3-phosphocholine; the 1ac and 2ac state that it has an additional acyl group at positions 1 and 2.
It will be much easier to work with all triacylglycerols if they have a common prefix, because you can just filter the subset based on the prefix and list the remaining 3 chains. That way even I understand the rules for creating all possibilities:
- start with the tg prefix
- list the up to 3 chains, each starting with the respective number of the connection; 1 is hereby the ... C atom (clarify this for chirality, i.e. which one is the 1); if there is no connection at one of the 3 C atoms, leave it out. Order from 1 to 3.
1,2-diacylglycerophosphocholine --> tg1ac2ac3pc (old pchol)
1-acylglycerophosphocholine --> tg1ac3pc
2-acylglycerophosphocholine --> tg2ac3pc
1-alkyl-2-acylglycerophosphocholine --> tg1alk2ac3pc
1-alkylglycerophosphocholine --> tg1alk3pc
1-alkenyl-2-acylglycerophosphocholine --> tg1alken2ac3pc
1-alkenylglycerophosphocholine --> tg1alken3pc
By the way, this is also super easy to parse, and it does not depend on whether the pc is at position 1 or 3.
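As a rough illustration of the parsing claim, here is a sketch that splits such an ID into position/substituent pairs. It assumes substituent codes are lowercase letters only and that each position appears at most once; `parse_prefixed_id` is a hypothetical helper, not existing BiGG code.

```python
import re

# One substituent: a position digit (1-3) followed by a lowercase code such as "ac" or "pc".
SUBSTITUENT = re.compile(r"([123])([a-z]+)")


def parse_prefixed_id(bigg_id, prefix="tg"):
    """Split e.g. 'tg1alk2ac3pc' into {1: 'alk', 2: 'ac', 3: 'pc'}."""
    body = re.fullmatch(re.escape(prefix) + r"((?:[123][a-z]+){1,3})", bigg_id)
    if body is None:
        raise ValueError(f"{bigg_id!r} does not follow the {prefix!r} scheme")
    pairs = [(int(pos), code) for pos, code in SUBSTITUENT.findall(body.group(1))]
    positions = [pos for pos, _ in pairs]
    # The rule above asks for positions ordered from 1 to 3, without repeats.
    if positions != sorted(set(positions)):
        raise ValueError(f"positions in {bigg_id!r} must be unique and ordered 1 to 3")
    return dict(pairs)


print(parse_prefixed_id("tg1alken2ac3pc"))  # {1: 'alken', 2: 'ac', 3: 'pc'}
print(parse_prefixed_id("tg2ac3pc"))        # {2: 'ac', 3: 'pc'}
```

Filtering a model for this lipid family then reduces to a prefix test plus this parse, although a bare `startswith` check would also catch unrelated IDs that happen to begin with the same letters.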
I see your points, but not all of them are tri-acyl-glycerols. Based on lipid biochemistry the glycerol-backbone is fixed to be sn-glycero-3-phosphate (coming from the synthesis).
We could do it the other way round, having the lipid class in front and then the chain configuration, e.g. pc1ac2ac.
Ideally the nomenclature should be consistent (at least in part) with the nomenclature used in the lipidomics field.
Only listing the chains without a prefix will create problems if there is only one modification, which would then be, for instance, pc1 or 1pc, which is probably already used as an ID for other things. With a clear prefix the namespace becomes unique.
Also, there are other backbones that would require a prefix, e.g. the sphingolipids, which would then have to be something like sp2ac3pc to distinguish them from 2ac3pc. So why not name it something like g2ac3pc or tg2ac3pc? Then it is clear from the ID what the backbone is.
Then I would go for the version with g instead of tg, which avoids confusion with real triacylglycerols. Sphingolipids will become a bit more tricky in that regard, because there are several backbones possible. Rules for encoding the backbone would be needed. For example, C. elegans uses C17iso sphingoid bases, which are not found in mammals.
I will think about different ways for the phospholipids.
Made up my mind. We should go for the prefix. It also works well with some IDs that are already in BiGG.
An example for glycero- and glycerophospholipids:
g3pc and g3pe are already in BiGG for sn-glycero-phosphocholine and sn-glycero-phosphoethanolamine. g1ac3pc would then represent a 1-acyl-sn-glycero-phosphocholine etc.
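A small sketch of how such IDs could be assembled, assuming the rule is simply the g prefix followed by position/substituent pairs in order from 1 to 3. The function name and its `sn1`/`sn2`/`sn3` parameters are made up for illustration.

```python
def build_glycerolipid_id(sn1=None, sn2=None, sn3=None):
    """Assemble an ID as 'g' plus position/substituent pairs, ordered 1 to 3."""
    parts = ["g"]
    for position, substituent in ((1, sn1), (2, sn2), (3, sn3)):
        if substituent:  # positions without a substituent are left out
            parts.append(f"{position}{substituent}")
    if len(parts) == 1:
        raise ValueError("at least one substituent is required")
    return "".join(parts)


print(build_glycerolipid_id(sn3="pc"))                       # g3pc
print(build_glycerolipid_id(sn1="ac", sn3="pc"))             # g1ac3pc
print(build_glycerolipid_id(sn1="alk", sn2="ac", sn3="pe"))  # g1alk2ac3pe
```

With only the headgroup given, the result is the ID that already exists in BiGG (g3pc), which is the compatibility mentioned above.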
I will prepare a table with the "old" IDs and the new systematic ones. I would anyway need this for our WormJam model.
Here is a table with examples from the WormJam model.
| Class | Metabolite | Old / Wrong / Duplicated ID (WormJam) | Correct / New ID |
|---|---|---|---|
| MG | 1-acyl-sn-glycerol | 1magol | g1ac |
| MG | 2-acyl-sn-glycerol | mag | g2ac |
| MG-O | 1-alkyl-sn-glycerol | --- | g1alk |
| MG-P | 1-(Z)-alk-1-enyl-sn-glycerol | alkenglyc | g1alken |
| DG | 1,2-diacyl-sn-glycerol | 12dag | g1ac2ac |
| DG-O | 1-alkyl-2-acyl-sn-glycerol | akac2g | g1alk2ac |
| DG-P | 1-(Z)-alk-1-enyl-2-acyl-glycerol | alkenac2g | g1alken2ac |
| TG | Triacyl-glycerol | tag | g1ac2ac3ac |
| TG-O | 1-alkyl-2,3-diacylglycerol | --- | g1alk2ac3ac |
| TG-P | 1-(Z)-alk-1-enyl-2,3-diacylglycerol | --- | g1alken2ac3ac |
| DHAP | 1-acylglycerone 3-phosphate | Adhap | dhap1ac |
| DHAP-O | 1-alkylglycerone 3-phosphate | akdhap | dhap1alk |
| PA | 1,2-diacyl-sn-glycero-3-phosphate | pa_pl | g1ac2ac3p |
| PA | 1,2-diacyl-sn-glycero-3-phosphate | 12dag3p | g1ac2ac3p |
| LPA | 1-acyl-sn-glycero-3-phosphate | alpa | g1ac3p |
| LPA | 1-acyl-sn-glycero-3-phosphate | alpa_tag | g1ac3p |
| LPA | 1-acyl-sn-glycero-3-phosphate | 1ag3p_SC | g1ac3p |
| LPA | 2-acyl-sn-glycero-3-phosphate | --- | g2ac3p |
| LPA-O | 1-alkyl-sn-glycero-3-phosphate | alkgp | g1alk3p |
| PA-O | 1-alkyl-2-acyl-sn-glycero-3-phosphate | akac2gp | g1alk2ac3p |
| LPA-P | 1-(Z)-alk-1-enyl-sn-glycero-3-phosphate | --- | g1alken3p |
| PA-P | 1-(Z)-alk-1-enyl-2-acyl-sn-glycero-3-phosphate | --- | g1alken2ac3p |
| PC | 1,2-diacyl-sn-glycero-3-phosphocholine | pchol | g1ac2ac3pc |
| LPC | 1-acyl-sn-glycero-3-phosphocholine | ag3pc | g1ac3pc |
| LPC | 2-acyl-sn-glycero-3-phosphocholine | 2agpc | g2ac3pc |
| PC-O | 1-alkyl-2-acyl-sn-glycero-3-phosphocholine | akac2gchol | g1alk2ac3pc |
| LPC-O | 1-alkyl-sn-glycero-3-phosphocholine | ak2lgchol | g1alk3pc |
| PC-P | 1-(Z)-alk-1-enyl-2-acyl-sn-glycero-3-phosphocholine | --- | g1alken2ac3pc |
| LPC-P | 1-(Z)-alk-1-enyl-sn-glycero-3-phosphocholine | --- | g1alken3pc |
| PE | 1,2-diacyl-sn-glycero-3-phosphoethanolamine | pe | g1ac2ac3pe |
| PE | 1,2-diacyl-sn-glycero-3-phosphoethanolamine | pe_BAC | g1ac2ac3pe |
| LPE | 1-acyl-sn-glycero-3-phosphoethanolamine | acg3pe | g1ac3pe |
| LPE | 2-acyl-sn-glycero-3-phosphoethanolamine | --- | g2ac3pe |
| PE-O | 1-alkyl-2-acyl-sn-glycero-3-phosphoethanolamine | akac2gpe | g1alk2ac3pe |
| LPE-O | 1-alkyl-sn-glycero-3-phosphoethanolamine | --- | g1alk3pe |
| PE-P | 1-(Z)-alk-1-enyl-2-acyl-sn-glycero-3-phosphoethanolamine | alkenac2gpe | g1alken2ac3pe |
| LPE-P | 1-(Z)-alk-1-enyl-sn-glycero-3-phosphoethanolamine | alken2gpe | g1alken3pe |
| PS | 1,2-diacyl-sn-glycero-3-phospho-L-serine | ps | g1ac2ac3ps |
| LPS | 1-acyl-sn-glycero-3-phospho-L-serine | acg3ps | g1ac3ps |
| LPS | 2-acyl-sn-glycero-3-phospho-L-serine | --- | g2ac3ps |
| PI | 1,2-diacyl-sn-glycero-3-phospho(1)-D-myo-inositol | pail | g1ac2ac3pi |
| LPI | 1-acyl-sn-glycero-3-phospho(1)-D-myo-inositol | --- | g1ac3pi |
| LPI | 2-acyl-sn-glycero-3-phospho(1)-D-myo-inositol | --- | g2ac3pi |
| PIP | 1,2-diacyl-sn-glycero-3-phospho(1)-D-myo-inositol-3-phosphate | pail3p | g1ac2ac3pi3p |
| PIP | 1,2-diacyl-sn-glycero-3-phospho(1)-D-myo-inositol-4-phosphate | pail4p | g1ac2ac3pi4p |
| PIP | 1,2-diacyl-sn-glycero-3-phospho(1)-D-myo-inositol-5-phosphate | pail5p | g1ac2ac3pi5p |
| PIP2 | 1,2-diacyl-sn-glycero-3-phospho(1)-D-myo-inositol-3,4-bisphosphate | pail34p | g1ac2ac3pi3p4p |
| PIP2 | 1,2-diacyl-sn-glycero-3-phospho(1)-D-myo-inositol-3,5-bisphosphate | pail35p | g1ac2ac3pi3p5p |
| PIP2 | 1,2-diacyl-sn-glycero-3-phospho(1)-D-myo-inositol-4,5-bisphosphate | pail45p | g1ac2ac3pi4p5p |
| PIP3 | 1,2-diacyl-sn-glycero-3-phospho(1)-D-myo-inositol-3,4,5-trisphosphate | pail345p | g1ac2ac3pi3p4p5p |
| PGP | 1,2-diacyl-sn-glycero-3-phospho-(1'-sn-glycero-3'-phosphate) | pgp | g1ac2ac3pg3p |
| PG | 1,2-diacyl-sn-glycero-3-phospho-(1'-sn-glycerol) | pg | g1ac2ac3pg |
| PG | 1,2-diacyl-sn-glycero-3-phospho-(1'-sn-glycerol) | pg_BAC | g1ac2ac3pg |
With multiply phosphorylated PI headgroups (PIP, PIP2 and PIP3) the back part of the ID gets a bit complicated, but I guess it is still fine. An alternative is to separate this with a _, e.g. g1ac2ac3pi_3p4p.
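To compare the two spellings side by side, here is a sketch that builds the headgroup part with an optional separator. `phosphoinositide_head` and its parameters are hypothetical names, and the rule it encodes (pi followed by one position-plus-p token per ring phosphate, in ascending order) is read off the table rows above.

```python
def phosphoinositide_head(phosphate_positions=(), separator=""):
    """Return 'pi' plus one '<n>p' token per inositol ring phosphate."""
    phosphates = "".join(f"{n}p" for n in sorted(phosphate_positions))
    if not phosphates:
        return "pi"
    return f"pi{separator}{phosphates}"


base = "g1ac2ac3"  # 1,2-diacyl-sn-glycero-3-...
print(base + phosphoinositide_head())             # g1ac2ac3pi
print(base + phosphoinositide_head([4]))          # g1ac2ac3pi4p
print(base + phosphoinositide_head([3, 4]))       # g1ac2ac3pi3p4p
print(base + phosphoinositide_head([3, 4], "_"))  # g1ac2ac3pi_3p4p
```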
CDP-DGs are still open. They would be g1ac2ac3cdp.
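For the extra checking mentioned at the top of the thread, the table could be used as a plain lookup. Below is a sketch with a handful of rows copied from it; the helper names are made up, and a real check would load the complete table rather than this excerpt.

```python
from collections import defaultdict

# A few rows copied from the WormJam table above (old ID -> proposed ID).
OLD_TO_NEW = {
    "pchol": "g1ac2ac3pc",
    "pa_pl": "g1ac2ac3p",
    "12dag3p": "g1ac2ac3p",
    "tag": "g1ac2ac3ac",
    "12dag": "g1ac2ac",
}


def suggest_new_id(metabolite_id):
    """Return the proposed systematic ID for an old ID, or None if unknown."""
    return OLD_TO_NEW.get(metabolite_id)


def find_duplicates(mapping):
    """Group old IDs that point at the same proposed ID."""
    grouped = defaultdict(list)
    for old_id, new_id in mapping.items():
        grouped[new_id].append(old_id)
    return {new: sorted(olds) for new, olds in grouped.items() if len(olds) > 1}


print(suggest_new_id("pchol"))      # g1ac2ac3pc
print(find_duplicates(OLD_TO_NEW))  # {'g1ac2ac3p': ['12dag3p', 'pa_pl']}
```

The duplicate report is the useful part for curation: it lists old IDs that describe the same generic metabolite and therefore collapse onto one systematic ID.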
This looks great. Some comments below, not sure we have to solve all of this.
- Perhaps we could add an example for a phosphatidylcholine?
- How to write cardiolipin? or use other abbreviation for this?
- Should we do the sphingolipids analog? Can we add some examples?
- How to encode the various variants of ac side chains? It would be great to have some convention for this.
The following often occur: ac = ac16 (palmitate by default; what does ac mean exactly?), ac18 (stearate), ac20, ac22, ac24. What about unsaturated variants (these need the position of the double bond and cis/trans)?
This is, for the moment, only for the generic versions. I have already thought about ways to encode specific acyl, alkyl or alkenyl chains. The position and stereochemistry of the double bond should be encoded. I developed something that is suitable for fatty acids, acyl-CoAs etc. (everything that has a single acyl chain). I wrote my thoughts down in a manuscript-type document; I think I will put it on a preprint server soonish. @matthiaskoenig, if you want, I can send the current rough and preliminary version via email to check.
I will think about the cardiolipins and sphingolipids; that might be a bit tricky.
@michaelwitting Yes, please send the preprint. I will give you feedback on it (konigmatt[AT]googlemail.com).
Hi all. I would like to revive the discussion here. I was thinking about how to encode the side chains and the sphingoid bases etc. The main question is which level of detail is required. It would be good to have enough details to be able to reconstruct the chemical structure. In lipidomics shorthand notations like PC(16:0/16:1(9Z)) are used. Maybe this can be adapted?
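To see how much structural detail the shorthand carries, here is a sketch of a parser for strings of the form shown above. It only handles the simple class(chain/chain) pattern with optional double-bond annotations; the names `Chain` and `parse_shorthand` are made up, and the full lipidomics shorthand has more variants than this covers.

```python
import re
from typing import NamedTuple


class Chain(NamedTuple):
    carbons: int
    double_bonds: int
    bond_positions: tuple  # e.g. ("9Z",); empty if not annotated


SPECIES = re.compile(r"([A-Za-z-]+)\((.+)\)")
CHAIN = re.compile(r"(\d+):(\d+)(?:\(([^()]*)\))?")


def parse_shorthand(text):
    """Parse e.g. 'PC(16:0/16:1(9Z))' into a lipid class and its chains."""
    species = SPECIES.fullmatch(text)
    if species is None:
        raise ValueError(f"unrecognised shorthand: {text!r}")
    lipid_class, body = species.groups()
    chains = []
    for part in body.split("/"):  # chains are listed in sn order, separated by "/"
        match = CHAIN.fullmatch(part)
        if match is None:
            raise ValueError(f"unrecognised chain: {part!r}")
        carbons, bonds, positions = match.groups()
        annotated = tuple(positions.split(",")) if positions else ()
        chains.append(Chain(int(carbons), int(bonds), annotated))
    return lipid_class, chains


lipid_class, chains = parse_shorthand("PC(16:0/16:1(9Z))")
print(lipid_class)  # PC
print(chains[1])    # Chain(carbons=16, double_bonds=1, bond_positions=('9Z',))
```

Chain length, number of double bonds, and bond position with geometry are all recoverable, which is the level of detail needed to reconstruct the acyl chains.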