
Call for feedback: New lipid naming convention

Open • zakandrewking opened this issue 6 years ago • 17 comments

There is a nice proposal from @michaelwitting at WormJam for a systematic naming convention for lipids. It's described here:

https://github.com/JakeHattwell/wormjam/issues/11

This convention would generate nice-looking BiGG IDs. A couple examples:

Phosphatidylcholines:
1,2-diacylglycerophosphocholine --> 1ac2acg3pc (old pchol)
1-acylglycerophosphocholine --> 1acg3pc
2-acylglycerophosphocholine --> 2acg3pc

1-alkyl-2-acylglycerophosphocholine --> 1alk2acg3pc
1-alkylglycerophosphocholine --> 1alkg3pc

1-alkenyl-2-acylglycerophosphocholine --> 1alken2acg3pc
1-alkenylglycerophosphocholine --> 1alkeng3pc
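
The pattern behind these examples can be sketched in a few lines of Python: each chain is written as its glycerol position plus a substituent code (ac, alk, alken), and the ID ends in g3pc (glycero-3-phosphocholine). The function name and the dict-based input are illustrative assumptions, not part of BiGG or the WormJam proposal.

```python
def original_style_id(chains: dict[int, str], head: str = "pc") -> str:
    """Build an ID in the originally proposed form (hypothetical helper).

    chains maps a glycerol position (1 or 2) to a substituent code such as
    "ac" (acyl), "alk" (alkyl) or "alken" (alkenyl). The position-prefixed
    substituents are followed by "g3" plus the head group code.
    """
    if not set(chains) <= {1, 2}:
        raise ValueError("only positions 1 and 2 carry chains in this form")
    # Sort by position so that position 1 is always written before position 2.
    substituents = "".join(f"{pos}{code}" for pos, code in sorted(chains.items()))
    return f"{substituents}g3{head}"


print(original_style_id({1: "ac", 2: "ac"}))  # 1ac2acg3pc
print(original_style_id({1: "alken"}))        # 1alkeng3pc
```

Every ID built this way starts with a digit whenever a chain is present.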

If we adopt this, we will probably do it through the standard BiGG process: we will not remove old IDs (so pchol will stay), and new IDs will come in as we add models to BiGG. However, we can provide an extra level of checking to help users adopt these new IDs.

Tagging some BiGG watchers who might have feedback on this: @nel3 @neemajamshidi @matthiaskoenig @draeger @rmtfleming @phantomas1234 @willigott @cdanielmachado @smoretti @djinnome @cnorsig @jmcconn @jtyurkovich

zakandrewking • Sep 06 '19 14:09

Sounds like a good plan to me.

cdanielmachado • Sep 06 '19 14:09

It seems reasonable. A question remains: Will old and new IDs coexist in the future? Would BiGG then list the old IDs as legacy identifiers and store them in a separate table?

draeger • Sep 06 '19 19:09

@draeger In this case, we will not replace any old IDs. Just add new ones.

zakandrewking • Sep 08 '19 19:09

I don't like that these are not valid SBML identifiers (an SBML SId cannot start with a digit). Also, it should be made clear that these are all triglycerols, i.e. they use a glycerol backbone with up to three possible connections. There are other backbones that allow a wide range of connections; by being too specific here, these other things cannot be encoded in a uniform manner.

I would strongly recommend prefixing these to make the IDs clearer, valid (SBML) and more usable:

1,2-diacylglycerophosphocholine --> tg1ac2acg3pc (old pchol)
1-acylglycerophosphocholine --> tg1acg3pc
2-acylglycerophosphocholine --> tg2acg3pc

1-alkyl-2-acylglycerophosphocholine --> tg1alk2acg3pc
1-alkylglycerophosphocholine --> tg1alkg3pc

1-alkenyl-2-acylglycerophosphocholine --> tg1alken2acg3pc
1-alkenylglycerophosphocholine --> tg1alkeng3pc
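
The SBML point can be checked mechanically. SBML's SId type requires an identifier to start with a letter or underscore, followed only by letters, digits and underscores. A minimal Python check under that rule; it tests syntax only and says nothing about how BiGG itself writes IDs into SBML files:

```python
import re

# SBML SId syntax: a letter or underscore first, then letters, digits or underscores.
SID_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_sid(identifier: str) -> bool:
    """Return True if identifier is syntactically a valid SBML SId."""
    return SID_PATTERN.fullmatch(identifier) is not None


for candidate in ["1ac2acg3pc", "tg1ac2acg3pc", "pchol"]:
    print(candidate, is_valid_sid(candidate))
# 1ac2acg3pc False
# tg1ac2acg3pc True
# pchol True
```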

matthiaskoenig • Sep 09 '19 07:09

I think adding "triglycerol" here causes confusion with triacylglycerols. The name already contains g3pc, which is the ID for glycero-3-phosphocholine; the 1ac and 2ac state that it has an additional acyl group at positions 1 and 2.

michaelwitting • Sep 09 '19 07:09

It will make it much easier to work with all triacylglycerols if these have a common prefix, because you can just filter the subset based on the prefix and then list the remaining three chains. Then I even understand the rules for creating all possibilities:

1. Start with the tg prefix.
2. List up to three chains, each starting with the number of the carbon it is connected to; 1 is hereby the ... C atom (clarify this for chirality, i.e. which one is the 1). If there is no connection at one of the three carbons, leave it out. Order from 1 to 3.

1,2-diacylglycerophosphocholine --> tg1ac2ac3pc (old pchol)
1-acylglycerophosphocholine --> tg1ac3pc
2-acylglycerophosphocholine --> tg2ac3pc

1-alkyl-2-acylglycerophosphocholine --> tg1alk2ac3pc
1-alkylglycerophosphocholine --> tg1alk3pc

1-alkenyl-2-acylglycerophosphocholine --> tg1alken2ac3pc
1-alkenylglycerophosphocholine --> tg1alken3pc
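
The two rules above are simple enough to express directly in code. A minimal Python sketch, assuming the groups are passed as a mapping from glycerol carbon to group code (the function name and input shape are hypothetical):

```python
def prefixed_id(groups: dict[int, str], prefix: str = "tg") -> str:
    """Build an ID following the two rules above (hypothetical helper).

    groups maps each occupied glycerol carbon (1, 2 or 3) to its group code,
    e.g. {1: "alk", 2: "ac", 3: "pc"}. Unoccupied carbons are simply left out,
    and the groups are written in order from carbon 1 to carbon 3.
    """
    if not groups or not set(groups) <= {1, 2, 3}:
        raise ValueError("expected groups on glycerol carbons 1, 2 and/or 3")
    body = "".join(f"{pos}{code}" for pos, code in sorted(groups.items()))
    return prefix + body


print(prefixed_id({1: "ac", 2: "ac", 3: "pc"}))  # tg1ac2ac3pc
print(prefixed_id({2: "ac", 3: "pc"}))           # tg2ac3pc
print(prefixed_id({1: "alk", 3: "pc"}))          # tg1alk3pc
```

The prefix is kept as a parameter so the same sketch also covers the g-prefixed variant discussed further down the thread: `prefixed_id({1: "ac", 3: "pc"}, prefix="g")` gives g1ac3pc.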

matthiaskoenig • Sep 09 '19 08:09

By the way, this is also super easy to parse, and there is no dependency on whether the pc is on position 1 or 3.
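
To illustrate the parsing claim, here is a minimal Python sketch. The grammar it assumes (a lowercase prefix, then `<position><group>` pairs with positions 1 to 3 in ascending order) is an assumption read off the examples above, not an agreed specification, and the function name is hypothetical:

```python
import re

# Assumed grammar: lowercase backbone prefix, then one or more
# <position><group> pairs with positions 1-3.
ID_PATTERN = re.compile(r"(?P<prefix>[a-z]+?)(?P<body>(?:[1-3][a-z]+)+)")
PAIR_PATTERN = re.compile(r"([1-3])([a-z]+)")


def parse_id(identifier: str) -> tuple[str, dict[int, str]]:
    """Split an ID such as 'tg1alk2ac3pc' into its prefix and position map."""
    match = ID_PATTERN.fullmatch(identifier)
    if match is None:
        raise ValueError(f"not a prefixed glycerolipid ID: {identifier!r}")
    pairs = PAIR_PATTERN.findall(match["body"])
    positions = [int(pos) for pos, _ in pairs]
    # Positions must appear once each, ordered from 1 to 3.
    if positions != sorted(set(positions)):
        raise ValueError(f"positions must be unique and ascending: {identifier!r}")
    return match["prefix"], {int(pos): code for pos, code in pairs}


print(parse_id("tg1alken2ac3pc"))  # ('tg', {1: 'alken', 2: 'ac', 3: 'pc'})
print(parse_id("tg2ac3pc"))        # ('tg', {2: 'ac', 3: 'pc'})
```

Because the result is a position map, a head group on carbon 1 (e.g. `parse_id("tg1pc2ac")`) parses just as easily as one on carbon 3.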

matthiaskoenig • Sep 09 '19 08:09

I see your points, but not all of them are tri-acyl-glycerols. Based on lipid biochemistry, the glycerol backbone is fixed to be sn-glycero-3-phosphate (this comes from the synthesis). We could do it the other way round, having the lipid class in front and then the chain configuration, e.g. pc1ac2ac. Ideally, the nomenclature should be consistent (at least in part) with the nomenclature used in the lipidomics field.

michaelwitting • Sep 09 '19 08:09

Listing only the chains without a prefix will create problems if there is only one modification, which would then be, for instance, pc1 or 1pc, and that is probably already used as an ID for other things. With a clear prefix, the namespace becomes unique. There are also other backbones that would require a prefix, e.g. the sphingolipids, which would then have to be something like sp2ac3pc to distinguish them from 2ac3pc. So why not name it something like g2ac3pc or tg2ac3pc? Then it is clear from the ID what the backbone is.

matthiaskoenig • Sep 09 '19 08:09

Then I would go for the version with g instead of tg, which avoids confusion with real triacylglycerols. Sphingolipids will be a bit more tricky in that regard, because several backbones are possible, so rules for encoding the backbone would be needed. For example, C. elegans uses C17iso sphingoid bases, which are not found in mammals. I will think about different ways for the phospholipids.

michaelwitting • Sep 09 '19 08:09

Made up my mind. We should go for the prefix. It also works well with some IDs that are already in BiGG. An example for glycero- and glycerophospholipids: g3pc and g3pe are already in BiGG for sn-glycero-phosphocholine and sn-glycero-phosphoethanolamine. g1ac3pc would then represent a 1-acyl-sn-glycero-phosphocholine, etc. I will prepare a table with the "old" IDs and the new systematic ones. I would need this for our WormJam model anyway.

michaelwitting • Sep 12 '19 12:09

Here is a table with examples from the WormJam model.

| Class | Metabolite | Old / Wrong / Duplicated ID (WormJam) | Correct / New ID |
|---|---|---|---|
| MG | 1-acyl-sn-glycerol | 1magol | g1ac |
| MG | 2-acyl-sn-glycerol | mag | g2ac |
| MG-O | 1-alkyl-sn-glycerol | --- | g1alk |
| MG-P | 1-(Z)-alk-1-enyl-sn-glycerol | alkenglyc | g1alken |
| DG | 1,2-diacyl-sn-glycerol | 12dag | g1ac2ac |
| DG-O | 1-alkyl-2-acyl-sn-glycerol | akac2g | g1alk2ac |
| DG-P | 1-(Z)-alk-1-enyl-2-acyl-glycerol | alkenac2g | g1alken2ac |
| TG | Triacyl-glycerol | tag | g1ac2ac3ac |
| TG-O | 1-alkyl-2,3-diacylglycerol | --- | g1alk2ac3ac |
| TG-P | 1-(Z)-alk-1-enyl-2,3-diacylglycerol | --- | g1alken2ac3ac |
| DHAP | 1-acylglycerone 3-phosphate | Adhap | dhap1ac |
| DHAP-O | 1-alkylglycerone 3-phosphate | akdhap | dhap1alk |
| PA | 1,2-diacyl-sn-glycero-3-phosphate | pa_pl | g1ac2ac3p |
| PA | 1,2-diacyl-sn-glycero-3-phosphate | 12dag3p | g1ac2ac3p |
| LPA | 1-acyl-sn-glycero-3-phosphate | alpa | g1ac3p |
| LPA | 1-acyl-sn-glycero-3-phosphate | alpa_tag | g1ac3p |
| LPA | 1-acyl-sn-glycero-3-phosphate | 1ag3p_SC | g1ac3p |
| LPA | 2-acyl-sn-glycero-3-phosphate | --- | g2ac3p |
| LPA-O | 1-alkyl-sn-glycero-3-phosphate | alkgp | g1alk3p |
| PA-O | 1-alkyl-2-acyl-sn-glycero-3-phosphate | akac2gp | g1alk2ac3p |
| LPA-P | 1-(Z)-alk-1-enyl-sn-glycero-3-phosphate | --- | g1alken3p |
| PA-P | 1-(Z)-alk-1-enyl-2-acyl-sn-glycero-3-phosphate | --- | g1alken2ac3p |
| PC | 1,2-diacyl-sn-glycero-3-phosphocholine | pchol | g1ac2ac3pc |
| LPC | 1-acyl-sn-glycero-3-phosphocholine | ag3pc | g1ac3pc |
| LPC | 2-acyl-sn-glycero-3-phosphocholine | 2agpc | g2ac3pc |
| PC-O | 1-alkyl-2-acyl-sn-glycero-3-phosphocholine | akac2gchol | g1alk2ac3pc |
| LPC-O | 1-alkyl-sn-glycero-3-phosphocholine | ak2lgchol | g1alk3pc |
| PC-P | 1-(Z)-alk-1-enyl-2-acyl-sn-glycero-3-phosphocholine | --- | g1alken2ac3pc |
| LPC-P | 1-(Z)-alk-1-enyl-sn-glycero-3-phosphocholine | --- | g1alken3pc |
| PE | 1,2-diacyl-sn-glycero-3-phosphoethanolamine | pe | g1ac2ac3pe |
| PE | 1,2-diacyl-sn-glycero-3-phosphoethanolamine | pe_BAC | g1ac2ac3pe |
| LPE | 1-acyl-sn-glycero-3-phosphoethanolamine | acg3pe | g1ac3pe |
| LPE | 2-acyl-sn-glycero-3-phosphoethanolamine | --- | g2ac3pe |
| PE-O | 1-alkyl-2-acyl-sn-glycero-3-phosphoethanolamine | akac2gpe | g1alk2ac3pe |
| LPE-O | 1-alkyl-sn-glycero-3-phosphoethanolamine | --- | g1alk3pe |
| PE-P | 1-(Z)-alk-1-enyl-2-acyl-sn-glycero-3-phosphoethanolamine | alkenac2gpe | g1alken2ac3pe |
| LPE-P | 1-(Z)-alk-1-enyl-sn-glycero-3-phosphoethanolamine | alken2gpe | g1alken3pe |
| PS | 1,2-diacyl-sn-glycero-3-phospho-L-serine | ps | g1ac2ac3ps |
| LPS | 1-acyl-sn-glycero-3-phospho-L-serine | acg3ps | g1ac3ps |
| LPS | 2-acyl-sn-glycero-3-phospho-L-serine | --- | g2ac3ps |
| PI | 1,2-diacyl-sn-glycero-3-phospho(1)-D-myo-inositol | pail | g1ac2ac3pi |
| LPI | 1-acyl-sn-glycero-3-phospho(1)-D-myo-inositol | --- | g1ac3pi |
| LPI | 2-acyl-sn-glycero-3-phospho(1)-D-myo-inositol | --- | g2ac3pi |
| PIP | 1,2-diacyl-sn-glycero-3-phospho(1)-D-myo-inositol-3-phosphate | pail3p | g1ac2ac3pi3p |
| PIP | 1,2-diacyl-sn-glycero-3-phospho(1)-D-myo-inositol-4-phosphate | pail4p | g1ac2ac3pi4p |
| PIP | 1,2-diacyl-sn-glycero-3-phospho(1)-D-myo-inositol-5-phosphate | pail5p | g1ac2ac3pi5p |
| PIP2 | 1,2-diacyl-sn-glycero-3-phospho(1)-D-myo-inositol-3,4-bisphosphate | pail34p | g1ac2ac3pi3p4p |
| PIP2 | 1,2-diacyl-sn-glycero-3-phospho(1)-D-myo-inositol-3,5-bisphosphate | pail35p | g1ac2ac3pi3p5p |
| PIP2 | 1,2-diacyl-sn-glycero-3-phospho(1)-D-myo-inositol-4,5-bisphosphate | pail45p | g1ac2ac3pi4p5p |
| PIP3 | 1,2-diacyl-sn-glycero-3-phospho(1)-D-myo-inositol-3,4,5-trisphosphate | pail345p | g1ac2ac3pi3p4p5p |
| PGP | 1,2-diacyl-sn-glycero-3-phospho-(1'-sn-glycero-3'-phosphate) | pgp | g1ac2ac3pg3p |
| PG | 1,2-diacyl-sn-glycero-3-phospho-(1'-sn-glycerol) | pg | g1ac2ac3pg |
| PG | 1,2-diacyl-sn-glycero-3-phospho-(1'-sn-glycerol) | pg_BAC | g1ac2ac3pg |
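
One thing a table like this makes easy is flagging WormJam IDs that collapse onto the same systematic ID, i.e. duplicates. The Python sketch below copies a subset of the rows into a plain dict; the dict and function names are hypothetical, and this is not how BiGG or WormJam actually store such mappings.

```python
from collections import defaultdict

# Subset of rows from the table: old WormJam ID -> proposed new ID.
OLD_TO_NEW = {
    "pa_pl": "g1ac2ac3p",
    "12dag3p": "g1ac2ac3p",
    "alpa": "g1ac3p",
    "alpa_tag": "g1ac3p",
    "1ag3p_SC": "g1ac3p",
    "pchol": "g1ac2ac3pc",
    "pe": "g1ac2ac3pe",
    "pe_BAC": "g1ac2ac3pe",
    "pg": "g1ac2ac3pg",
    "pg_BAC": "g1ac2ac3pg",
}


def group_by_new_id(mapping: dict[str, str]) -> dict[str, list[str]]:
    """Return only the new IDs that more than one old ID maps onto."""
    groups: defaultdict[str, list[str]] = defaultdict(list)
    for old_id, new_id in mapping.items():
        groups[new_id].append(old_id)
    return {new_id: olds for new_id, olds in groups.items() if len(olds) > 1}


for new_id, old_ids in group_by_new_id(OLD_TO_NEW).items():
    print(new_id, "<-", ", ".join(old_ids))
# g1ac2ac3p <- pa_pl, 12dag3p
# g1ac3p <- alpa, alpa_tag, 1ag3p_SC
# g1ac2ac3pe <- pe, pe_BAC
# g1ac2ac3pg <- pg, pg_BAC
```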

With multiply phosphorylated PI headgroups (PIP, PIP2 and PIP3), the back part of the ID gets a bit complicated, but I guess it is still fine. An alternative is to separate this part with an underscore, e.g. g1ac2ac3pi_3p4p. CDP-DGs are still open; they would be g1ac2ac3cdp.
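
Generating the PI head-group part from the list of phosphorylated inositol positions keeps these long IDs consistent; for the 3,4,5-trisphosphate it yields g1ac2ac3pi3p4p5p. A Python sketch with hypothetical helper names, where the optional separator reproduces the underscore alternative:

```python
def pi_headgroup(phosphate_positions: list[int], separator: str = "") -> str:
    """Encode a phosphatidylinositol head group and its extra phosphates.

    Positions are written in ascending order, each followed by "p", e.g.
    [3, 4, 5] -> "pi3p4p5p". With separator="_" the phosphate part is split
    off, e.g. "pi_3p4p". Inositol position 1 is rejected because it carries
    the link to the glycerophosphate (the "phospho(1)" in the names above).
    """
    positions = sorted(phosphate_positions)
    if len(set(positions)) != len(positions) or 1 in positions:
        raise ValueError(f"invalid inositol phosphate positions: {phosphate_positions}")
    suffix = "".join(f"{pos}p" for pos in positions)
    return "pi" + (separator + suffix if suffix else "")


def diacyl_pi_id(phosphate_positions: list[int], separator: str = "") -> str:
    """Full ID for a 1,2-diacyl PI species in the g-prefixed scheme."""
    return "g1ac2ac3" + pi_headgroup(phosphate_positions, separator)


print(diacyl_pi_id([3, 4, 5]))              # g1ac2ac3pi3p4p5p
print(diacyl_pi_id([3, 4], separator="_"))  # g1ac2ac3pi_3p4p
```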

michaelwitting • Sep 12 '19 12:09

This looks great. Some comments below, not sure we have to solve all of this.

  • Perhaps we could add an example for a phosphatidylcholine?
  • How should cardiolipin be written? Or should we use another abbreviation for it?
  • Should we handle the sphingolipids analogously? Can we add some examples?
  • How should the various variants of ac side chains be encoded? It would be great to have some convention for this.

The following often occurs: ac = ac16 (palmitate by default; what exactly does ac mean?), ac18 (stearate), ac20, ac22, ac24. What about unsaturated variants (these need the position of the double bond and cis/trans)?

matthiaskoenig • Sep 12 '19 17:09

For the moment, this covers only the generic versions. I have already thought about ways to encode specific acyl, alkyl or alkenyl chains. The position and stereochemistry of the double bond should be encoded. I developed something that is suitable for fatty acids, acyl-CoAs, etc. (everything that has a single acyl chain). I wrote my thoughts down in a manuscript-style document, and I think I will put it on a preprint server soon. @matthiaskoenig if you want, I can send the current rough, preliminary version via email to check.

michaelwitting • Sep 12 '19 18:09

I will think about the cardiolipins and sphingolipids; they might be a bit tricky.

michaelwitting • Sep 12 '19 18:09

@michaelwitting Yes, please send the preprint. I will give you feedback on it (konigmatt[AT]googlemail.com).

matthiaskoenig • Sep 13 '19 06:09

Hi all. I would like to revive the discussion here. I was thinking about how to encode the side chains, the sphingoid bases, etc. The main question is which level of detail is required. It would be good to have enough detail to be able to reconstruct the chemical structure. In lipidomics, shorthand notations like PC(16:0/16:1(9Z)) are used. Maybe this can be adapted?
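
To get a feel for how much structural detail such a shorthand carries, here is a minimal Python parser for the simple PC(16:0/16:1(9Z)) style: each chain gives the number of carbons, the number of double bonds and, optionally, their positions with Z/E geometry. It is a sketch that assumes only simple class names and "/"-separated chains; it does not attempt the full lipidomics notation, and all names in it are hypothetical.

```python
import re
from dataclasses import dataclass

SHORTHAND = re.compile(r"(?P<lipid_class>[A-Za-z]+)\((?P<chains>.+)\)")
CHAIN = re.compile(r"(\d+):(\d+)(?:\(([^)]*)\))?")
BOND = re.compile(r"(\d+)([EZ])")


@dataclass
class Chain:
    carbons: int
    double_bonds: int
    bond_positions: list[tuple[int, str]]  # (position, "Z" or "E")


def parse_shorthand(name: str) -> tuple[str, list[Chain]]:
    """Parse names like 'PC(16:0/16:1(9Z))' into a lipid class and its chains."""
    match = SHORTHAND.fullmatch(name)
    if match is None:
        raise ValueError(f"unsupported shorthand: {name!r}")
    chains = []
    for part in match["chains"].split("/"):
        chain = CHAIN.fullmatch(part)
        if chain is None:
            raise ValueError(f"unsupported chain: {part!r}")
        bonds = [(int(pos), geo) for pos, geo in BOND.findall(chain[3] or "")]
        double_bonds = int(chain[2])
        # If positions are given, there must be one per double bond.
        if chain[3] is not None and len(bonds) != double_bonds:
            raise ValueError(f"bond positions do not match count in {part!r}")
        chains.append(Chain(int(chain[1]), double_bonds, bonds))
    return match["lipid_class"], chains


lipid_class, chains = parse_shorthand("PC(16:0/16:1(9Z))")
print(lipid_class)  # PC
print(chains[1])    # Chain(carbons=16, double_bonds=1, bond_positions=[(9, 'Z')])
```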

michaelwitting • Dec 10 '19 19:12