yeast-GEM
fix: subSystems field
Currently the fields rxnECNumbers & subSystems are retrieved from KEGG & Swissprot, using an automatic script. However, this leads to many cases in which there is more than one match, undesired in the case of subsystems. At some previous version of the Yeast model, this information was indeed present, but someone deleted those fields. @hongzhonglu could you look further into this? The desired data should be available in the original sourceForge repository
@demilappa maybe you can shed some light here? I remember you told me you managed to do this once, if so which version of Yeast did you use for it?
@BenjaSanchez I have added the rxnECNumbers & subSystems to other GEMs of bacterial species with semi-automated scripts, not the yeast one. Unfortunately, for multiple EC Numbers per reaction it involved manual input in case the subsystems were different.
In my case the source of reconstruction was http://kbase.us/metabolic-modeling-in-kbase/. In their Git repository they already had a preliminary mapping among rxns and external DBs, where for the vast majority of the cases they had specified compartments. As for the EC Numbers, you can still have multiple ones in your structure.
@demilappa thanks for the info! @hongzhonglu as you mentioned that you checked previous versions of the model with no luck, I will close this issue.
@BenjaSanchez, about two weeks ago, I did some mapping based on the reaction names to obtain the subsystem for each reaction. The models used in the mapping are iMM904, iTO980 and the latest human model Recon3D. All the transport reactions are classified as transport+compartment. So based on this mapping and the subsystems present in yeast7.6, I hope each reaction can have a unique subsystem, just like the latest human and E. coli GEMs.
@hongzhonglu interesting! how much coverage did you achieve with said mapping? which pathway names are you getting back? from KEGG or another database? I have now re-opened this issue with a changed goal: to eliminate the duplicates in subsystems. Looking forward to your contribution!
@BenjaSanchez I have updated the subsystem. You can check it now!
as discussed in person, a new branch curation/reactions was created for adding these changes
Discussed today:
- Reactions are allowed to have multiple subsystems
- Subsystems should primarily be KEGG pathways
- Subsystems might also be something like "Transport ER to Golgi", "Exchange" and "Pseudoreaction"
- All reactions (with perhaps some exceptions), should have at least one subsystem
- If reactions cannot be annotated to any existing (KEGG) pathway, a new subsystem may be defined, but should not be too specific and only include a few genes (rather "Alternative carbon source" than "Arabinose catabolism")
- Appendices can contain additional information, such as hierarchy of subsystems.
I would like to reignite this issue.
In yeast-GEM, reactions can be annotated to multiple subsystems, and these are predominantly KEGG pathways. Is this desired? This was discussed offline (reported in https://github.com/SysBioChalmers/yeast-GEM/issues/11#issuecomment-399115270), but arguments for this were not recorded.
From the comments above, it seemed there was consensus that single subsystems are preferred, but we ended up with multiple subsystems anyway.
A few points to consider:
- Single subsystems make it easier to uniquely group reactions together. This can be useful in various scenarios, e.g.
- If you want to present data/results from the model and you want each reaction present in only one subsystem.
- If you want to uniquely associate reactions to specific maps (e.g. in MetabolicAtlas).
- We can easily define our own subSystems if we're not sticking to KEGG pathways. The current subSystems based on KEGG pathways might not always be the most suitable. This also prevents the odd mixture of subSystems starting with a KEGG Pathway ID and those without, while still retaining the KEGG Pathway IDs (see next bulletpoint).
- Single subSystems does not mean completely discarding the current KEGG pathway annotations. KEGG pathways are on identifiers.org and can therefore be handled as reaction identifiers (we can propose rxnKEGGpathway to be included in COBRA Toolbox?).
Having single subSystems and rxnKEGGpathway annotations is the best of both worlds.
Expected feature/value/output:
Single subSystems, e.g. in the case of r_0103, acetyl-CoA C-acetyltransferase; Fatty acid degradation
Current feature/value/output:
Many reactions have multiple subSystems, e.g. in the case of r_0103, acetyl-CoA C-acetyltransferase; sce00071 Fatty acid degradation;sce00072 Synthesis and degradation of ketone bodies;sce00280 Valine, leucine and isoleucine degradation;sce00310 Lysine degradation;sce00380 Tryptophan metabolism;sce00620 Pyruvate metabolism;sce00630 Glyoxylate and dicarboxylate metabolism;sce00640 Propanoate metabolism;sce00650 Butanoate metabolism;sce00900 Terpenoid backbone biosynthesis;sce01110 Biosynthesis of secondary metabolites;sce01130 Biosynthesis of antibiotics;sce01200 Carbon metabolism;sce01212 Fatty acid metabolism
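To illustrate how the current multi-valued subSystems entries could be separated into KEGG pathway IDs and plain names, here is a minimal Python sketch. It only assumes the "sceNNNNN name;sceNNNNN name" layout shown for r_0103 above; the helper name split_kegg_subsystems is hypothetical and not part of yeast-GEM, RAVEN or COBRA Toolbox.

```python
import re

# Assumption: each entry looks like "sce00071 Fatty acid degradation",
# as in the r_0103 example; entries are separated by semicolons.
KEGG_PREFIX = re.compile(r"^(sce\d{5})\s+(.*)$")


def split_kegg_subsystems(raw):
    """Split a subSystems string into (KEGG pathway ID, name) pairs.

    Entries without a KEGG prefix are returned with ID None.
    """
    pairs = []
    for entry in raw.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        match = KEGG_PREFIX.match(entry)
        if match:
            pairs.append((match.group(1), match.group(2)))
        else:
            pairs.append((None, entry))
    return pairs


# Shortened version of the r_0103 annotation quoted above.
raw = (
    "sce00071 Fatty acid degradation;"
    "sce00072 Synthesis and degradation of ketone bodies;"
    "sce00280 Valine, leucine and isoleucine degradation"
)
pairs = split_kegg_subsystems(raw)
kegg_ids = [pathway_id for pathway_id, _ in pairs if pathway_id]
```

The list kegg_ids is what could go into a rxnKEGGpathway-style field, while a single curated subsystem name would be stored separately.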
Hi Ed, @edkerk actually, we now already have a single subsystem for each rxn. I like your idea to have "both single subSystems and rxnKEGGpathway annotations".
@hongzhonglu, that's great, but where can I find those single subsystems? Both in master and devel reactions still have multiple subsystems.
Hi Ed, @edkerk, I have not uploaded them yet, as we did not know how to handle this before.
@edkerk completely agree with moving the current subSystems to a rxnKEGGpathway field, that way we don't lose any information while having better subsystem definition :)
Is there any timeline for this change, at least with regards to having this up on a feature branch?
Hello, @mihai-sysbio I will try to update it this month.
Hi, @edkerk @BenjaSanchez @feiranl I have now added the manually curated subsystems to the data file of our repo: https://github.com/SysBioChalmers/yeast-GEM/blob/unique_subsystem_of_rxn/ComplementaryData/modelCuration/Rxn_unique_subsystem.tsv
I think we still need to refine it before we merge it into the model, regarding the following issues:
- Currently 99 reactions belong to the types “none” or “other”. They can be put under “Unassigned”, like in the latest E. coli GEMs.
- Another issue is the “slime reaction” in the lipid metabolism. Its definition seems strange, as it is not recorded in the literature.
In order to merge to devel we should first propose to cobratoolbox the addition of rxnKEGGpathway as a new field, @edkerk is it enough to add a new line to src/base/io/definitions/COBRA_structure_fields.csv? I can take care of it, maybe also adding some of the fields discussed in https://github.com/SysBioChalmers/RAVEN/pull/285#issuecomment-624828770
@hongzhonglu answering to your other questions:
- Why create a name at all? We could instead leave an empty string for all those cases (AFAIK not all reactions need to be a part of a subsystem).
- "SLIME reaction" is added by the SLIMEr formalism, and its definition is available at https://doi.org/10.1186/s12918-018-0673-8.
@BenjaSanchez It is indeed sufficient to just modify the COBRA_structure_fields.csv file, as done here: https://github.com/opencobra/cobratoolbox/pull/1591.
Here are some very specific questions and observations - please excuse my ignorance and silly questions:
- the tsv is inconsistent in its use of quotations (")
- is it okay to have 2 KEGG pathways, ie glycerolipid metabolism / glycerophospholipid metabolism ( sce00561,sce00564 ) for some reactions?
- should exchange reaction be exchange reactions, in its plural form?
- is growth a good subsystem name?
- does the transport subsystem really need to be divided? (eg transport [cytoplasm, golgi_membrane], transport [cytoplasm, golgi] and so on)
@mihai-sysbio
Here are some very specific questions and observations - please excuse my ignorance and silly questions:
- the tsv is inconsistent in its use of quotations (")
Seems like strings with a comma are quoted. Not sure what was used to write this file.
- is it okay to have 2 KEGG pathways, ie glycerolipid metabolism / glycerophospholipid metabolism ( sce00561,sce00564 ) for some reactions?
Ideally one.
- should exchange reaction be exchange reactions, in its plural form?
Either way fine by me
- is growth a good subsystem name?
"Biomass" is perhaps better?
- does the transport subsystem really need to be divided? (eg transport [cytoplasm, golgi_membrane], transport [cytoplasm, golgi] and so on)
Either way fine by me
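On the quoting question, a minimal Python sketch of one way to normalize such a file: the standard csv module reads quoted and unquoted tab-separated fields alike, and writing back with QUOTE_MINIMAL only quotes fields that contain a tab, a quote character or a line break. The two sample rows and their column layout (reaction ID, subsystem) are made up for illustration and are not taken from Rxn_unique_subsystem.tsv.

```python
import csv
import io

# Made-up sample: one subsystem field is quoted (it contains a comma),
# the other is not. The two-column layout is an assumption.
sample = 'r_0001\t"transport [cytoplasm, golgi]"\nr_0002\texchange reaction\n'


def read_tsv(text):
    """Parse tab-separated text; surrounding quotes are stripped by csv."""
    reader = csv.reader(io.StringIO(text), delimiter="\t", quotechar='"')
    return [row for row in reader]


def write_tsv(rows):
    """Write rows back as TSV, quoting only where the format requires it."""
    buffer = io.StringIO()
    writer = csv.writer(
        buffer,
        delimiter="\t",
        quoting=csv.QUOTE_MINIMAL,
        lineterminator="\n",
    )
    writer.writerows(rows)
    return buffer.getvalue()


rows = read_tsv(sample)
normalized = write_tsv(rows)
```

With a tab delimiter, a comma inside a field no longer triggers quoting, so the rewritten file is consistent.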
I will update it based on all your comments.
Update: I have opened https://github.com/SysBioChalmers/yeast-GEM/pull/253 moving the KEGG pathway ids to the proper field. After it is merged, model.subSystems will be ready to get populated with the new group classifications :)
Nice! I will try to further update the group classifications.
What is the status of this? I was about to make a few curations to Hongzhong's proposed list of subsystems, but then I noticed that MetabolicAtlas also has subsystems defined. @mihai-sysbio should we try to adhere to the MetabolicAtlas maps whenever possible/suitable? Or will these just be re-drawn once we have settled the subsystems here in yeast-GEM?
I now noticed the maps on https://github.com/SysBioChalmers/Yeast-maps/, should we try to adhere to these? How have they been defined?
What is the status of this?
This is a bit of a thorny issue. I don't think I can remember all the details accurately (please correct me @edkerk @pecholleyc @hongzhonglu):
- in the last release yeast-GEM 8.4.2, a reaction could be mapped to multiple subsystems
- it was desired to switch to one reaction mapped to a single subsystem
- it looks like the maps have been created based on the subsystems in the yml file in master, ie 8.4.2
- @hongzhonglu's definition of subsystems has not been merged in devel
- the subsystem field is missing in the yml file in devel
- the old subsystems are under the annotation section as kegg.pathway
- the svg files are mapped to subsystem IDs in this file
- even if the mapping is not perfect, ie. not all reactions of a subsystem are on the map, and/or not all reactions on the map are part of the subsystem, that's something we can live with I think
- the Yeast-maps repository is mentioned in the Yeast8 paper
- the Yeast-maps repository does not contain all the maps, eg not the SVG maps shown on Metabolic Atlas
- Metabolic Atlas has added support for custom maps, eg for maps that combine parts of different subsystems
- Metabolic Atlas has added support for a single reaction to be mapped to multiple subsystems, here is an example
- the maps are time-consuming to redraw - at the moment there is no plan to redraw any maps
@mihai-sysbio should we try to adhere to the MetabolicAtlas maps whenever possible/suitable?
With all of the above in mind my suggestion going forward is to add 1 subsystem for each reaction in the yml file. The next step would then be to decide which SVG is best suited as a map for the respective subsystem.
I should have mentioned this, but Metabolic Atlas is relying on the subsystem field in the yml to populate the reaction-subsystem association. We will likely have to postpone updating the model until this gets sorted out.
I should have mentioned this, but Metabolic Atlas is relying on the subsystem field in the yml to populate the reaction-subsystem association. We will likely have to postpone updating the model until this gets sorted out.
But it makes sense that this information is sourced from subsystem in the yml? And this doesn't affect the maps, right, as they are already defined? I'm not sure what changes there should be made for "this gets sorted out", and how this would affect the subsystem definition?
Hi all, I am sorry that in the past months I have had no time to further curate the subsystems used for our model and maps. But I will try to update them in the following months. The maps are drawn based on the subsystems, so if the subsystems are updated, the maps will need to be updated as well. @edkerk
@edkerk I'm following up on your questions below.
But it makes sense that this information is sourced from subsystem in the yml?
I guess we can also change how the reaction to subsystem association is parsed from the yml file, but that's how things work for the rest of the models on Metabolic Atlas.
And this doesn't affect the maps, right, as they are already defined?
Indeed - the SVG maps are already created and they won't be changed. They would need to be updated, as @hongzhonglu says above, but it will take a long time until that happens, so for the near future we should think that they won't be changed.
I'm not sure what changes there should be made for "this gets sorted out", and how this would affect the subsystem definition?
What is expected is for each reaction in the yml file to have 1 subsystem, for example:
- reactions:
    - !!omap
      - id: "MAR03905"
      ...
      - subsystem:
          - "Glycolysis / Gluconeogenesis"
To my knowledge, this is part of the normal export to yaml that has been recently included in Raven.
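As a rough way to verify the "1 subsystem per reaction" expectation, here is a minimal Python sketch. It assumes the reactions have already been loaded from the yml file into plain dictionaries with an id key and an optional subsystem list (the loading step is not shown); the function name and the two r_hypothetical entries are invented for illustration.

```python
def find_non_unique(reactions):
    """Return {reaction id: subsystems} for reactions that do not have
    exactly one subsystem (zero or several)."""
    problems = {}
    for rxn in reactions:
        subsystems = rxn.get("subsystem") or []
        # Tolerate a bare string as well as a list of strings.
        if isinstance(subsystems, str):
            subsystems = [subsystems]
        if len(subsystems) != 1:
            problems[rxn["id"]] = list(subsystems)
    return problems


# Assumed in-memory form of the yml reactions; only MAR03905 comes from
# the example above, the other two entries are hypothetical.
reactions = [
    {"id": "MAR03905", "subsystem": ["Glycolysis / Gluconeogenesis"]},
    {"id": "r_hypothetical_1", "subsystem": ["A", "B"]},
    {"id": "r_hypothetical_2"},
]
problems = find_non_unique(reactions)
```

An empty problems dictionary would mean every reaction has exactly one subsystem, which is what Metabolic Atlas is described as expecting here.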