
fix: subSystems field

Open BenjaSanchez opened this issue 8 years ago • 35 comments

Currently the fields rxnECNumbers & subSystems are retrieved from KEGG & Swissprot using an automatic script. However, this leads to many cases in which there is more than one match, which is undesired in the case of subsystems. In some previous version of the Yeast model this information was indeed present, but someone deleted those fields. @hongzhonglu could you look further into this? The desired data should be available in the original SourceForge repository.

BenjaSanchez avatar Oct 25 '17 11:10 BenjaSanchez

@demilappa maybe you can shed some light here? I remember you told me you managed to do this once, if so which version of Yeast did you use for it?

BenjaSanchez avatar Oct 25 '17 13:10 BenjaSanchez

@BenjaSanchez I have added the rxnECNumbers & subSystems to other GEMs of bacterial species with semi-automated scripts, not the yeast one. Unfortunately, for reactions with multiple EC numbers it involved manual input whenever the subsystems were different.

In my case the source of reconstruction was http://kbase.us/metabolic-modeling-in-kbase/. In their Git repository they already had a preliminary mapping between rxns and external DBs, where for the vast majority of cases they had specified compartments. As for the EC numbers, you can still have multiple ones in your structure.

demilappa avatar Oct 25 '17 13:10 demilappa

@demilappa thanks for the info! @hongzhonglu as you mentioned that you checked previous versions of the model with no luck, I will close this issue.

BenjaSanchez avatar Oct 25 '17 14:10 BenjaSanchez

@BenjaSanchez, about two weeks ago I did some mapping based on reaction names to obtain the subsystem for each reaction. The models used in the mapping are iMM904, iTO980 and the latest human model, Recon3D. All transport reactions are classified as transport+compartment. Based on this mapping and the subsystems present in yeast7.6, I hope each reaction can have a unique subsystem, just like in the latest human and E.coli GEMs.

hongzhonglu avatar Oct 25 '17 17:10 hongzhonglu

@hongzhonglu interesting! how much coverage did you achieve with said mapping? which pathway names are you getting back? from KEGG or another database? I have now re-opened this issue with a changed goal: to eliminate the duplicates in subsystems. Looking forward to your contribution!

BenjaSanchez avatar Oct 26 '17 09:10 BenjaSanchez

@BenjaSanchez I have updated the subsystem. You can check it now!

hongzhonglu avatar Mar 11 '18 14:03 hongzhonglu

as discussed in person, a new branch curation/reactions was created for adding these changes

BenjaSanchez avatar Mar 14 '18 13:03 BenjaSanchez

Discussed today:

  • Reactions are allowed to have multiple subsystems
  • Subsystems should primarily be KEGG pathways
  • Subsystems might also be something like "Transport ER to Golgi", "Exchange" and "Pseudoreaction"
  • All reactions (with perhaps some exceptions), should have at least one subsystem
  • If reactions cannot be annotated to any existing (KEGG) pathway, a new subsystem may be defined, but should not be too specific and only include a few genes (rather "Alternative carbon source" than "Arabinose catabolism")
  • Appendices can contain additional information, such as hierarchy of subsystems.

edkerk avatar Jun 21 '18 14:06 edkerk

I would like to reignite this issue.

In yeast-GEM, reactions can be annotated to multiple subsystems, and these are predominantly KEGG pathways. Is this desired? This was discussed offline (reported in https://github.com/SysBioChalmers/yeast-GEM/issues/11#issuecomment-399115270), but arguments for this were not recorded.

From the comments above, it seemed there was consensus that single subsystems are preferred, but we ended up with multiple subsystems anyway.

A few points to consider:

  • Single subsystems make it easier to uniquely group reactions together. This can be useful in various scenarios, e.g.
    • If you want to present data/results from the model and you want each reaction present in only one subsystem.
    • If you want to uniquely associate reactions to specific maps (e.g. in MetabolicAtlas).
  • We can easily define our own subSystems if we're not sticking to KEGG pathways. The current subSystems based on KEGG pathways might not always be the most suitable. This also prevents the odd mixture of subSystems starting with a KEGG Pathway ID and those without, while still retaining the KEGG Pathway IDs (see next bullet point).
  • Having single subSystems does not mean completely discarding the current KEGG pathway annotations. KEGG pathways are on identifiers.org and can therefore be handled as reaction identifiers (we can propose rxnKEGGpathway to be included in COBRA Toolbox?).

Having single subSystems and rxnKEGGpathway annotations is the best of both worlds.

Expected feature/value/output:

Single subSystems, e.g. in the case of r_0103, acetyl-CoA C-acetyltransferase; Fatty acid degradation

Current feature/value/output:

Many reactions have multiple subSystems, e.g. in the case of r_0103, acetyl-CoA C-acetyltransferase; sce00071  Fatty acid degradation;sce00072  Synthesis and degradation of ketone bodies;sce00280  Valine, leucine and isoleucine degradation;sce00310  Lysine degradation;sce00380  Tryptophan metabolism;sce00620  Pyruvate metabolism;sce00630  Glyoxylate and dicarboxylate metabolism;sce00640  Propanoate metabolism;sce00650  Butanoate metabolism;sce00900  Terpenoid backbone biosynthesis;sce01110  Biosynthesis of secondary metabolites;sce01130  Biosynthesis of antibiotics;sce01200  Carbon metabolism;sce01212  Fatty acid metabolism
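To illustrate the proposal, here is a minimal Python sketch of how such a combined string could be split into KEGG pathway IDs and pathway names, so that the IDs can be kept as a separate annotation. The helper name `split_kegg_subsystems` and the regular expression are assumptions for illustration only; this is not how the fields are handled in the model or in COBRA Toolbox, and picking the one subsystem to keep remains a curation decision.

```python
import re

# Assumed entry layout: a KEGG pathway ID such as "sce00071",
# whitespace, then the pathway name.
KEGG_ENTRY = re.compile(r"^(sce\d{5})\s+(.+)$")


def split_kegg_subsystems(raw):
    """Split a semicolon-separated subSystems string into (ID, name) pairs.

    Entries without a KEGG pathway ID are returned with None as the ID.
    """
    pairs = []
    for entry in raw.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        match = KEGG_ENTRY.match(entry)
        if match:
            pairs.append((match.group(1), match.group(2).strip()))
        else:
            # e.g. a custom subsystem name without a KEGG ID
            pairs.append((None, entry))
    return pairs


# Shortened version of the r_0103 example above.
raw = (
    "sce00071  Fatty acid degradation;"
    "sce00072  Synthesis and degradation of ketone bodies;"
    "sce00280  Valine, leucine and isoleucine degradation"
)
pairs = split_kegg_subsystems(raw)
kegg_ids = [pid for pid, _ in pairs if pid is not None]
names = [name for _, name in pairs]
```

Here `kegg_ids` holds what would go into a field like rxnKEGGpathway, while `names` lists the candidates from which a single subsystem would still have to be chosen by hand.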

edkerk avatar Jun 22 '20 22:06 edkerk

Hi Ed, @edkerk actually, we now already have a single subsystem for each reaction. I like your idea to have "both single subSystems and rxnKEGGpathway annotations".

hongzhonglu avatar Jun 23 '20 05:06 hongzhonglu

@hongzhonglu, that's great, but where can I find those single subsystems? Both in master and devel reactions still have multiple subsystems.

edkerk avatar Jun 23 '20 06:06 edkerk

Hi Ed, @edkerk, I have not uploaded them yet, as we did not know how to handle this before.

hongzhonglu avatar Jun 23 '20 08:06 hongzhonglu

@edkerk completely agree with moving the current subSystems to a rxnKEGGpathway field, that way we don't lose any information while having better subsystem definitions :)

BenjaSanchez avatar Jun 23 '20 08:06 BenjaSanchez

Is there any timeline for this change, at least with regards to having this up on a feature branch?

mihai-sysbio avatar Jul 13 '20 10:07 mihai-sysbio

Hello, @mihai-sysbio I will try to update it this month.

hongzhonglu avatar Jul 13 '20 17:07 hongzhonglu

Hi, @edkerk @BenjaSanchez @feiranl I have now added the manually curated subsystems to the data files of our repo: https://github.com/SysBioChalmers/yeast-GEM/blob/unique_subsystem_of_rxn/ComplementaryData/modelCuration/Rxn_unique_subsystem.tsv.

I think we still need to refine it before merging it into the model, with regard to the following issues:

  1. Currently 99 reactions are of type “none” or “other”. They can be put under “Unassigned”, like in the latest E.coli GEMs.

  2. Another issue is the “slime reaction” in lipid metabolism. Its definition seems strange, as it is not recorded in the literature.

hongzhonglu avatar Jul 23 '20 15:07 hongzhonglu

In order to merge to devel we should first propose to cobratoolbox the addition of rxnKEGGpathway as a new field, @edkerk is it enough to add a new line to src/base/io/definitions/COBRA_structure_fields.csv? I can take care of it, maybe also adding some of the fields discussed in https://github.com/SysBioChalmers/RAVEN/pull/285#issuecomment-624828770

@hongzhonglu answering your other questions:

  1. Why create a name at all? We could instead leave an empty string for all those cases (AFAIK not all reactions need to be a part of a subsystem).

  2. "SLIME reaction" is added by the SLIMEr formalism, and its definition is available at https://doi.org/10.1186/s12918-018-0673-8.

BenjaSanchez avatar Jul 24 '20 15:07 BenjaSanchez

@BenjaSanchez It is indeed sufficient to just modify the COBRA_structure_fields.csv file, as done here: https://github.com/opencobra/cobratoolbox/pull/1591.

edkerk avatar Jul 28 '20 21:07 edkerk

Here are some very specific questions and observations - please excuse my ignorance and silly questions:

  • the tsv is inconsistent in its use of quotations "
  • is it okay to have 2 KEGG pathways, ie glycerolipid metabolism / glycerophospholipid metabolism ( sce00561,sce00564 ) for some reactions?
  • should exchange reaction be exchange reactions, in its plural form?
  • is growth a good subsystem name?
  • does the transport subsystem really need to be divided ? (eg transport [cytoplasm, golgi_membrane], transport [cytoplasm, golgi] and so on)
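On the quoting point, a minimal Python sketch of how a tab-separated file with mixed quoting could be read and rewritten consistently. The column names, reaction IDs and subsystem values below are made up for illustration and are not taken from Rxn_unique_subsystem.tsv; only the standard `csv` module is used.

```python
import csv
import io

# Hypothetical excerpt: only the field containing a comma is quoted.
raw_tsv = (
    "rxnID\tsubsystem\n"
    "r_0001\tglycolysis\n"
    'r_0002\t"transport [cytoplasm, golgi]"\n'
)

# csv.reader removes the optional quotes, so quoted and unquoted
# fields are parsed in the same way.
rows = list(csv.reader(io.StringIO(raw_tsv), delimiter="\t"))

# Write the rows back with the default minimal quoting: with a tab
# delimiter, a comma inside a field no longer triggers quoting.
out = io.StringIO()
writer = csv.writer(out, delimiter="\t", lineterminator="\n")
writer.writerows(rows)
clean_tsv = out.getvalue()
```

With a tab as delimiter, commas inside a field do not need quoting, so the rewritten text has no quotation marks unless a field itself contains a tab, a quote or a line break.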

mihai-sysbio avatar Jul 29 '20 08:07 mihai-sysbio

@mihai-sysbio

Here are some very specific questions and observations - please excuse my ignorance and silly questions:

  • the tsv is inconsistent in its use of quotations "

Seems like strings with a comma are quoted. Not sure what was used to write this file.

  • is it okay to have 2 KEGG pathways, ie glycerolipid metabolism / glycerophospholipid metabolism ( sce00561,sce00564 ) for some reactions?

Ideally one.

  • should exchange reaction be exchange reactions, in its plural form?

Either way fine by me

  • is growth a good subsystem name?

"Biomass" is perhaps better?

  • does the transport subsystem really need to be divided ? (eg transport [cytoplasm, golgi_membrane], transport [cytoplasm, golgi] and so on)

Either way fine by me

edkerk avatar Aug 24 '20 19:08 edkerk

I will update it based on all your comments.

hongzhonglu avatar Sep 11 '20 11:09 hongzhonglu

Update: I have opened https://github.com/SysBioChalmers/yeast-GEM/pull/253 moving the KEGG pathway ids to the proper field. After it is merged, model.subSystems will be ready to get populated with the new group classifications :)

BenjaSanchez avatar Nov 24 '20 12:11 BenjaSanchez

Nice! I will try to further update the group classifications.

hongzhonglu avatar Nov 24 '20 12:11 hongzhonglu

What is the status of this? I was about to make a few curations to Hongzhong's proposed list of subsystems, but then I noticed that MetabolicAtlas also has subsystems defined. @mihai-sysbio should we try to adhere to the MetabolicAtlas maps whenever possible/suitable? Or will these just be re-drawn once we have settled the subsystems here in yeast-GEM?

edkerk avatar Jun 24 '21 08:06 edkerk

I now noticed the maps on https://github.com/SysBioChalmers/Yeast-maps/, should we try to adhere to these? How have they been defined?

edkerk avatar Jun 29 '21 13:06 edkerk

What is the status of this?

This is a bit of a thorny issue. I don't think I can remember all the details accurately (please correct me @edkerk @pecholleyc @hongzhonglu):

  • in the last release yeast-GEM 8.4.2, a reaction could be mapped to multiple subsystems
  • it was desired to switch to one reaction mapped to a single subsystem
  • it looks like the maps have been created based on the subsystems in the yml file in master, ie 8.4.2
  • @hongzhonglu's definition of subsystems has not been merged in devel
  • the subsystem field is missing in the yml file in devel
  • the old subsystems are under the annotation section as kegg.pathway
  • the svg files are mapped to subsystem IDs in this file
  • even if the mapping is not perfect, ie. not all reactions of a subsystem are on the map, and/or not all reactions on the map are part of the subsystem, that's something we can live with I think
  • the Yeast-maps repository is mentioned in the Yeast8 paper
  • the Yeast-maps repository does not contain all the maps, eg not the SVG maps shown on Metabolic Atlas
  • Metabolic Atlas has added support for custom maps, eg for maps that combine parts of different subsystems
  • Metabolic Atlas has added support for a single reaction to be mapped to multiple subsystems, here is an example
  • the maps are time-consuming to redraw - at the moment there is no plan to redraw any maps

@mihai-sysbio should we try to adhere to the MetabolicAtlas maps whenever possible/suitable?

With all of the above in mind my suggestion going forward is to add 1 subsystem for each reaction in the yml file. The next step would then be to decide which SVG is best suited as a map for the respective subsystem.

mihai-sysbio avatar Jun 30 '21 14:06 mihai-sysbio

I should have mentioned this, but Metabolic Atlas is relying on the subsystem field in the yml to populate the reaction-subsystem association. We will likely have to postpone updating the model until this gets sorted out.

mihai-sysbio avatar Jul 01 '21 14:07 mihai-sysbio

I should have mentioned this, but Metabolic Atlas is relying on the subsystem field in the yml to populate the reaction-subsystem association. We will likely have to postpone updating the model until this gets sorted out.

But it makes sense that this information is sourced from subsystem in the yml? And this doesn't affect the maps, right, as they are already defined? I'm not sure what changes there should be made for "this gets sorted out", and how this would affect the subsystem definition?

edkerk avatar Jul 01 '21 15:07 edkerk

Hi all, I am sorry that in the past months I have had no time to further curate the subsystems used for our model and maps, but I will try to update them in the following months. The maps are drawn based on the subsystems, so if the subsystems are updated, the maps will need to be updated as well. @edkerk

hongzhonglu avatar Jul 04 '21 01:07 hongzhonglu

@edkerk I'm following up on your questions below.

But it makes sense that this information is sourced from subsystem in the yml?

I guess we can also change how the reaction-to-subsystem association is parsed from the yml file, but that is how things work for the rest of the models on Metabolic Atlas.

And this doesn't affect the maps, right, as they are already defined?

Indeed - the SVG maps are already created and they won't be changed. They would need to be updated, as @hongzhonglu says above, but it will take a long time until that happens, so for the near future we should think that they won't be changed.

I'm not sure what changes there should be made for "this gets sorted out", and how this would affect the subsystem definition?

What is expected is that each reaction in the yml file has 1 subsystem, for example:

- reactions:
    - !!omap
      - id: "MAR03905"
      ...
      - subsystem:
          - "Glycolysis / Gluconeogenesis"

To my knowledge, this is part of the normal export to yaml that has been recently included in Raven.
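As a rough illustration of that expectation, the following Python sketch flags reactions that do not have exactly one subsystem. It assumes the reactions section of the yml has already been loaded into a list of dictionaries; that in-memory layout, the function name, and every entry except the "MAR03905" one shown above are hypothetical.

```python
# Hypothetical in-memory form of the reactions section of the yml file.
reactions = [
    {"id": "MAR03905", "subsystem": ["Glycolysis / Gluconeogenesis"]},
    # Made-up entries showing the two cases that would be flagged.
    {"id": "rxn_two", "subsystem": ["Pyruvate metabolism", "Carbon metabolism"]},
    {"id": "rxn_none"},
]


def reactions_without_single_subsystem(reactions):
    """Return the IDs of reactions that do not have exactly one subsystem."""
    flagged = []
    for rxn in reactions:
        subsystems = rxn.get("subsystem", [])
        # Tolerate a bare string in place of a one-element list.
        if isinstance(subsystems, str):
            subsystems = [subsystems]
        if len(subsystems) != 1:
            flagged.append(rxn["id"])
    return flagged


flagged = reactions_without_single_subsystem(reactions)
```

Whether a reaction without any subsystem should count as a problem is a separate decision; as noted earlier in this thread, not all reactions may need to be part of a subsystem.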

mihai-sysbio avatar Jul 04 '21 14:07 mihai-sysbio