Correction of compartment reactions: lipid droplet

cheng-yu-zhang opened this issue 3 years ago • 2 comments

Main improvements in this PR:

Correct the reactions in the lipid droplet based on protein localization, as detailed in the file "protein_location_sce.tsv". The file "bound0.xlsx" lists all reactions whose bounds are restricted to 0.

Explanation: All reactions from the main branch were checked. Taking reaction "r_0001" as an example: "r_loca" refers to the location of "r_0001" according to the model, "r_pro" refers to the corresponding protein in the model, and "all_lipid" refers to all lipid droplet proteins based on the file "protein_location_sce.tsv".

  • If "r_loca" is the lipid droplet and "r_pro" is not in "all_lipid", the upper and lower bounds of "r_0001" are restricted to 0.
  • If "r_loca" is not the lipid droplet and "r_pro" is in "all_lipid", "r_0001" is added to the lipid droplet compartment.
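The two rules above can be sketched in Python as follows. This is a minimal illustration, not the code used for this PR: the `Reaction` class, the `classify` and `apply_rules` helpers, and the compartment label are hypothetical, and the model is reduced to plain Python objects rather than a COBRA or RAVEN structure. For reactions with several associated genes, "r_pro is in all_lipid" is assumed to mean that at least one of the genes is a lipid droplet protein.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical label for the lipid droplet compartment ("r_loca" values).
LIPID_DROPLET = "lipid droplet"


@dataclass
class Reaction:
    rxn_id: str
    location: str    # "r_loca": compartment of the reaction in the model
    genes: set[str]  # "r_pro": genes/proteins associated with the reaction
    lb: float = -1000.0
    ub: float = 1000.0


def classify(rxn: Reaction, all_lipid: set[str]) -> str:
    """Return 'block', 'add_to_lipid_droplet' or 'keep' for one reaction."""
    in_droplet = rxn.location == LIPID_DROPLET
    # Assumption: a reaction has a lipid droplet protein if ANY of its genes is in all_lipid.
    has_droplet_protein = bool(rxn.genes & all_lipid)
    if in_droplet and not has_droplet_protein:
        return "block"
    if not in_droplet and has_droplet_protein:
        return "add_to_lipid_droplet"
    return "keep"


def apply_rules(reactions: list[Reaction], all_lipid: set[str]) -> tuple[list[str], list[str]]:
    """Set both bounds of blocked reactions to 0 and collect IDs to add to the lipid droplet."""
    blocked: list[str] = []
    to_add: list[str] = []
    for rxn in reactions:
        action = classify(rxn, all_lipid)
        if action == "block":
            rxn.lb = rxn.ub = 0.0
            blocked.append(rxn.rxn_id)
        elif action == "add_to_lipid_droplet":
            to_add.append(rxn.rxn_id)
    return blocked, to_add
```

Keeping the decision (`classify`) separate from its effect (`apply_rules`) makes it easy to export the two ID lists for review before any bounds are changed.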

I hereby confirm that I have:

  • [ ] Tested my code with all requirements for running the model
  • [ ] Selected develop as a target branch (top left drop-down menu)
  • [ ] If needed, asked first in the Gitter chat room about this PR

cheng-yu-zhang, May 26 '22

Please see https://github.com/SysBioChalmers/yeast-GEM/issues/303#issuecomment-1140223052; the first priority is to resolve #305 and #306.

edkerk, May 28 '22

  • It is probably not necessary to have a 60+ MB protein_location_sce.tsv file with annotation data. Please simplify this table.
  • The function of bound0.xlsx is unclear: what is meant by "bounds restricted to 0", and how is this used to curate the model?
  • Tables should be in flat-text (TSV or similar) format, not Excel (bound0.xlsx).
  • It is unclear where the localization data comes from. Are any cut-offs used? Are reactions copied or moved? Is there manual curation?

The same comments apply to #315, but it is probably best to fix this for one organelle first.

edkerk, Sep 05 '22

Revisiting this, I think I now partially understand what it aimed to do:

  • protein_location_sce.tsv is a list of GO term annotations from an unknown source.
  • If a gene is annotated in that file as located in the lipid droplet, the associated reaction is copied into the lipid droplet compartment.
  • I don't understand bound0.xlsx. Are these supposedly reactions that are located in the lipid droplet but should not be? If so, it is probably the wrong file, because these reactions are located in the peroxisome instead.
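To make the "copied or moved" distinction concrete, here is a minimal sketch assuming a model reduced to a dict of reactions whose stoichiometry is keyed by (metabolite name, compartment ID). Copying leaves the original reaction untouched and adds a relocated duplicate; moving replaces it. The `lp` compartment ID, the `_lp` ID suffix, and all helper names are assumptions for illustration only. Reactions that already span several compartments (transport reactions) are rejected, since relocating them blindly would collapse both sides into one compartment. Note that relocated metabolites may not yet exist in the target compartment and would then need transport reactions.

```python
from __future__ import annotations

from dataclasses import dataclass

# Assumed compartment ID for the lipid droplet (lipid particle).
LD = "lp"


@dataclass
class Reaction:
    rxn_id: str
    # (metabolite name, compartment ID) -> stoichiometric coefficient
    stoich: dict[tuple[str, str], float]
    genes: frozenset[str] = frozenset()


def relocate(rxn: Reaction, compartment: str, new_id: str) -> Reaction:
    """Return a new reaction with every metabolite placed in `compartment`."""
    comps = {comp for (_, comp) in rxn.stoich}
    if len(comps) != 1:
        # Transport reactions would collapse onto one compartment; curate these by hand.
        raise ValueError(f"{rxn.rxn_id} spans compartments {sorted(comps)}")
    new_stoich = {(met, compartment): coef for (met, _), coef in rxn.stoich.items()}
    return Reaction(new_id, new_stoich, rxn.genes)


def copy_reaction(model: dict[str, Reaction], rxn_id: str, compartment: str) -> str:
    """Copy: keep the original and add a relocated duplicate with a suffixed ID."""
    new_id = f"{rxn_id}_{compartment}"  # hypothetical ID convention
    model[new_id] = relocate(model[rxn_id], compartment, new_id)
    return new_id


def move_reaction(model: dict[str, Reaction], rxn_id: str, compartment: str) -> None:
    """Move: replace the original reaction by its relocated version."""
    model[rxn_id] = relocate(model[rxn_id], compartment, rxn_id)
```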

Regardless, curation of compartments is still useful, but should be handled differently:

  • Have a trustworthy source of compartment information. Current source is unknown.
  • There are published lipid droplet proteomics studies; these would be valuable input data.
  • Regardless, assigning proteins to different compartments based only on high-throughput data is not sensible. Look at whole functional pathways: are entire pathways located in the lipid droplet or not? Perhaps additional reactions should be copied to obtain functional pathways.
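One simple, hypothetical way to check whether a compartment holds functional pathways rather than isolated reactions is to look for dead-end metabolites: metabolites in that compartment that are only produced or only consumed by the model's reactions. The sketch below is a swapped-in heuristic, not a procedure proposed in this discussion; it uses plain Python objects, assumes `lp` as the lipid droplet compartment ID, and counts reversible reactions as both producing and consuming. Transport reactions are handled naturally, because they produce or consume the compartment's copy of a metabolite.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Reaction:
    rxn_id: str
    # (metabolite name, compartment ID) -> stoichiometric coefficient
    stoich: dict[tuple[str, str], float]
    reversible: bool = False


def dead_end_metabolites(reactions: list[Reaction], compartment: str) -> list[tuple[str, str]]:
    """Metabolites in `compartment` that are only produced or only consumed."""
    produced: set[tuple[str, str]] = set()
    consumed: set[tuple[str, str]] = set()
    for rxn in reactions:
        for key, coef in rxn.stoich.items():
            if key[1] != compartment:
                continue
            # A reversible reaction can run either way, so it both produces and consumes.
            if coef > 0 or rxn.reversible:
                produced.add(key)
            if coef < 0 or rxn.reversible:
                consumed.add(key)
    # Symmetric difference: in exactly one of the two sets.
    return sorted(produced ^ consumed)
```

Each reported metabolite points either to a missing transport reaction or to a part of the pathway that was not copied into the compartment.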

I will close this PR because it does not follow a sensible strategy to do this curation. Please feel free to open an Issue to discuss how to implement this curation, based on what data and which reactions should be changed.

edkerk, Jul 02 '23