yeast-GEM
Correction of compartment reactions: lipid droplet
Main improvements in this PR:
Correct the reactions assigned to the lipid droplet based on protein localization, which is detailed in the file "protein_location_sce.tsv". The file "bound0.xlsx" lists all reactions whose bounds are restricted to 0.
Explanation: All reactions from the main branch were checked. Take the reaction "r_0001" as an example: "r_loca" refers to the location of "r_0001" according to the model, "r_pro" refers to the corresponding protein in the model, and "all_lipid" refers to all lipid droplet proteins based on the file "protein_location_sce.tsv".
- If "r_loca" is lipid droplet and "r_pro" is not in "all_lipid", the upper and lower bounds of "r_0001" are restricted to 0.
- If "r_loca" is not lipid droplet and "r_pro" is in "all_lipid", "r_0001" is added to the lipid droplet compartment.
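The two rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used in this PR: the `Reaction` class, the `curate` function, the `_ld` suffix, and the gene identifiers are hypothetical. The sketch also assumes that a reaction counts as having a lipid droplet protein if any of its genes is in "all_lipid", that reactions without gene associations are left untouched, and that "added" means copied rather than moved.

```python
from dataclasses import dataclass

LIPID_DROPLET = "lipid droplet"


@dataclass
class Reaction:
    # Hypothetical stand-in for a model reaction.
    rxn_id: str
    compartment: str        # "r_loca" in the description above
    genes: frozenset        # "r_pro" in the description above
    lower_bound: float = -1000.0
    upper_bound: float = 1000.0


def curate(reactions, all_lipid):
    """Apply the two rules; return (blocked reaction IDs, new lipid droplet copies)."""
    blocked, copied = [], []
    for rxn in reactions:
        if not rxn.genes:
            # Assumption: reactions without gene associations are skipped.
            continue
        in_ld = rxn.compartment == LIPID_DROPLET
        # Assumption: one matching gene is enough.
        has_ld_protein = bool(rxn.genes & all_lipid)
        if in_ld and not has_ld_protein:
            # Rule 1: restrict both bounds to 0.
            rxn.lower_bound = rxn.upper_bound = 0.0
            blocked.append(rxn.rxn_id)
        elif not in_ld and has_ld_protein:
            # Rule 2: copy (not move) the reaction into the lipid droplet.
            copied.append(
                Reaction(rxn.rxn_id + "_ld", LIPID_DROPLET, rxn.genes,
                         rxn.lower_bound, rxn.upper_bound)
            )
    return blocked, copied
```

Calling `curate(reactions, all_lipid)` returns the IDs that would end up in "bound0.xlsx" and the reactions that would be added to the lipid droplet compartment, which makes both outcomes easy to review before touching the model.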
I hereby confirm that I have:
- [ ] Tested my code with all requirements for running the model
- [ ] Selected `develop` as a target branch (top left drop-down menu)
- [ ] If needed, asked first in the Gitter chat room about this PR
Please see https://github.com/SysBioChalmers/yeast-GEM/issues/303#issuecomment-1140223052. The first priority is to resolve #305 and #306.
- It is probably not necessary to have a 60+ MB `protein_location_sce.tsv` file with annotation data. Please simplify this table.
- The function of `bound0.xlsx` is unclear: what is meant by "bounds restricted to 0", and how is this used to curate the model?
- Tables should be in flat-text (TSV or similar) format, not Excel (`bound0.xlsx`).
- It is unclear where the localization data is coming from. Are there any cut-offs used? Are reactions copied or moved? Is there manual curation?
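As a rough idea of what a simplified flat-text table could look like, here is a minimal sketch that reduces an annotation table to one row per gene for a single compartment. The column names `gene` and `go_term`, the function name `simplify_annotation`, and the exact term string are assumptions, since the actual layout of `protein_location_sce.tsv` is not described here.

```python
import csv
import io


def simplify_annotation(tsv_text, gene_col="gene", term_col="go_term",
                        keep_term="lipid droplet"):
    """Return a two-column TSV with the unique genes annotated with keep_term."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    # Collect each matching gene once, dropping all other annotation columns.
    genes = sorted({row[gene_col] for row in reader if row[term_col] == keep_term})

    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow([gene_col, "compartment"])
    for gene in genes:
        writer.writerow([gene, keep_term])
    return out.getvalue()
```

The result is a small TSV that can be diffed in version control, unlike an Excel file or a full annotation dump.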
The same comments can be made for #315, but probably best to fix this for one organelle first.
Revisiting this, I think I now partially understand what this PR aimed to do:
- `protein_location_sce.tsv` is a list with GO term annotation of unknown source.
- If a gene is annotated in that file to be located in the lipid droplet, then the associated reaction is copied into the lipid droplet compartment.
- I don't understand `bound0.xlsx`: are these supposedly reactions that are located in the lipid droplet but should not be? In that case, it is probably the wrong file, because the reactions are located in the peroxisome instead.
Regardless, curation of compartments is still useful, but should be handled differently:
- Have a trustworthy source of compartment information. Current source is unknown.
- There are papers with lipid droplet proteomics; these would be valuable input data to use.
- Regardless, just assigning proteins to different compartments based only on high-throughput data is not sensible. Look at the whole functional pathways: are whole pathways located in the lipid droplet or not? Maybe some additional reactions should be copied, to get functional pathways.
I will close this PR because it does not follow a sensible strategy to do this curation. Please feel free to open an Issue to discuss how to implement this curation, based on what data and which reactions should be changed.