GECKO
GECKO copied to clipboard
feat: universal DLKcat.tsv for both light and full ecModels
https://github.com/SysBioChalmers/GECKO/blob/1710a9861a60249207d547585cc96581e0ece18b/tutorials/protocol.m#L106
I ran this successfully when building a full ecModel, but got the error below when building a light ecModel:
Error using readDLKcatOutput (line 56) Not all reactions from DLKcat output can be found in model.ec.rxns
Did you use the same DLKcat output file to make both a light and full ecModel?
Maybe it because here:
https://github.com/SysBioChalmers/GECKO/blob/1710a9861a60249207d547585cc96581e0ece18b/src/geckomat/gather_kcats/readDLKcatOutput.m#L55
it compares the rxn identifiers but in light version, the identifiers in model.ec.rxns have a prefix 001_r_001
Indeed, this check is to make sure that the kcatList can later be used by selectKcatValue, which just matches by reaction ID:
https://github.com/SysBioChalmers/GECKO/blob/1710a9861a60249207d547585cc96581e0ece18b/src/geckomat/gather_kcats/selectKcatValue.m#L51
Short term solution is to have the error suggest to make full/light specific versions of the DLKcat.tsv file. Longer term solution is to have selectKcatValue not just check by the ec.rxns style reaction identifier, but instead removes any suffices/prefixes (so returns to the ecModel.rxns format, and also matches to the enzymes that are annotated. Or something similar, at least not only matching by reaction identifier. This will require some more refactoring and testing.
Did you use the same DLKcat output file to make both a light and full ecModel?
Yes
I just notice that the DLKcat output files generated by full and light protocols are not interchangeable due to rxn IDs in these two types of ecModels. I do not think that it is very necessary to make them interchangeable by refactoring code. Instead I prefer the short-term solution, and we should clarify this in the protocol.
Mentioned in the protocol. Longer term goal would be to enhance the parsing of the file, to avoid this.
Instead of mapping via the reaction ID, perhaps we could use another ID as a unique identifier for the sequence, and do the mapping back to the reaction within Matlab. From a modularity point of view, the current DLKcat.tsv mixes the concerns of GECKO and DLKcat. Was there a specific reason for avoiding gene IDs?
No reason to avoid gene ID, that is actually my plan, to map by the core of the reaction ID (so without prefixes and suffixes from light and full ecModels) and the gene ID. And it's a GECKO-only concern, DLKcat doesn't care about identifiers.