feat: universal DLKcat.tsv for both light and full ecModels

Open Yu-sysbio opened this issue 2 years ago • 8 comments
https://github.com/SysBioChalmers/GECKO/blob/1710a9861a60249207d547585cc96581e0ece18b/tutorials/protocol.m#L106

I ran this successfully when building a full ecModel, but got the error below when building a light ecModel:

Error using readDLKcatOutput (line 56)
Not all reactions from DLKcat output can be found in model.ec.rxns

Yu-sysbio avatar Feb 28 '23 07:02 Yu-sysbio

Did you use the same DLKcat output file to make both a light and full ecModel?

edkerk avatar Feb 28 '23 09:02 edkerk

Maybe it is because of this line:

https://github.com/SysBioChalmers/GECKO/blob/1710a9861a60249207d547585cc96581e0ece18b/src/geckomat/gather_kcats/readDLKcatOutput.m#L55

It compares the reaction identifiers, but in the light version the identifiers in model.ec.rxns carry a prefix (e.g. 001_r_001), so they no longer match the identifiers in the DLKcat output.
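
To illustrate why an exact comparison of identifiers fails here, the following is a minimal Python sketch, not the actual MATLAB code in readDLKcatOutput. The function name `missing_reactions` and the example identifiers are hypothetical; the only thing taken from the thread is that light ecModel identifiers carry a numeric prefix.

```python
def missing_reactions(dlkcat_rxns, ec_rxns):
    """Return the DLKcat reaction IDs that have no exact match in ec_rxns."""
    known = set(ec_rxns)
    return [rxn for rxn in dlkcat_rxns if rxn not in known]


# Hypothetical IDs: a DLKcat file written without prefixes, checked
# against a light ecModel whose ec.rxns entries are prefixed.
dlkcat_ids = ["r_0001", "r_0002"]
light_ec_rxns = ["001_r_0001", "001_r_0002"]

unmatched = missing_reactions(dlkcat_ids, light_ec_rxns)
if unmatched:
    print("Not found in model.ec.rxns:", unmatched)
```

With an exact match, every identifier in the file is reported as missing, even though each one corresponds to a reaction in the model.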

ae-tafur avatar Feb 28 '23 09:02 ae-tafur

Indeed, this check is to make sure that the kcatList can later be used by selectKcatValue, which just matches by reaction ID: https://github.com/SysBioChalmers/GECKO/blob/1710a9861a60249207d547585cc96581e0ece18b/src/geckomat/gather_kcats/selectKcatValue.m#L51

The short-term solution is to have the error message suggest making separate full-specific and light-specific versions of the DLKcat.tsv file. The longer-term solution is to have selectKcatValue not only match on the ec.rxns-style reaction identifier, but instead remove any prefixes and suffixes (returning to the ecModel.rxns format) and also match on the annotated enzymes. Or something similar; at least it should not match by reaction identifier alone. This will require some more refactoring and testing.
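
A rough sketch of the prefix/suffix removal idea, written in Python for illustration only (GECKO itself is MATLAB, and this is not its implementation). The exact identifier patterns are assumptions: a three-digit prefix such as `001_` for light ecModels, and `_REV` / `_EXP_<n>` suffixes for full ecModels. The helper name `core_rxn_id` is hypothetical.

```python
import re

# Assumed ID decorations (not confirmed in this thread):
#   light ecModel: three-digit prefix, e.g. "001_r_0001"
#   full ecModel:  "_REV" and/or "_EXP_<n>" suffixes, e.g. "r_0001_REV_EXP_2"
_PREFIX = re.compile(r"^\d{3}_")
_SUFFIX = re.compile(r"(_REV|_EXP_\d+)+$")


def core_rxn_id(ec_rxn_id):
    """Strip the assumed light prefix and full suffixes from a reaction ID."""
    without_prefix = _PREFIX.sub("", ec_rxn_id)
    return _SUFFIX.sub("", without_prefix)


for rxn in ["001_r_0001", "r_0001_EXP_2", "r_0001_REV_EXP_1", "r_0001"]:
    print(rxn, "->", core_rxn_id(rxn))
```

Because several decorated identifiers collapse onto the same core ID, the core ID alone is no longer unique, which is why a second key such as the annotated enzyme would be needed for matching.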

edkerk avatar Feb 28 '23 10:02 edkerk

> Did you use the same DLKcat output file to make both a light and full ecModel?

Yes

Yu-sysbio avatar Feb 28 '23 12:02 Yu-sysbio

I just noticed that the DLKcat output files generated by the full and light protocols are not interchangeable, because the reaction IDs differ between these two types of ecModels. I do not think it is really necessary to refactor the code to make them interchangeable. Instead, I prefer the short-term solution, and we should clarify this in the protocol.

Yu-sysbio avatar Feb 28 '23 12:02 Yu-sysbio

This is now mentioned in the protocol. The longer-term goal would be to improve the parsing of the file so that this error is avoided.

edkerk avatar Feb 28 '23 20:02 edkerk

Instead of mapping via the reaction ID, perhaps we could use another ID as a unique identifier for the sequence, and do the mapping back to the reaction within Matlab. From a modularity point of view, the current DLKcat.tsv mixes the concerns of GECKO and DLKcat. Was there a specific reason for avoiding gene IDs?

mihai-sysbio avatar Jul 03 '23 16:07 mihai-sysbio

There is no reason to avoid gene IDs; that is actually my plan: to map by the core of the reaction ID (i.e. without the prefixes and suffixes from light and full ecModels) together with the gene ID. And it is a GECKO-only concern; DLKcat doesn't care about identifiers.
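
One way such a mapping could look is sketched below in Python, purely as an illustration of the idea and not as GECKO's planned implementation. The identifier patterns (three-digit light prefix, `_REV` / `_EXP_<n>` full suffixes), the row layout, the gene names, and the function names are all assumptions made for this example.

```python
import re

# Assumed decorations: "001_" style prefix (light), "_REV" / "_EXP_<n>" suffixes (full).
_PREFIX = re.compile(r"^\d{3}_")
_SUFFIX = re.compile(r"(_REV|_EXP_\d+)+$")


def core_rxn_id(rxn_id):
    """Reduce a light or full ecModel reaction ID to its undecorated core."""
    return _SUFFIX.sub("", _PREFIX.sub("", rxn_id))


def build_kcat_lookup(dlkcat_rows):
    """Index kcat values by (core reaction ID, gene ID).

    dlkcat_rows: iterable of (rxn_id, gene_id, kcat) tuples (hypothetical layout).
    """
    return {(core_rxn_id(rxn), gene): kcat for rxn, gene, kcat in dlkcat_rows}


def assign_kcats(ec_entries, lookup):
    """Look up a kcat for each (ec_rxn_id, gene_id) pair; None if absent."""
    return {
        (rxn, gene): lookup.get((core_rxn_id(rxn), gene))
        for rxn, gene in ec_entries
    }


# Rows as they might come from a file made for a full ecModel ...
full_rows = [("r_0001_EXP_1", "geneA", 12.5), ("r_0001_EXP_2", "geneB", 3.0)]
# ... reused for a light ecModel with prefixed identifiers.
light_entries = [("001_r_0001", "geneA"), ("002_r_0001", "geneB")]

kcats = assign_kcats(light_entries, build_kcat_lookup(full_rows))
print(kcats)
```

Note that stripping `_REV` makes the forward and reverse directions of a reaction share one key, so a real implementation would likely need to keep the direction as part of the key.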

edkerk avatar Jul 03 '23 16:07 edkerk