refactor: update general kcat database

Open edkerk opened this issue 3 years ago • 6 comments
Description of the new feature:

By repurposing some of the DLKcat code, construct a JSON file similar to the one in DLKcat. This JSON file would contain kcat values from the BRENDA and SABIO-RK databases, and its generation could be rerun every 6-12 months.

In contrast to the current DLKcat code, some filtering steps should be skipped, to (a) keep specific activities; and (b) keep entries to which no amino acid sequence could be assigned. This way, the kcat database can be used in GECKO's fuzzy matching approach. The file would thereby replace the max_Kcat and max_SA files that GECKO currently uses.

This JSON file should be loaded into MATLAB, to be used in GECKO fuzzy matching (which is refactored in a separate Issue).

However, if this file is only used for GECKO's fuzzy matching, then we might as well stick with the existing max_Kcat and max_SA files? Let's keep this Issue on hold for now.

edkerk avatar May 25 '22 21:05 edkerk

I'm not totally sure about the scope of the issue: is it about creating a local kcat file or about how that file is used by GECKO? Perhaps it would be easier to discuss verbally and write up the conclusions.

mihai-sysbio avatar Jun 27 '22 05:06 mihai-sysbio

It is about both the actual file and how it is used. But it sounds like GotEnzymes will also be able to provide such a file (I did not know this when opening this issue), so this issue will most likely be addressed when GECKO and GotEnzymes can work together.

edkerk avatar Jun 27 '22 12:06 edkerk

Does not strictly have to be JSON, but it should be flat-text so that it is diff-able.
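One way to keep such a file diff-able, as a minimal Python sketch: write one record per line, with a fixed key order inside each line and the lines themselves sorted, so that regenerating the file only changes lines whose data changed. The helper name `to_diffable_lines` and the field names are hypothetical and not taken from GECKO or DLKcat.

```python
import json

def to_diffable_lines(records):
    """Serialize records as JSON Lines in a stable order (hypothetical helper)."""
    # sort_keys fixes the key order inside each line.
    lines = [json.dumps(r, sort_keys=True, separators=(",", ":")) for r in records]
    # Sorting the lines makes the output independent of the input order.
    return sorted(lines)

# Illustrative records only; the field names are assumptions.
records = [
    {"ec_number": "4.4.1.1", "organism": "sce", "kcat": 1.1646},
    {"organism": "sce", "kcat": 5.6405, "ec_number": "2.4.1.109"},
]

for line in to_diffable_lines(records):
    print(line)
```

With this layout, a plain `diff` between two versions of the file shows added, removed, or changed entries line by line.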

edkerk avatar Jul 01 '22 07:07 edkerk

For reference, this is what the GotEnzymes output looks like (showing only the first 28 lines here):

{"enzymes":[{"gene":"YAL012W","organism":"sce","domain":"E","ko":"K01758","reaction_id":"R04770","ec_number":"4.4.1.1;4.4.1.11","compound":"C00109","kcat_values":1.1646},
{"gene":"YAL012W","organism":"sce","domain":"E","ko":"K01758","reaction_id":"R04770","ec_number":"4.4.1.1;4.4.1.11","compound":"C00014","kcat_values":4.2309},
{"gene":"YAL012W","organism":"sce","domain":"E","ko":"K01758","reaction_id":"R09366","ec_number":"4.4.1.1;4.4.1.13","compound":"C05703","kcat_values":1.5564},
{"gene":"YAL012W","organism":"sce","domain":"E","ko":"K01758","reaction_id":"R00782","ec_number":"4.4.1.1;4.4.1.13;4.4.1.28","compound":"C00022","kcat_values":0.7785},
{"gene":"YAL012W","organism":"sce","domain":"E","ko":"K01758","reaction_id":"R09366","ec_number":"4.4.1.1;4.4.1.13","compound":"C00022","kcat_values":0.7785},
{"gene":"YAL012W","organism":"sce","domain":"E","ko":"K01758","reaction_id":"R02408","ec_number":"4.4.1.1;4.4.1.13;4.4.1.35","compound":"C01962","kcat_values":2.2991},
{"gene":"YAL012W","organism":"sce","domain":"E","ko":"K01758","reaction_id":"R04930","ec_number":"4.4.1.1","compound":"C00109","kcat_values":1.1646},
{"gene":"YAL012W","organism":"sce","domain":"E","ko":"K01758","reaction_id":"R00782","ec_number":"4.4.1.1;4.4.1.13;4.4.1.28","compound":"C00283","kcat_values":2.2484},
{"gene":"YAL012W","organism":"sce","domain":"E","ko":"K01758","reaction_id":"R09366","ec_number":"4.4.1.1;4.4.1.13","compound":"C00014","kcat_values":4.2309},
{"gene":"YAL012W","organism":"sce","domain":"E","ko":"K01758","reaction_id":"R02408","ec_number":"4.4.1.1;4.4.1.13;4.4.1.35","compound":"C00022","kcat_values":0.7785},
{"gene":"YAL012W","organism":"sce","domain":"E","ko":"K01758","reaction_id":"R04930","ec_number":"4.4.1.1","compound":"C05699","kcat_values":1.3478},
{"gene":"YAL012W","organism":"sce","domain":"E","ko":"K01758","reaction_id":"R09366","ec_number":"4.4.1.1;4.4.1.13","compound":"C05689","kcat_values":1.3304},
{"gene":"YAL012W","organism":"sce","domain":"E","ko":"K01758","reaction_id":"R04930","ec_number":"4.4.1.1","compound":"C00014","kcat_values":4.2309},
{"gene":"YAL012W","organism":"sce","domain":"E","ko":"K01758","reaction_id":"R04770","ec_number":"4.4.1.1;4.4.1.11","compound":"C05703","kcat_values":1.5564},
{"gene":"YAL012W","organism":"sce","domain":"E","ko":"K01758","reaction_id":"R02408","ec_number":"4.4.1.1;4.4.1.13;4.4.1.35","compound":"C00014","kcat_values":4.2309},
{"gene":"YAL012W","organism":"sce","domain":"E","ko":"K01758","reaction_id":"R02408","ec_number":"4.4.1.1;4.4.1.13;4.4.1.35","compound":"C00491","kcat_values":1.0792},
{"gene":"YAL012W","organism":"sce","domain":"E","ko":"K01758","reaction_id":"R01001","ec_number":"4.4.1.1","compound":"C02291","kcat_values":0.476},
{"gene":"YAL012W","organism":"sce","domain":"E","ko":"K01758","reaction_id":"R04770","ec_number":"4.4.1.1;4.4.1.11","compound":"C05335","kcat_values":1.4882},
{"gene":"YAL012W","organism":"sce","domain":"E","ko":"K01758","reaction_id":"R01001","ec_number":"4.4.1.1","compound":"C00014","kcat_values":4.2309},
{"gene":"YAL012W","organism":"sce","domain":"E","ko":"K01758","reaction_id":"R01001","ec_number":"4.4.1.1","compound":"C00109","kcat_values":1.1646},
{"gene":"YAL012W","organism":"sce","domain":"E","ko":"K01758","reaction_id":"R01001","ec_number":"4.4.1.1","compound":"C00097","kcat_values":0.1704},
{"gene":"YAL012W","organism":"sce","domain":"E","ko":"K01758","reaction_id":"R00782","ec_number":"4.4.1.1;4.4.1.13;4.4.1.28","compound":"C00097","kcat_values":0.1704},
{"gene":"YAL012W","organism":"sce","domain":"E","ko":"K01758","reaction_id":"R00782","ec_number":"4.4.1.1;4.4.1.13;4.4.1.28","compound":"C00014","kcat_values":4.2309},
{"gene":"YAL012W","organism":"sce","domain":"E","ko":"K01758","reaction_id":"R04930","ec_number":"4.4.1.1","compound":"C05688","kcat_values":0.7746},
{"gene":"YAL023C","organism":"sce","domain":"E","ko":"K00728","reaction_id":"R11399","ec_number":"2.4.1.109","compound":"C03862","kcat_values":5.6405},
{"gene":"YAL023C","organism":"sce","domain":"E","ko":"K00728","reaction_id":"R11399","ec_number":"2.4.1.109","compound":"C00110","kcat_values":5.1653},
{"gene":"YAL023C","organism":"sce","domain":"E","ko":"K00728","reaction_id":"R04072","ec_number":"2.4.1.109","compound":"C03862","kcat_values":5.6405},
{"gene":"YAL023C","organism":"sce","domain":"E","ko":"K00728","reaction_id":"R04072","ec_number":"2.4.1.109","compound":"C00110","kcat_values":5.1653}
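
As a rough illustration of how records like these could be reduced to one value per EC number, here is a minimal Python sketch. It assumes the goal is a per-EC maximum (a guess based on the name of the max_Kcat file, not a description of how GECKO builds it), and that the `ec_number` field is a `;`-separated list, as it appears in the excerpt. The function name `max_kcat_per_ec` is hypothetical.

```python
from collections import defaultdict

def max_kcat_per_ec(enzymes):
    """Return the highest kcat_values seen for each EC number (hypothetical helper)."""
    best = defaultdict(float)  # starts at 0.0; assumes kcat values are positive
    for rec in enzymes:
        # One record can list several EC numbers separated by ';'.
        for ec in rec["ec_number"].split(";"):
            best[ec] = max(best[ec], rec["kcat_values"])
    return dict(best)

# Three records copied from the excerpt above (fields trimmed for brevity).
enzymes = [
    {"gene": "YAL012W", "ec_number": "4.4.1.1;4.4.1.11", "compound": "C00109", "kcat_values": 1.1646},
    {"gene": "YAL012W", "ec_number": "4.4.1.1;4.4.1.11", "compound": "C00014", "kcat_values": 4.2309},
    {"gene": "YAL023C", "ec_number": "2.4.1.109", "compound": "C03862", "kcat_values": 5.6405},
]

print(max_kcat_per_ec(enzymes))
```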

edkerk avatar Jul 02 '22 09:07 edkerk

Just a note: the output above was obtained without a purpose-built API being exposed. I would therefore recommend treating it as a foundation for building a more compact and potentially more usable response.
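
To make "more compact" concrete, one possible direction is to nest the flat records so that fields repeated on every line (gene, organism, KO, EC numbers) are stored once. The Python sketch below is only an illustration under that assumption; the nested layout and the function name `compact` are hypothetical and do not describe the actual GotEnzymes API. It also assumes that the EC numbers are the same for every record of a given gene and reaction, which holds in the excerpt shown earlier.

```python
def compact(enzymes):
    """Nest flat records as gene -> reaction -> compound -> kcat (hypothetical layout)."""
    out = {}
    for rec in enzymes:
        # Gene-level fields are stored once per gene.
        gene = out.setdefault(
            rec["gene"],
            {"organism": rec["organism"], "ko": rec["ko"], "reactions": {}},
        )
        # Reaction-level fields are stored once per reaction.
        rxn = gene["reactions"].setdefault(
            rec["reaction_id"],
            {"ec_number": rec["ec_number"].split(";"), "kcat": {}},
        )
        rxn["kcat"][rec["compound"]] = rec["kcat_values"]
    return out

# Two records copied from the GotEnzymes excerpt earlier in this thread.
enzymes = [
    {"gene": "YAL023C", "organism": "sce", "domain": "E", "ko": "K00728",
     "reaction_id": "R11399", "ec_number": "2.4.1.109", "compound": "C03862", "kcat_values": 5.6405},
    {"gene": "YAL023C", "organism": "sce", "domain": "E", "ko": "K00728",
     "reaction_id": "R11399", "ec_number": "2.4.1.109", "compound": "C00110", "kcat_values": 5.1653},
]

print(compact(enzymes))
```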

mihai-sysbio avatar Jul 03 '22 08:07 mihai-sysbio

Here is the link to the promised API: https://metabolicatlas.org/api/v2/#/GotEnzymes

mihai-sysbio avatar Aug 12 '22 17:08 mihai-sysbio