
Reactions that have a good BLAST hit but are not included in the draft model

Open gmhhope opened this issue 2 years ago • 1 comments

Dear Gapseq developer,

I am super excited about this software. It improves on CarveMe and ModelSEED draft models a lot, and it provides a long list of tentative reactions, which makes the procedure transparent and easy to track. So thanks very much!

I recently attempted to translate a genome-scale model (the formalized knowledge base of metabolic reactions in one or more organisms) to an MS-based metabolomics application. The coverage of the reference reaction and compound databases is therefore essential for such an application, as metabolomics aims to map not only essential metabolism but also secondary metabolism.

I tested gapseq and found that the number of reactions reported in the output (~30,000 rxns) far exceeds the number retained in the draft model (~2,500 rxns). Is there any description of how the software compiles and filters the reactions? In particular, why does a significant share of the results in the table have the label no_blast? Are those no_blast reactions included because they belong to a pathway that was predicted to be present in the model based on the essential reactions identified?

Furthermore, I found some reactions that have a good BLAST hit but are not included in the draft model. For example, see reactions_with_good_blast_not_in_model.txt

I could not find the reaction in the ModelSEED database, but I did find it in the MetaCyc database. Does your reference network include MetaCyc as well?

I have a lot more questions, but I think this is a good start toward understanding the software better. Thanks very much for any assistance! I am amazed by your software. Thanks!

Best, Minghao Gong

gmhhope, May 15 '22 03:05

Hi Minghao Gong,

Thank you for your questions and for using gapseq!

As pathway references, gapseq mainly uses the pathways described in MetaCyc. The pathway IDs are stated in the column pathway of the ...-Reactions.tbl file. The MetaCyc pathway definitions contain the MetaCyc reaction IDs, which are given in the column rxn. gapseq links these MetaCyc reaction IDs to the reaction IDs of the gapseq reaction database. Unfortunately, this mapping is not perfect, which leaves a few MetaCyc IDs unlinked to the gapseq reaction DB. This explains the results you provided in the file reactions_with_good_blast_not_in_model.txt. If there are hits, the column dbhit provides the reaction IDs that refer to the gapseq reaction DB and/or the ModelSEED reaction DB.
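To list such unlinked reactions yourself, a minimal Python sketch along the following lines may help. It assumes the table is tab-separated and that the hit label sits in a column named status with the value good_blast; both are assumptions, so please check them against the header of your own ...-Reactions.tbl file. The sample rows and IDs are made up for illustration.

```python
import csv
import io

# Hypothetical sample in the tab-separated layout of a ...-Reactions.tbl file.
# The column names rxn, pathway and dbhit come from this thread; the status
# column, its good_blast value and all row values are assumptions.
SAMPLE_TBL = (
    "rxn\tpathway\tstatus\tdbhit\n"
    "RXN-A\t|PWY-1|\tgood_blast\trxn00001\n"
    "RXN-B\t|PWY-1|\tgood_blast\t\n"
    "RXN-C\t|PWY-2|\tno_blast\trxn00002\n"
)


def unlinked_good_hits(handle):
    """Return MetaCyc reaction IDs with a good hit but an empty dbhit field."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [
        row["rxn"]
        for row in reader
        if row["status"] == "good_blast" and not (row["dbhit"] or "").strip()
    ]


print(unlinked_good_hits(io.StringIO(SAMPLE_TBL)))
```

For a real file, pass an open file handle instead of the io.StringIO object. If your table starts with comment lines, those would need to be skipped first.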

The gapseq internal reaction and compound database is derived from ModelSEED, but not all ModelSEED reaction entries are included in it. For instance, we excluded a number of duplicated or erroneous reactions. Thus, some of the hit IDs stated in dbhit are not part of the gapseq biochemistry database, and these reactions do not occur in gapseq models (neither draft nor gapfilled models).
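As a rough illustration of this second case, the following Python sketch splits the IDs of one dbhit entry into those that are present in a given set of gapseq reaction IDs and those that are not. The function name, the set of IDs and the assumption that dbhit holds whitespace-separated IDs are all hypothetical and not taken from the gapseq code.

```python
def split_dbhits(dbhit, gapseq_rxn_ids):
    """Split the IDs of one dbhit entry by membership in the gapseq DB.

    Assumes that dbhit holds whitespace-separated reaction IDs.
    """
    ids = dbhit.split()
    kept = [rxn_id for rxn_id in ids if rxn_id in gapseq_rxn_ids]
    dropped = [rxn_id for rxn_id in ids if rxn_id not in gapseq_rxn_ids]
    return kept, dropped


# Hypothetical set of reaction IDs contained in the gapseq reaction database.
gapseq_rxn_ids = {"rxn00001", "rxn00003"}

kept, dropped = split_dbhits("rxn00001 rxn00002", gapseq_rxn_ids)
print("in gapseq DB:", kept)
print("not in gapseq DB:", dropped)
```

Reactions whose dbhit IDs all end up in the second list cannot appear in the model, even with a good hit.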

why a significant of results of no_blast present in the table? Are those included no-blast-derived reactions from a particular pathway predicted to present in the model based on essential reactions identified?

Yes. Reactions with the label "no_blast" or "no seq data" can still be part of the draft network if they participate in a pathway that was predicted to be present based on the completeness threshold and/or the key-enzyme criteria.
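The rule can be pictured with a small Python sketch. This is a simplified illustration and not the actual gapseq implementation: the set of predicted pathways is supplied by hand here, whereas gapseq derives it from the completeness threshold and key-enzyme criteria. The row layout, the status key and all IDs are hypothetical.

```python
# Labels for reactions without sequence support, as written in this thread.
NO_HIT_LABELS = {"no_blast", "no seq data"}


def included_without_hit(rows, predicted_pathways):
    """Return reactions without a hit whose pathway is predicted to be present."""
    return [
        row["rxn"]
        for row in rows
        if row["status"] in NO_HIT_LABELS and row["pathway"] in predicted_pathways
    ]


# Hypothetical rows and pathway predictions for illustration only.
rows = [
    {"rxn": "RXN-A", "pathway": "PWY-1", "status": "good_blast"},
    {"rxn": "RXN-B", "pathway": "PWY-1", "status": "no_blast"},
    {"rxn": "RXN-C", "pathway": "PWY-2", "status": "no_blast"},
    {"rxn": "RXN-D", "pathway": "PWY-1", "status": "no seq data"},
]
predicted_pathways = {"PWY-1"}

print(included_without_hit(rows, predicted_pathways))
```

In this toy example RXN-B and RXN-D are kept because PWY-1 counts as present, while RXN-C is dropped together with its pathway.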

Best, Silvio

Waschina, Jun 14 '22 11:06