
Reactions inferred from pseudogene regions when using gapseq on a genome FASTA file.

Open ArnaudBelcour opened this issue 7 months ago • 0 comments

Hello,

Technical part

While working on the reconstruction of metabolic networks from public genomes of the NCBI database, I have found that gapseq (version 1.2 with subcommand doall) uses regions of the genome that are tagged as pseudogenes during reaction inference. I identified this by using the genome sequence as input to gapseq and by comparing (here, by searching for overlaps) the regions predicted to be associated with a reaction to the genes present at the same location in the GenBank file of the organism. When the corresponding gene had a pseudo qualifier, I considered the reaction to be associated with a pseudogene.
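For illustration, the overlap check described above could be sketched as follows. This is a minimal sketch and not the script actually used: the `Region` class, the reaction IDs, the locus tags and all coordinates are hypothetical, and it assumes the gapseq hit coordinates and the pseudo-tagged GenBank features have already been extracted into simple records (for example with a GenBank parser).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """A genomic interval (hypothetical record; 1-based, inclusive coordinates)."""
    contig: str
    start: int
    end: int
    name: str


def overlaps(a: Region, b: Region) -> bool:
    """Return True if both regions lie on the same contig and share at least one base."""
    return a.contig == b.contig and a.start <= b.end and b.start <= a.end


def flag_pseudogene_hits(hits, pseudogenes):
    """Map each reaction hit to the pseudogene features it overlaps.

    Reactions without any overlapping pseudogene are left out of the result.
    """
    flagged = {}
    for hit in hits:
        matched = [p.name for p in pseudogenes if overlaps(hit, p)]
        if matched:
            flagged.setdefault(hit.name, []).extend(matched)
    return flagged


# Hypothetical regions that a tool reported as supporting a reaction.
hits = [
    Region("contig_1", 1200, 2100, "rxn_A"),
    Region("contig_1", 5000, 5900, "rxn_B"),
    Region("contig_2", 300, 900, "rxn_C"),
]

# Hypothetical GenBank features carrying a pseudo qualifier.
pseudogenes = [
    Region("contig_1", 2000, 2600, "locus_0002"),
    Region("contig_2", 1000, 1500, "locus_0101"),
]

flagged = flag_pseudogene_hits(hits, pseudogenes)
print(flagged)
```

The pairwise comparison is quadratic in the number of regions, which is usually acceptable for a single bacterial genome; an interval tree or a sorted sweep would be a reasonable replacement for larger inputs.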

The results vary a lot: some species have no matches with pseudogene regions, while others have hundreds of reactions associated with these regions. It seems logical to find such matches, as pseudogene regions still contain sequences similar to those of functional genes, which can produce hits during the tblastn search. In my previous team, we encountered a similar issue when developing the method AuCoMe.

Do you think it would be possible to identify and label these reactions as associated with pseudogenes? Or at least to emit a warning when a genome sequence file is used as input?

Thoughts on pseudogenes and metabolism

For me, this raises the question of whether or not to take these regions into account. Yes, they have been identified (often automatically) as pseudogene regions, but these predictions should be taken with caution, especially for two reasons:

So I think it could be interesting to label these reactions, as they can indicate (1) a lost (or inactive) function, (2) a modified function that can still be performed, or (3) a function that could potentially become active in the future. But maybe they should not be present in the model that will be used to make predictions (such as with Flux Balance Analysis), given the uncertainty about them?

Best regards, Arnaud Belcour.

ArnaudBelcour avatar Nov 22 '23 11:11 ArnaudBelcour