
Missing genes in the EggNOG mapper output

Open enushi opened this issue 4 years ago • 5 comments

My fasta file has 3600 genes but the output file after running the annotation tool has only 3300 genes. Why is it not the case that there is a query for every gene in the fasta file?

enushi avatar Mar 30 '22 11:03 enushi

Hi @enushi ,

Sorry for the delay in answering. It is difficult to answer without looking at the data and results. Are you using the online tool or the command line?

Anyway, often not all the genes are annotated, and those without annotation are not included in the output files.

Best, Carlos

Cantalapiedra avatar Apr 18 '22 09:04 Cantalapiedra

Hi, thanks for the reply. I am using the online tool, and my fasta file is the following: sequence.txt

The weird thing is that I also get unannotated genes in the output files, marked by "-", so we can't say that genes are removed because they lack a COG annotation. Here is my output file: MM_nznp8mei.emapper.annotations.xlsx

best regards, Elio

enushi avatar Apr 18 '22 10:04 enushi

Hi @enushi ,

Unless you have evidence that some input protein should be annotated but is not because of a bug, your output file looks completely normal to me. Roughly 92% of the proteins have annotations, which, if I am not mistaken, is not bad at all compared with other data sets. Note that all the queries in your output file have annotations (at least a hit to a seed ortholog). Some of them may lack some or most of the annotation fields, depending on the particular input sequence.

I hope this makes sense.

Best, Carlos

Cantalapiedra avatar Apr 19 '22 08:04 Cantalapiedra

Hi, is there a way I can get a list of the queries that were not present in the output, maybe as a separate file? P.S.: I am using the online tool. Best, Garima

garima-setia avatar Feb 21 '24 22:02 garima-setia

Hi @garima-setia ,

Sorry for the delay in answering. I will tag this post as a feature request, since so far we don't provide a separate list of queries without annotation. For Linux users this should be fairly easy to get (e.g. using the join command). However, it would be a useful feature to provide, especially for users without much Linux command-line experience (and convenient for everyone, I guess).

Best, Carlos

Cantalapiedra avatar May 11 '24 11:05 Cantalapiedra
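
Until such a feature exists, the list of missing queries can be obtained by comparing the IDs in the input fasta file with the query column of the annotations file. Below is a minimal sketch in Python, not part of eggnog-mapper itself. It assumes that the annotations file is the tab-separated `.emapper.annotations` download (not the `.xlsx` export), that lines starting with `#` are comments or the header, that the first column holds the query name, and that the query name is the first whitespace-delimited token of the fasta header. The function names and file names are hypothetical.

```python
import sys


def fasta_ids(path):
    """Return sequence IDs (first token of each '>' header), in file order."""
    ids = []
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    ids.append(fields[0])
    return ids


def annotated_queries(path):
    """Return the set of query names found in the first column of a
    tab-separated annotations file, skipping '#' lines and blank lines."""
    queries = set()
    with open(path) as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue
            queries.add(line.split("\t", 1)[0].strip())
    return queries


def missing_queries(fasta_path, annotations_path):
    """Return fasta IDs that never appear as a query in the annotations."""
    found = annotated_queries(annotations_path)
    return [seq_id for seq_id in fasta_ids(fasta_path) if seq_id not in found]


if __name__ == "__main__":
    if len(sys.argv) == 3:
        # Print one missing query ID per line.
        for seq_id in missing_queries(sys.argv[1], sys.argv[2]):
            print(seq_id)
    else:
        print("usage: missing_queries.py INPUT.fasta OUTPUT.emapper.annotations",
              file=sys.stderr)
```

Saved as, say, `missing_queries.py`, it could be run as `python missing_queries.py sequence.fasta results.emapper.annotations > missing.txt`. If the IDs in the output do not match the fasta headers exactly (for example because headers were rewritten), the comparison will need adjusting.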