Missing genes in the EggNOG mapper output
My FASTA file has 3600 genes, but the output file produced by the annotation tool contains only 3300. Why isn't there a query for every gene in the FASTA file?
Hi @enushi ,
Sorry for the delay in answering. It is difficult to answer without looking at the data and results. Are you using the online tool or the command line?
In any case, it is common for not all genes to be annotated, and genes without any annotation are not included in the output files.
Best, Carlos
Hi, thanks for the reply. I am using the online tool, and my FASTA file is this one: sequence.txt
The strange thing is that the output file also contains genes that are not annotated (they are marked with "-"), so the missing genes cannot have been removed simply because they lack a COG annotation. Here is my output file: MM_nznp8mei.emapper.annotations.xlsx
Best regards, Elio
Hi @enushi ,
Unless you have evidence that some input protein should be annotated but is not because of a bug, your output file looks completely normal to me. Roughly 92% of the proteins have annotations (3300 of 3600), which, if I am not mistaken, is not bad at all compared with other data sets. Note that every query in your output file has an annotation (at least a hit to a seed ortholog), although some of them may lack some or most of the other annotation fields, depending on the particular input sequence. The "-" values you see mark those empty fields, not queries without annotation.
I hope this makes sense.
Best, Carlos
Hi, is there a way to get a list of the queries that are not present in the output, perhaps as a separate file? P.S.: I am using the online tool. Best, Garima
Hi @garima-setia ,
Sorry for the delay in answering. I will tag this post as a feature request, since we currently don't provide a separate list of queries without annotation. For Linux users this list should be fairly easy to obtain (e.g. using the join command). Still, it would be a useful feature, especially for users with less Linux experience (and convenient for everyone, I guess).
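Until such a list is provided, here is a minimal sketch of one way to get it. Instead of join, it uses a Python set difference, so neither file needs to be sorted. It assumes that you have the tab-separated .emapper.annotations file (not the .xlsx), that the query name is in its first column, that its header/comment lines start with "#", and that each FASTA ID is the first whitespace-delimited token after ">" and matches the query name in the output. The script name missing_queries.py is hypothetical.

```python
import sys


def fasta_ids(path):
    """Return the set of query IDs in a FASTA file.

    Assumption: the ID is the first whitespace-delimited token after '>'.
    """
    ids = set()
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                tokens = line[1:].split()
                if tokens:  # skip a bare '>' with no ID
                    ids.add(tokens[0])
    return ids


def annotated_ids(path):
    """Return the query IDs listed in the first column of a tab-separated annotations file.

    Assumption: header/comment lines start with '#'; blank lines are ignored.
    """
    ids = set()
    with open(path) as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue
            ids.add(line.split("\t", 1)[0].strip())
    return ids


def missing_queries(fasta_path, annotations_path):
    """Return, sorted, the FASTA queries that do not appear in the annotations file."""
    return sorted(fasta_ids(fasta_path) - annotated_ids(annotations_path))


def main(argv):
    if len(argv) != 3:
        print("usage: missing_queries.py QUERIES.fasta OUT.emapper.annotations", file=sys.stderr)
        return 1
    fasta_path, annotations_path = argv[1], argv[2]
    queries = fasta_ids(fasta_path)
    annotated = annotated_ids(annotations_path)
    # Missing query IDs go to stdout, one per line, so they can be redirected to a file.
    for query in sorted(queries - annotated):
        print(query)
    # Summary goes to stderr so it does not end up in the redirected list.
    found = len(queries & annotated)
    print(f"{found} of {len(queries)} queries present in the output", file=sys.stderr)
    return 0


if __name__ == "__main__":
    main(sys.argv)
```

For example, `python missing_queries.py queries.fasta out.emapper.annotations > missing.txt` would write the IDs absent from the output to missing.txt (file names are placeholders).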
Best, Carlos