eggnog-mapper
Annotation of metagenomic reads is very slow
Hello,
I am trying to use eggNOG to annotate gene function for metagenomic reads. I am following the instructions for running the homology search separately from the annotation step. The homology search with DIAMOND is fast, but the annotation is very slow, even when using the /dev/shm trick.
If I understand correctly, every query is annotated separately, even when many queries aligned to the same seed ortholog. In other words, the number of lookups equals the number of entries in the emapper.seed_orthologs file rather than the number of unique seed orthologs in that file. Is this correct?
I would think that doing just one annotation lookup for every unique seed ortholog would dramatically speed up this step.
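To estimate how much a one-lookup-per-seed scheme could save on a given run, one can compare the number of rows in the seed_orthologs file with the number of distinct seed orthologs. Below is a minimal Python sketch. It assumes tab-separated rows with the query in the first column and the seed ortholog in the second, and treats lines starting with '#' as headers. The function name lookup_counts and the example IDs are hypothetical.

```python
from typing import Iterable, Set, Tuple


def lookup_counts(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (number of hit rows, number of distinct seed orthologs).

    Assumption: tab-separated rows, column 1 = query, column 2 = seed ortholog;
    lines starting with '#' are header/comment lines and are skipped.
    """
    rows = 0
    seeds: Set[str] = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header/comment and blank lines
        fields = line.rstrip("\n").split("\t")
        rows += 1
        seeds.add(fields[1])
    return rows, len(seeds)


if __name__ == "__main__":
    example = [
        "#query\tseed\tevalue\n",
        "read1\tseedA\t1e-20\n",
        "read2\tseedA\t1e-18\n",
        "read3\tseedB\t1e-10\n",
    ]
    total, unique = lookup_counts(example)
    print(f"per-row lookups: {total}, per-seed lookups: {unique}")
```

For the example this prints "per-row lookups: 3, per-seed lookups: 2". An open file handle can be passed in place of the list, because iterating over a file yields its lines. The closer the two numbers are, the less deduplication would gain.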
Please let me know if my understanding is correct and if there is currently a way to do this.
Thank you, Lev
Hi @levlitichev,
Definitely, the annotation step is slower than the homology search. However, with either --dbmem or /dev/shm it should not be that slow. I guess it depends on your expectations.
Regarding the algorithm, you are right. If you expect many hits to the same seed ortholog, it would be faster to annotate each seed once. However, when most hits go to different seed orthologs, I am not sure the extra bookkeeping is worth it: we would need to keep in memory either the query-seed associations or the seed annotations, and later assign them to the queries.
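Annotating each seed once is essentially memoization of the annotation lookup. The minimal Python sketch below illustrates the "store the seed annotations" option. fetch_annotation is a hypothetical stand-in for the database query and is not a real eggnog-mapper function. The cache holds one entry per distinct seed ortholog, and that is the extra memory cost.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def annotate_with_cache(
    hits: Iterable[Tuple[str, str]],
    fetch_annotation: Callable[[str], str],
) -> List[Tuple[str, str, str]]:
    """Annotate (query, seed) pairs, calling fetch_annotation once per unique seed.

    The cache grows with the number of distinct seed orthologs, not with the
    number of queries.
    """
    cache: Dict[str, str] = {}
    annotated: List[Tuple[str, str, str]] = []
    for query, seed in hits:
        if seed not in cache:
            cache[seed] = fetch_annotation(seed)  # the expensive lookup
        annotated.append((query, seed, cache[seed]))
    return annotated


if __name__ == "__main__":
    calls: List[str] = []

    def fake_fetch(seed: str) -> str:
        # Hypothetical stand-in for the eggNOG database query.
        calls.append(seed)
        return f"annotation of {seed}"

    hits = [("read1", "seedA"), ("read2", "seedA"), ("read3", "seedB")]
    for row in annotate_with_cache(hits, fake_fetch):
        print(*row, sep="\t")
    print("database lookups:", len(calls))
```

Here three hits trigger only two database lookups. When nearly every hit has its own seed, the cache almost never hits and simply accumulates every annotation. In that case the bookkeeping may not pay off.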
If your case is the former, you can of course reduce the .seed_orthologs file to one hit per unique seed ortholog, annotate that reduced file, and then transfer the annotations back to all the queries afterwards. I am afraid there is no way to do this automatically with eggnog-mapper.
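A minimal Python sketch of that manual workaround follows. It rests on two assumptions, neither confirmed in this thread: both the seed_orthologs file and the annotation table are tab-separated with the query in column 1 and the seed ortholog in column 2, and '#' lines are headers. The function names are hypothetical. The reduced rows would be written to a new file and run through the usual annotation step, and the resulting table would then go through expand_annotations.

```python
from typing import Dict, Iterable, List, Tuple


def reduce_hits(lines: Iterable[str]) -> Tuple[List[str], Dict[str, List[str]]]:
    """Keep the first hit per seed ortholog and record every query per seed.

    Assumption: column 1 = query, column 2 = seed ortholog; '#' header lines
    are passed through unchanged so the reduced file keeps its header.
    """
    reduced: List[str] = []
    seed_to_queries: Dict[str, List[str]] = {}
    for line in lines:
        if line.startswith("#"):
            reduced.append(line)
            continue
        if not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        query, seed = fields[0], fields[1]
        if seed not in seed_to_queries:
            seed_to_queries[seed] = []
            reduced.append(line)  # representative hit for this seed
        seed_to_queries[seed].append(query)
    return reduced, seed_to_queries


def expand_annotations(
    annotation_lines: Iterable[str], seed_to_queries: Dict[str, List[str]]
) -> List[str]:
    """Copy each annotated representative row to every query sharing its seed.

    Assumption: the annotation table also has query in column 1 and seed in
    column 2. Other per-hit columns (e.g. e-value, score) are copied from the
    representative hit unchanged, so they may not match the other queries.
    """
    expanded: List[str] = []
    for line in annotation_lines:
        if line.startswith("#") or not line.strip():
            expanded.append(line)
            continue
        fields = line.rstrip("\n").split("\t")
        # Fall back to the row's own query if the seed was not seen in reduce_hits.
        for query in seed_to_queries.get(fields[1], [fields[0]]):
            expanded.append("\t".join([query] + fields[1:]) + "\n")
    return expanded


if __name__ == "__main__":
    hits = ["read1\tseedA\t1e-20\n", "read2\tseedA\t1e-18\n", "read3\tseedB\t1e-10\n"]
    reduced, mapping = reduce_hits(hits)
    # Only the read1 and read3 rows would go through annotation.
    annotations = ["read1\tseedA\t1e-20\tfunction X\n", "read3\tseedB\t1e-10\tfunction Y\n"]
    print("".join(expand_annotations(annotations, mapping)), end="")
```

In the example output, read2 inherits read1's e-value (1e-20). If per-hit values matter, they should be taken from the original seed_orthologs rows rather than from the transferred annotation.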
It would be an interesting feature to implement and test, though.
Best, Carlos
Got it. Thanks very much for your response.