
eggnog speed and missing files?

Open bernt-matthias opened this issue 2 years ago • 3 comments

I'm working on the Galaxy wrapper for eggnog-mapper. We observe very low CPU usage (~1% on each of the 10 cores). While debugging I noticed approximately 4000-5000 failed attempts per minute to access two files:

stat("/gpfs1/data/galaxy_server/galaxy/tool-data/eggnog_data/5.0.2/eggnog.db-journal", 0x7ffe9631c020) = -1 ENOENT (No such file or directory)
stat("/gpfs1/data/galaxy_server/galaxy/tool-data/eggnog_data/5.0.2/eggnog.db-wal", 0x7ffe9631c020) = -1 ENOENT (No such file or directory)

I'm wondering what these files are, and whether this is a problem with the mapper itself or with the Galaxy wrapper implementation.
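For context, the `-journal` and `-wal` suffixes match the sidecar files that SQLite looks for next to a database file (a rollback journal and a write-ahead log). Assuming `eggnog.db` is opened through SQLite, the failed `stat` calls would be that routine check rather than files missing from the download. A minimal sketch of where such files come from, using a throwaway database (`demo.db` is a hypothetical stand-in, not part of eggnog-mapper):

```python
import os
import sqlite3
import tempfile


def sidecar_files(db_path):
    """Return the SQLite sidecar suffixes that currently exist next to db_path."""
    return [suffix for suffix in ("-journal", "-wal", "-shm")
            if os.path.exists(db_path + suffix)]


workdir = tempfile.mkdtemp()
db_path = os.path.join(workdir, "demo.db")  # hypothetical stand-in for eggnog.db

conn = sqlite3.connect(db_path)
# Ask SQLite to use a write-ahead log; the pragma returns the mode actually in effect.
mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
conn.execute("CREATE TABLE t (x INTEGER)")
conn.execute("INSERT INTO t VALUES (1)")
conn.commit()

during = sidecar_files(db_path)  # sidecars present while the connection is open
conn.close()
after = sidecar_files(db_path)   # sidecars remaining after the connection is closed

print(mode, during, after)
```

A database that is only read normally has neither file, so each lookup ends in `ENOENT`; on a network or parallel filesystem those metadata lookups can still be slow.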

We are using version 2.1.8

Note: I'm exploring in parallel all the possibilities mentioned in the wiki (in particular, putting the whole DB into memory gives a massive speedup)...

bernt-matthias, Jul 25 '22 12:07

Hi @bernt-matthias ,

To be honest, I don't know how to help you, since I have never used the Galaxy wrapper of eggnog-mapper. Maybe it is related to slow access to the hard drive, which would make the processes I/O-bound and leave the CPUs idle, but I honestly don't know. Loading the DB into memory does make it faster, yes. You can tell emapper to load the DB into memory using the flag --dbmem.
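To illustrate why loading the DB into memory helps, here is a minimal sketch of the general technique with Python's `sqlite3` module: copy the on-disk database into an in-memory connection once, then run every query against the copy. This is not necessarily how `--dbmem` is implemented in emapper; the table, column names and rows below are made up for the example.

```python
import os
import sqlite3
import tempfile


def load_into_memory(db_path):
    """Copy an on-disk SQLite database into a new in-memory connection."""
    disk = sqlite3.connect(db_path)
    mem = sqlite3.connect(":memory:")
    try:
        disk.backup(mem)  # one sequential read of the file (Python 3.7+)
    finally:
        disk.close()
    return mem


# Build a tiny on-disk database to stand in for the real one (hypothetical schema).
workdir = tempfile.mkdtemp()
db_path = os.path.join(workdir, "toy.db")
src = sqlite3.connect(db_path)
src.execute("CREATE TABLE annotations (name TEXT PRIMARY KEY, description TEXT)")
src.executemany("INSERT INTO annotations VALUES (?, ?)",
                [("og1", "kinase"), ("og2", "transporter")])
src.commit()
src.close()

mem = load_into_memory(db_path)
os.remove(db_path)  # later queries no longer touch the disk at all

row = mem.execute("SELECT description FROM annotations WHERE name = ?",
                  ("og2",)).fetchone()
print(row[0])
```

The trade-off is memory: the copy needs roughly as much RAM as the database file is large, in exchange for lookups that no longer depend on drive or network latency.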

Hopefully, someone else will help you.

Best, Carlos

Cantalapiedra, Jul 26 '22 07:07

Hi @Cantalapiedra, thanks for the reply. In the end, the Galaxy tool wrapper just creates a command line and executes it.

Would it help if I tried to get the generated command lines? Maybe we can reproduce it independently of the Galaxy tool...

bernt-matthias, Jul 26 '22 08:07

So, in this case, do you have control over the command lines that are run? Can you modify them? Also, could you test it on a different hard drive? As I said, it is not uncommon for the annotation to be slow if the DB is not loaded into memory and the drives are slow, or if they are mounted over NFS.

Yes, of course, send the command lines if you want. Maybe we can get some clue from them.

Cantalapiedra, Jul 26 '22 09:07