dRep taking much longer than estimated using FastANI

Open jdwinkler-lanzatech opened this issue 2 years ago • 2 comments

Hi,

I am dereplicating about 5,000 MAGs using dRep 3.3.0 with the new default FastANI genome comparison algorithm. dRep initially estimated a pairwise alignment phase of about 160 minutes, but FastANI has been chugging away using 64 threads for about 10 hours now. Is that to be expected? I am dereplicating a lot of closely related MAGs but I anticipated it being a little faster than this.

Thanks again for all your work on dRep!

jdwinkler-lanzatech, Jul 06 '22 01:07

Hi @jdwinkler-lanzatech -

dRep's timing estimate is only a rough estimate, and unfortunately it can be far off. In general, the fewer clusters there are, the faster dRep goes: comparing lots of little primary clusters takes longer in reality than comparing a few huge clusters, but dRep will still estimate the same time for each. The timing also doesn't scale linearly with cores. dRep's estimate assumes it'll run 64x faster with 64 cores than with 1 core, but in reality it won't be able to use all 64 cores all the time; this is especially true if you have lots of little primary clusters.

I tend to run dRep with 16 cores, and for 5000 genomes I would expect it to take 12-24 hours. I'm not sure about the particulars of your job, but it could be the case that 64 cores provide only a minor increase in speed over 16 cores.
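To illustrate why extra cores can give diminishing returns, here is a minimal sketch based on Amdahl's law. This is a generic model, not how dRep computes its estimate, and the parallel fraction of 0.9 is a made-up value chosen only for illustration; the real fraction depends on the job (for example, on how many small primary clusters there are).

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup under Amdahl's law for a job where only
    `parallel_fraction` of the work can be spread across cores."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Hypothetical value: assume 90% of the run can actually use every core.
PARALLEL_FRACTION = 0.9

for cores in (1, 16, 64):
    speedup = amdahl_speedup(PARALLEL_FRACTION, cores)
    print(f"{cores:>2} cores: {speedup:.1f}x faster than 1 core")
```

Under that assumed fraction, going from 16 to 64 cores raises the speedup from about 6.4x to under 9x, far from the 4x additional gain a linear estimate would predict.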

Hope this helps and let me know if you have follow up questions!

-Matt

MrOlm, Jul 06 '22 02:07

Thanks! That is about what I figured; I've implemented a similar estimate for a primary clustering approach in another tool that suffers from the same time/sequence complexity. I'll keep an eye on the job in case it takes a significant amount of extra time beyond your estimate here.

jdwinkler-lanzatech, Jul 06 '22 02:07