NOVOPlasty

Question about speed of assembly

Open RyanGawryluk opened this issue 4 years ago • 3 comments

Hi,

First, I really appreciate this software, so thanks a lot for developing it!

I have a question about large variations in speed of NOVOPlasty that I'm experiencing. I am assembling the mitochondrial genomes of several novel microbes, at ~60-70 kbp, from PE Illumina 150 x 2 reads. In each case, I normalized the read sets with bbnorm to make the datasets more manageable, and they now have a total of ~60 million pairs for each species.

For the first species, I ran NOVOPlasty, and it produced a nice, circular mtDNA in about 20 minutes. For the next species, it has been running for > 24 hours, with no end in sight. In the log file of each, I set a k-mer length of 31 and a max memory of 15 GB (though according to the 'top' command, it appears to be using closer to 100 GB). These are likely fairly similar mtDNAs and dataset sizes; what could be causing the huge difference in run time, and what's the best way of getting around this?

Thanks!

RyanGawryluk, Dec 08 '20 18:12

Hi,

If it takes that long, it could be a bug. Could you set the extended log option to 1 in the config file and run it again? Then send me the resulting log file.

Do you see the length of the assembled sequence increasing?

Also, make sure you are using the latest version.
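
To illustrate the config change requested above, here is a minimal sketch in Python that flips one setting in a `key = value` style config text. It assumes the config stores the option on a line such as `Extended log = 0`; the sample lines, their exact key names, and the helper name `set_config_value` are illustrative assumptions, not taken from the NOVOPlasty documentation. Editing the file by hand works just as well.

```python
import re


def set_config_value(config_text: str, key: str, value: str) -> str:
    """Return config_text with the first `key = ...` line set to `value`.

    Assumes one `key = value` pair per line (an assumption about the
    config layout). Raises KeyError if the key is not present.
    """
    pattern = re.compile(
        rf"^([ \t]*{re.escape(key)}[ \t]*=)[^\n]*$",
        re.MULTILINE | re.IGNORECASE,
    )
    # Keep the original key and its padding; replace only the value part.
    new_text, count = pattern.subn(
        lambda m: f"{m.group(1)} {value}", config_text, count=1
    )
    if count == 0:
        raise KeyError(f"{key!r} not found in config")
    return new_text


# Hypothetical config excerpt; values mirror the settings mentioned in this thread.
sample = (
    "K-mer                 = 31\n"
    "Max memory            = 15\n"
    "Extended log          = 0\n"
)
updated = set_config_value(sample, "Extended log", "1")
print(updated)
```

In practice the text would be read from and written back to the actual config file before rerunning the assembly.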

ndierckx, Dec 08 '20 18:12

OK, thanks. I will update to the latest version and run it again with the extended log enabled.

I did see the length increasing initially (to 96,410 bp, a bit higher than expected). But since then, I can't tell whether anything is happening.

RyanGawryluk, Dec 08 '20 18:12

Once it reached that length, it got stuck because of a bug, so you can terminate the current assembly; it will not finish. For the rerun with the extended log, you can likewise terminate it once the length stops increasing.
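
The "stops increasing" check above can be sketched as a small Python helper. It assumes you note the assembled length at regular intervals yourself; how those numbers are obtained is left open, since no log format is assumed here. The function name `has_stalled` and the `patience` parameter are hypothetical.

```python
def has_stalled(lengths, patience=3):
    """Return True if the last `patience` samples show no growth.

    `lengths` is a list of assembled-sequence lengths sampled over time.
    The run counts as stalled when none of the most recent `patience`
    samples exceeds the sample taken just before them.
    """
    if len(lengths) <= patience:
        return False  # not enough observations to judge
    baseline = lengths[-patience - 1]
    return all(n <= baseline for n in lengths[-patience:])


# Example with the length reported in this thread (earlier values are made up).
observed = [1000, 50000, 96410, 96410, 96410, 96410]
print(has_stalled(observed))
```

A larger `patience` makes the check more conservative, which helps if the length tends to plateau briefly before growing again.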

ndierckx, Dec 08 '20 18:12