long run time for ra

Open wingwingWY opened this issue 7 years ago • 5 comments

Hi, I ran Ra on a plant genome with a genome size of 1G and 50X nanopore reads. The program has used 1700 CPU hours so far and is still running. The log was last updated 6 days ago.
How many CPU hours will Ra take for a 1G genome? Is there any way to increase the speed?

The log file is as follows:

...
[Ra::run] layout stage
[rala::Graph::initialize] loaded sequences
[rala::Graph::initialize] loaded overlaps
[rala::Graph::preprocess] dataset coverage median = 337
[rala::Graph::preprocess] processed chimeric sequences
[rala::Graph::initialize] number of prefiltered sequences = 453302
[rala::Graph::initialize] elapsed time = 1132638.85608 s

wingwingWY avatar Dec 12 '18 10:12 wingwingWY

Hello, can you please paste here the commands you ran for compilation and execution? Also, can you please check the size of the .paf file in ra_work_directory_<timestamp>/?

Best regards, Robert

rvaser avatar Dec 12 '18 10:12 rvaser

Hi, my command is: ra -t 40 -x ont all_fastq.gz > ra.assembly.out 2>ra.assembly.log

The size of the paf file is 5T.

wingwingWY avatar Dec 18 '18 02:12 wingwingWY

Did you run cmake with the Release flag? Still, the majority of the execution time will be spent parsing the 5T paf file :/

rvaser avatar Dec 18 '18 06:12 rvaser

I have successfully assembled a 100M genome with Ra, so I think Ra is installed correctly. I installed Ra as follows:

cd ra
mkdir build
cd build
cmake -DCMAKE_BUILD_TYPE=Release ..
make

wingwingWY avatar Dec 20 '18 08:12 wingwingWY

Well, the current version of ra parses the overlap file several times, which will take quite long for large files. We are working on a workaround.

rvaser avatar Dec 20 '18 08:12 rvaser
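
To get a feel for why repeated passes over a 5T overlap file are costly, here is a minimal back-of-the-envelope sketch in Python. It is not part of Ra: the function name `estimate_parse_hours`, the pass counts, and the throughput values are all hypothetical, and "5T" is taken to mean 5e12 bytes. It only accounts for streaming the file from disk at a fixed rate, so actual parsing would likely take longer.

```python
def estimate_parse_hours(file_size_bytes, passes, throughput_bytes_per_s):
    """Rough wall-clock hours needed to stream a file `passes` times.

    Only sequential read time at a fixed rate is modeled; the CPU cost of
    parsing each overlap record is ignored, so this is a lower bound.
    """
    if throughput_bytes_per_s <= 0:
        raise ValueError("throughput must be positive")
    total_bytes = file_size_bytes * passes
    return total_bytes / throughput_bytes_per_s / 3600.0


# Size reported in the thread, assumed here to mean 5e12 bytes.
PAF_SIZE_BYTES = 5 * 10**12

if __name__ == "__main__":
    # Pass counts and read rates are illustrative assumptions, not measurements.
    for passes in (1, 2, 3):
        for mb_per_s in (100, 500):
            hours = estimate_parse_hours(PAF_SIZE_BYTES, passes, mb_per_s * 10**6)
            print(f"{passes} pass(es) at {mb_per_s} MB/s: about {hours:.1f} h")
```

Under these assumptions, a single pass at 100 MB/s already takes roughly 14 hours of pure I/O, and every additional pass adds the same amount again.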