long run time for ra
Hi
I ran Ra on a plant genome with a genome size of 1G and 50X nanopore reads.
The program has used 1700 CPU hours so far and is still running. The log was last updated 6 days ago.
How many CPU hours will Ra take for a 1G genome? Is there any way to speed it up?
The log file is as follows:
...
[Ra::run] layout stage
[rala::Graph::initialize] loaded sequences
[rala::Graph::initialize] loaded overlaps
[rala::Graph::preprocess] dataset coverage median = 337
[rala::Graph::preprocess] processed chimeric sequences
[rala::Graph::initialize] number of prefiltered sequences = 453302
[rala::Graph::initialize] elapsed time = 1132638.85608 s
Hello,
Can you please paste the commands you ran for compilation and execution? Also, can you please check the size of the .paf file in ra_work_directory_<timestamp>/?
Best regards, Robert
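For anyone who wants to check the overlap file size the same way, here is a minimal Python sketch. It assumes the .paf file sits directly inside a ra_work_directory_<timestamp>/ folder under the directory where Ra was started; the function names `paf_sizes` and `human_readable` are made up for this example and are not part of Ra.

```python
import glob
import os


def human_readable(num_bytes):
    """Format a byte count with binary units (B, KiB, MiB, GiB, TiB)."""
    size = float(num_bytes)
    for unit in ("B", "KiB", "MiB", "GiB", "TiB"):
        # Stop at the first unit that fits, or at TiB as the largest unit.
        if size < 1024 or unit == "TiB":
            return f"{size:.1f} {unit}"
        size /= 1024


def paf_sizes(root="."):
    """Return {path: size_in_bytes} for every .paf file found in
    ra_work_directory_*/ directly under `root` (assumed layout)."""
    pattern = os.path.join(root, "ra_work_directory_*", "*.paf")
    return {path: os.path.getsize(path) for path in sorted(glob.glob(pattern))}


if __name__ == "__main__":
    # Run from the directory in which ra was launched.
    for path, size in paf_sizes().items():
        print(f"{path}\t{human_readable(size)}")
```

If nothing is printed, the work directory is probably somewhere else; pass its parent directory to `paf_sizes`.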
Hi, my command is: ra -t 40 -x ont all_fastq.gz > ra.assembly.out 2>ra.assembly.log
The size of the paf file is 5T.
Did you run cmake with the Release flag? Still, the majority of the execution time will be spent parsing the 5T paf file :/
I have successfully assembled a 100M genome with Ra, so I think Ra is installed correctly. I installed Ra as follows:
cd ra
mkdir build
cd build
cmake -DCMAKE_BUILD_TYPE=Release ..
make
Well, the current version of ra parses the overlap file several times, which takes a long time for large files. We are working on a workaround.
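To get a rough feel for why repeated parsing of a multi-terabyte file is expensive, here is a back-of-envelope sketch in Python. The number of passes and the parsing throughput below are placeholder assumptions, not measurements of Ra, and `estimate_parse_hours` is a hypothetical helper; plug in values observed on your own system.

```python
def estimate_parse_hours(file_bytes, passes, bytes_per_second):
    """Lower-bound wall-clock hours to stream a file `passes` times
    at a sustained rate of `bytes_per_second`."""
    if bytes_per_second <= 0:
        raise ValueError("bytes_per_second must be positive")
    total_seconds = file_bytes * passes / bytes_per_second
    return total_seconds / 3600.0


if __name__ == "__main__":
    paf_bytes = 5e12     # "5T" from the thread, taken here as 5 * 10^12 bytes
    passes = 3           # assumption: "several" passes, exact count unknown
    throughput = 200e6   # assumption: 200 MB/s sustained parse rate
    hours = estimate_parse_hours(paf_bytes, passes, throughput)
    print(f"about {hours:.1f} hours just to stream the overlaps")
```

This only bounds the time needed to read the file; actual parsing and graph construction add to it, so treat the result as a floor rather than a prediction.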