racon icon indicating copy to clipboard operation
racon copied to clipboard

expected run time

Open devonorourke opened this issue 7 years ago • 53 comments

Hi Ivan, I started my first Racon run a few days ago. Around the 48 hour mark, I started to wonder how things were still progressing - I observed the following information in the huge (17 G!).log file:

[racon::Polisher::initialize] loaded target sequences
[racon::Polisher::initialize] loaded sequences
[racon::Polisher::initialize] loaded overlaps
[racon::Polisher::initialize] aligned overlap 275776394/275776394
[racon::Polisher::initialize] transformed data into windows
generated consensus for window 301901/4005087

Monitoring the cpu usage and memory at 48 hours suggested things were still quite active. Yet now it's been another 48 hours (about 4 days since I launched the job), and the above .log file message is identical:

[racon::Polisher::initialize] aligned overlap 275776394/275776394
[racon::Polisher::initialize] transformed data into windows
generated consensus for window 301901/4005087

In addition, and somewhat curious, the amount of memory consumed is identical as it was 2 days ago.

Given the number of overlaps, perhaps things are still in progress, but given that the generated consensus for window value hasn't changed in two days makes me concerned that something is amiss.

Thanks for your help

devonorourke avatar Aug 29 '18 19:08 devonorourke

Hello,

which version of Racon and which type of data are you using for polishing?

Best regards, Robert

rvaser avatar Aug 29 '18 19:08 rvaser

Version 1.3.1

Using Nanopore assembly (assembled by Flye) with 150 PE Illumina data for polishing. I mapped reads with minimap.

The command:

RACON=/mnt/lustre/macmaneslab/devon/bin/racon
DRAFT=/mnt/lustre/macmaneslab/devon/batgenomes/assembly/flye/myse/rough/contigs.fasta
OVERLAPS=/mnt/lustre/macmaneslab/devon/batgenomes/alignments/myse/minimap2/SE4racon.sam

$RACON --threads 24 tmp.fq $OVERLAPS $DRAFT

Thanks for the quick response!

On Wed, Aug 29, 2018, 3:53 PM Robert Vaser [email protected] wrote:

Hello,

which version of Racon are you using and which type of data are you using for polishing?

Best regards, Robert

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/isovic/racon/issues/86#issuecomment-417083750, or mute the thread https://github.com/notifications/unsubscribe-auth/AKqgXPZpr6dykZXNYclnhEifei4QY9ipks5uVvFTgaJpZM4WSMDB .

devonorourke avatar Aug 29 '18 20:08 devonorourke

This is odd because the infinite loop bug caused by a large amount of reads mapped to a window has been fixed in version 1.3.1 and your description indicates that it is not (or there is another bug). Are you certain that the version of racon you are running is 1.3.1, i.e. you ran /mnt/lustre/macmaneslab/devon/bin/racon --version?

rvaser avatar Aug 29 '18 20:08 rvaser

Yep, that's exactly what I did to identify the version. Please let me know what other info to give you. Thanks again

On Wed, Aug 29, 2018, 4:15 PM Robert Vaser [email protected] wrote:

This is odd because the infinite loop bug caused by a large amount of reads mapped to a window has been fixed in version 1.3.1 and your description indicates that it is not (or there is another bug). Are you certain that the version of racon you are running is 1.3.1, i.e. you ran /mnt/lustre/macmaneslab/devon/bin/racon --version?

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/isovic/racon/issues/86#issuecomment-417090314, or mute the thread https://github.com/notifications/unsubscribe-auth/AKqgXPnCpSgaMY9IHFcMfYYS5IkT0cwYks5uVvZIgaJpZM4WSMDB .

devonorourke avatar Aug 29 '18 20:08 devonorourke

What is the last git commit you have (run git log)?

rvaser avatar Aug 29 '18 20:08 rvaser

This what you're looking for?

commit 57aabc5366e30be8ebe9e13d71824af91b9b9cba
Author: rvaser <[email protected]>
Date:   Mon Apr 23 22:04:31 2018 +0200

On Wed, Aug 29, 2018 at 4:50 PM Robert Vaser [email protected] wrote:

What is the last git commit you have (run git log)?

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/isovic/racon/issues/86#issuecomment-417101179, or mute the thread https://github.com/notifications/unsubscribe-auth/AKqgXDz8k1h_YzGChxjhDsZ2RK6a6muSks5uVv6YgaJpZM4WSMDB .

-- Devon O'Rourke Graduate student in Molecular and Evolutionary Systems Biology University of New Hampshire

devonorourke avatar Aug 29 '18 20:08 devonorourke

Just ran git pull origin master and saw this:

[devon@premise racon]$ git pull origin master
remote: Counting objects: 9, done.
remote: Total 9 (delta 7), reused 7 (delta 7), pack-reused 2
Unpacking objects: 100% (9/9), done.
From https://github.com/isovic/racon
 * branch            master     -> FETCH_HEAD
Updating 57aabc5..f5af1c7
Fast-forward
 src/sequence.cpp | 23 ++++++++++++++---------
 vendor/spoa      |  2 +-
 2 files changed, 15 insertions(+), 10 deletions(-)

I wonder if I can restart the run without having to calculate all the overlaps again?

devonorourke avatar Aug 29 '18 20:08 devonorourke

Yeah, that explains it. Recompile with the new commits and run polishing again. It should work now :)

rvaser avatar Aug 29 '18 20:08 rvaser

I take it run polishing again means just run the whole thing over?

devonorourke avatar Aug 29 '18 20:08 devonorourke

Yes, just the racon part though.

rvaser avatar Aug 29 '18 20:08 rvaser

Got it. Will send you an update. Many thanks for the promptness!

devonorourke avatar Aug 29 '18 21:08 devonorourke

Odd observation - the previous version was able to load all the overlaps with less than 450 G of memory. It's crashing now before the overlaps are finished. Did memory allocation/usage change substantially in the newest version? Maybe I need to switch to correcting by scaffold... Any insight is greatly appreciated. Is there documentation for the wrapper script? Cheers

devonorourke avatar Sep 03 '18 16:09 devonorourke

That is really odd. It crashes during overlap parsing due to memory shortage?

rvaser avatar Sep 03 '18 16:09 rvaser

Not clear if it's truly a memory issue, but I think so. I was watching top as Racon completed it's first two tasks to load target sequences (draft assembly?) and then sequences (reads?). I allocated 450 Gb RAM for the job, so I'm assuming when top says I've used about 60%, it's about that fraction of 450. The raw reads, tmp.fq in the error message below use up 114 Gb of disk space.

Nothing special about the script I submitted - launched it in the directory where tmp.fq resides...

RACON=/pathTo/bin/racon
DRAFT=/pathTo/contigs.fasta
OVERLAPS=/pathTop/SE4racon.sam

"$RACON" --threads 24 tmp.fq "$OVERLAPS" "$DRAFT"

The error message then appears about an hour into the job:

$ cat racon.log
[racon::Polisher::initialize] loaded target sequences
[racon::Polisher::initialize] loaded sequences
[racon::Polisher::initialize] loaded overlaps
/var/spool/slurmd/job53009/slurm_script: line 17: 94377 Killed                  "$RACON" --threads 24 tmp.fq "$OVERLAPS" "$DRAFT"

What's odd is the memory usage was only at something like 75% when I was watching top, and the job crashed about 5 minutes after I last checked. At that point the second part - loading sequences - was still happening.

Is there a --verbose option to maybe timestamp the progress in the log file?

Thanks again

devonorourke avatar Sep 03 '18 17:09 devonorourke

I wonder about the racon wrapper option...

What's the disadvantage of breaking up there polishing into chunks? Less effective?

I had to do this with pilon so perhaps I'll have to do it here also.

devonorourke avatar Sep 03 '18 17:09 devonorourke

You allocated 450 GB of RAM the last time as well? The commits you have pulled fixed only the last step in racon, i.e. where POA graphs are built and consensus is called, and that step should not allocate much memory.

Unfortunately, there is no --verbose option.

The wrapper script splits the contigs into several files depending on your desired size (you can pass the size of the longest contig plus some epsilon) and runs racon on each of those files sequentially. The disadvantage is that it will read the whole read set and overlap set several time thus increase the execution time. Run --help to see how to run it.

rvaser avatar Sep 03 '18 17:09 rvaser

What are the file sizes you gave racon as input?

rvaser avatar Sep 03 '18 17:09 rvaser

Reads (tmp.fq) = 114 G Draft (contigs.fasta) = 1.9 G Mappings (.sam) = 176 G

On Mon, Sep 3, 2018, 1:53 PM Robert Vaser [email protected] wrote:

What are the file sizes you gave racon as input?

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/isovic/racon/issues/86#issuecomment-418169460, or mute the thread https://github.com/notifications/unsubscribe-auth/AKqgXAhiEaqBb0AItCPNDQETPm55fPElks5uXWykgaJpZM4WSMDB .

devonorourke avatar Sep 03 '18 18:09 devonorourke

That should fit into 450GB. Were there any other processes run in parallel? You can also try mapping reads to the backbone without the alignment (run minimap2 without -a) to further decrease the file size (and memory consumption).

rvaser avatar Sep 03 '18 18:09 rvaser

Okay, good to know. I can request up to 990 Gb of RAM at the moment, so that's what's currently waiting in queue. I just wondered if I was doing something wrong, or miscalculation some how. Thanks

On Mon, Sep 3, 2018, 2:22 PM Robert Vaser [email protected] wrote:

That should fit into 450GB. Were there any other processes run in parallel? You can also try mapping reads to the backbone without the alignment (run minimap2 without -a) to further decrease the file size.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/isovic/racon/issues/86#issuecomment-418173121, or mute the thread https://github.com/notifications/unsubscribe-auth/AKqgXDfUnMmN9P5bxWLFfyG84F4pn-Hbks5uXXNngaJpZM4WSMDB .

devonorourke avatar Sep 03 '18 18:09 devonorourke

Unfortunately the 1Tb node we have on our cluster continues to be occupied by another job.

While it was waiting, I reinstalled Racon 1.3.1; git log sugggests the last commit was f5af1c7857fd2f434216a3eed349bf453c2a1c43. I also installed the wrapper script.

I ran the wrapper script as follows:

RACON=/path/to/my/racon_wrapper
DRAFT=/path/to/my/contigs.fasta
OVERLAPS=/path/to/my/overlaps.sam
READS=/path/to/my/illumina.fq

"$RACON" --split 100000000000 --threads 24 "$READS" "$OVERLAPS" "$DRAFT"

This launched the script, splitting my draft fasta file (DRAFT) into two chunks. Because I allocated 450 G of RAM, all the target sequences, sequences, and overlaps were loaded. In addition, all the aligned overlaps appeared to be loaded.

Unlike the previous Racon run where it stalled, this script completed the polishing step for the first of two windows:

[devon@premise myse]$ head raconW.log 
[RaconWrapper::run] preparing data with rampler
[RaconWrapper::run] processing data with racon
[racon::Polisher::initialize] loaded target sequences
[racon::Polisher::initialize] loaded sequences
[racon::Polisher::initialize] loaded overlaps
[racon::Polisher::initialize] aligned overlap 167412327/167412327
[racon::Polisher::initialize] transformed data into windows
[racon::Polisher::polish] generated consensus for window 2396936/2396936
>contig_1 LN:i:194537 RC:i:26396 XC:f:1.000000

However, the script failed to then move to the next of two fasta files. The log file terminates with a print out of one polished contig, but it appears truncated.

What's confusing me is that this one fasta file may represent just 1 of the ~10,000 contigs in my draft assembly. But the temporary directory Racon created to split my DRAFT was in two chunks (within those two chunks there are many fasta reads; and those two chunks contain all of my expected number of contigs).

What does that first contig represent? What is the typical output behavior? Is everything at the end of a successful Racon run within that single .log file?

Thanks for your assistance

devonorourke avatar Sep 07 '18 09:09 devonorourke

What do you mean by truncated contig? The log tells me that racon finished polishing the first file but it is odd that it did not start on the second. Are there any error messages?

Please check number of contigs in first and second file. It could be that the first file contains only one contig or other contigs have not been printed because there was not enough coverage for polishing (you can keep them with the -u option). Still, I do not know why it stopped after the first file.

rvaser avatar Sep 07 '18 12:09 rvaser

Hi Robert, Thanks for the reply. I think one of the things that was tripping me up was how to view the .log file. I have been viewing the contents with commands like head because less doesn't open up the file properly (there are a bunch of ^M symbols for line breaks. Because I was using head as the command to view the information, I was able to print out the log file that I showed earlier, and the reason I was thinking that the file was truncated was ... because I was using head! I decided to try to parse the .log file to collect any fasta records with:

cat racon.log | grep '^>' -A 1 > racon.fasta

And fortunately that produced a fasta file with 12298 sequences, and an estimated size of about 1.2 Gb. I believe this is the first of two chunks of my polished genome, because the input file which was split is about 2.0 Gb in size, and contains 17268 contigs.

I have no idea why the program crashed, but it looks like there is no indication in the log file about any error. I'm going to run the same commands once again and see if it crashes once more.

Can you please comment if I am generating the output properly? I'm surprised that the polished fasta file is part of the log file - I would have anticipated it being part of a separate output.

Thanks very much

devonorourke avatar Sep 07 '18 16:09 devonorourke

Look at the last few lines of your log file and paste them here please.

Contigs are passed to stdout while the log and error messages are printed to stderr. You should pipe them into two separate files.

rvaser avatar Sep 07 '18 17:09 rvaser

I wonder if I'm making a mess of this because the job is submitted to our cluster through SLURM, and I believe that by passing the #SBATCH --output parameter, I'm putting all the output into that single .log file?

The full shell script looks like:

#!/bin/bash

#SBATCH --job-name="racon"
#SBATCH --mem 450Gb
#SBATCH --cpus-per-task=24
#SBATCH --output racon.log

racon_wrapper --split 100000000000 --threads 24 reads.fq racon.sam contigs.fasta

How would you recommend I split the information?

devonorourke avatar Sep 07 '18 17:09 devonorourke

Well I am not familiar with SLURM but a fast google search suggests to run #SBATCH --error=racon.log and #SBATCH --output=contigs.fasta. Try running that with a small data set to verify if it will work correctly (you can find some in racon/test/data). :)

rvaser avatar Sep 07 '18 17:09 rvaser

Just tested using the sample data and passing the #SBATCH --error racon.log and #SBATCH --output racon.fasta commands work as hoped for.

Next question is whether or not the script will parse the pair of reference fasta files (split from a single reference fasta) or not. Won't know that answer for another 12 hours or so, but I'll let you know what happens next.

Thanks for all the input!

devonorourke avatar Sep 07 '18 17:09 devonorourke

Well it should parse both files. :D You can try greping "loaded target sequences" in your log file, there should be two occurrences.

rvaser avatar Sep 07 '18 17:09 rvaser

What I can confirm after 5 minutes of the program running is that:

  1. A new directory (racon_work_directory_1536342564.47) containing two files (contigs_0.fasta and contigs_1.fasta) has been created.
  2. contigs_0.fasta and contigs_1.fasta contain the same (total) number of sequences (12389 and 4879, respectively) as what is present in the draft assembly they are split from (17268).
  3. The log file thus far has only indicated that target sequences have been loaded once:
[RaconWrapper::run] preparing data with rampler
[RaconWrapper::run] processing data with racon
[racon::Polisher::initialize] loaded target sequences

You mentioned that the issue with using the racon_wrapper feature is that the sequences are uploaded for each split contig_X.fasta, correct? If that's correct, would I expect the .log file to look something like this (I've intentionally spaced apart where the second contig would print out the information below the blocks of text)?:

[RaconWrapper::run] preparing data with rampler
[RaconWrapper::run] processing data with racon
[racon::Polisher::initialize] loaded target sequences
[racon::Polisher::initialize] loaded sequences
[racon::Polisher::initialize] loaded overlaps
[racon::Polisher::initialize] aligned overlap 167412327/167412327
[racon::Polisher::initialize] transformed data into windows
[racon::Polisher::polish] generated consensus for window 2396936/2396936


[racon::Polisher::initialize] loaded target sequences
[racon::Polisher::initialize] loaded sequences
[racon::Polisher::initialize] loaded overlaps
[racon::Polisher::initialize] aligned overlap 100000/100000
[racon::Polisher::initialize] transformed data into windows
[racon::Polisher::polish] generated consensus for window 50000/50000

devonorourke avatar Sep 07 '18 18:09 devonorourke

Yes, the log file should look something like that.

rvaser avatar Sep 07 '18 21:09 rvaser