
flo failed on Large genome

Open pan-genome opened this issue 3 years ago • 9 comments

flo failed on a 14 Gb genome with a "corrupted double-linked list (not small)" error. It runs normally on genomes smaller than 4 Gb. I am running it on an AWS m5.16xlarge EC2 instance.

rake -f /home/ubuntu/flo/Rakefile &
mkdir run
cp /home/ubuntu/s.fa run/source.fa
cp /home/ubuntu/t.fa run/target.fa
faToTwoBit run/source.fa run/source.2bit
faToTwoBit run/target.fa run/target.2bit
twoBitInfo run/source.2bit stdout | sort -k2nr > run/source.sizes
twoBitInfo run/target.2bit stdout | sort -k2nr > run/target.sizes
faSplit sequence run/target.fa 21 run/chunk_
parallel --joblog run/joblog.faSplit -j 21 -a run/joblst.faSplit
Academic tradition requires you to cite works you base your article on. When using programs that use GNU Parallel to process data for publication please cite:

O. Tange (2011): GNU Parallel - The Command-Line Power Tool, ;login: The USENIX Magazine, February 2011:42-47.

This helps funding further development; and it won't cost you a cent. If you pay 10000 EUR you should feel free to use GNU Parallel without citing.

To silence the citation notice: run 'parallel --bibtex'.

123322 pieces of 123923 written
133957 pieces of 134763 written
150983 pieces of 152743 written
156478 pieces of 157558 written
98419 pieces of 99073 written
99082 pieces of 99724 written
103154 pieces of 103663 written
113555 pieces of 113991 written
118767 pieces of 119728 written
123551 pieces of 124526 written
141741 pieces of 142672 written
144495 pieces of 146237 written
130388 pieces of 131310 written
147572 pieces of 148896 written
138549 pieces of 140111 written
141907 pieces of 142961 written
149246 pieces of 150844 written
149613 pieces of 150822 written
197774 pieces of 198899 written
160747 pieces of 162550 written
167525 pieces of 170389 written
parallel --joblog run/joblog.blat -j 21 -a run/joblst.blat

corrupted double-linked list (not small)
free(): invalid next size (normal)
free(): invalid next size (normal)
double free or corruption (!prev)
double free or corruption (!prev)
malloc(): smallbin double linked list corrupted
free(): invalid next size (normal)
malloc(): memory corruption
free(): invalid next size (normal)
double free or corruption (!prev)
free(): invalid next size (normal)
double free or corruption (!prev)
double free or corruption (!prev)
rake aborted!
Command failed with status (21): [parallel --joblog run/joblog.blat -j 21 -a...]
/home/ubuntu/flo/Rakefile:153:in `parallel'
/home/ubuntu/flo/Rakefile:99:in `block in <top (required)>'
/home/ubuntu/flo/Rakefile:37:in `block in <top (required)>'
Tasks: TOP => run/liftover.chn
(See full trace by running task with --trace)

[1]+ Exit 1 rake -f /home/ubuntu/flo/Rakefile

pan-genome avatar Sep 18 '20 15:09 pan-genome

Not sure if the error is coming from GNU parallel or blat. The contents of run/joblog.blat can help decide. Would you mind posting it?

If it's GNU parallel, you could try using a newer version. The version that the install script installs is quite old.

If it's blat, it is possible that 256 GB of memory is not sufficient for the task. Did you monitor memory usage with htop? You could try lowering the number of parallel processes that flo runs, using a memory-optimised (r5) instance for more RAM, and taking steps to minimise blat's memory usage, such as creating and providing an ooc file.

yeban avatar Sep 20 '20 15:09 yeban

Here is the blat joblog:

run$ cat joblog.blat
Seq Host Starttime JobRuntime Send Receive Exitval Signal Command
6 : 1600440618.782 60.508 0 0 0 9 blat -noHead -noHead -fastMap -tileSize=12 -ooc=/home/ubuntu/efs/wheat/4461_12.ooc -minScore=100 -minIdentity=98 run/source.fa run/chunk_05.fa run/chunk_05.fa.psl
9 : 1600440618.787 67.600 0 0 0 9 blat -noHead -noHead -fastMap -tileSize=12 -ooc=/home/ubuntu/efs/wheat/4461_12.ooc -minScore=100 -minIdentity=98 run/source.fa run/chunk_01.fa run/chunk_01.fa.psl
5 : 1600440618.780 74.621 0 0 0 9 blat -noHead -noHead -fastMap -tileSize=12 -ooc=/home/ubuntu/efs/wheat/4461_12.ooc -minScore=100 -minIdentity=98 run/source.fa run/chunk_14.fa run/chunk_14.fa.psl
21 : 1600440618.807 81.061 0 0 0 9 blat -noHead -noHead -fastMap -tileSize=12 -ooc=/home/ubuntu/efs/wheat/4461_12.ooc -minScore=100 -minIdentity=98 run/source.fa run/chunk_20.fa run/chunk_20.fa.psl
4 : 1600440618.778 186.198 0 0 0 9 blat -noHead -noHead -fastMap -tileSize=12 -ooc=/home/ubuntu/efs/wheat/4461_12.ooc -minScore=100 -minIdentity=98 run/source.fa run/chunk_07.fa run/chunk_07.fa.psl
10 : 1600440618.788 312.954 0 0 0 11 blat -noHead -noHead -fastMap -tileSize=12 -ooc=/home/ubuntu/efs/wheat/4461_12.ooc -minScore=100 -minIdentity=98 run/source.fa run/chunk_10.fa run/chunk_10.fa.psl
2 : 1600440618.775 312.980 0 41 0 6 blat -noHead -noHead -fastMap -tileSize=12 -ooc=/home/ubuntu/efs/wheat/4461_12.ooc -minScore=100 -minIdentity=98 run/source.fa run/chunk_03.fa run/chunk_03.fa.psl
8 : 1600440618.785 314.005 0 0 0 11 blat -noHead -noHead -fastMap -tileSize=12 -ooc=/home/ubuntu/efs/wheat/4461_12.ooc -minScore=100 -minIdentity=98 run/source.fa run/chunk_09.fa run/chunk_09.fa.psl
13 : 1600440618.793 314.322 0 35 0 6 blat -noHead -noHead -fastMap -tileSize=12 -ooc=/home/ubuntu/efs/wheat/4461_12.ooc -minScore=100 -minIdentity=98 run/source.fa run/chunk_08.fa run/chunk_08.fa.psl
14 : 1600440618.795 314.361 0 35 0 6 blat -noHead -noHead -fastMap -tileSize=12 -ooc=/home/ubuntu/efs/wheat/4461_12.ooc -minScore=100 -minIdentity=98 run/source.fa run/chunk_17.fa run/chunk_17.fa.psl
20 : 1600440618.805 314.427 0 34 0 6 blat -noHead -noHead -fastMap -tileSize=12 -ooc=/home/ubuntu/efs/wheat/4461_12.ooc -minScore=100 -minIdentity=98 run/source.fa run/chunk_02.fa run/chunk_02.fa.psl
12 : 1600440618.791 319.748 0 34 0 6 blat -noHead -noHead -fastMap -tileSize=12 -ooc=/home/ubuntu/efs/wheat/4461_12.ooc -minScore=100 -minIdentity=98 run/source.fa run/chunk_13.fa run/chunk_13.fa.psl
7 : 1600440618.783 324.924 0 48 0 6 blat -noHead -noHead -fastMap -tileSize=12 -ooc=/home/ubuntu/efs/wheat/4461_12.ooc -minScore=100 -minIdentity=98 run/source.fa run/chunk_18.fa run/chunk_18.fa.psl
11 : 1600440618.790 327.304 0 35 0 6 blat -noHead -noHead -fastMap -tileSize=12 -ooc=/home/ubuntu/efs/wheat/4461_12.ooc -minScore=100 -minIdentity=98 run/source.fa run/chunk_04.fa run/chunk_04.fa.psl
15 : 1600440618.796 330.322 0 28 0 6 blat -noHead -noHead -fastMap -tileSize=12 -ooc=/home/ubuntu/efs/wheat/4461_12.ooc -minScore=100 -minIdentity=98 run/source.fa run/chunk_00.fa run/chunk_00.fa.psl
19 : 1600440618.803 331.255 0 35 0 6 blat -noHead -noHead -fastMap -tileSize=12 -ooc=/home/ubuntu/efs/wheat/4461_12.ooc -minScore=100 -minIdentity=98 run/source.fa run/chunk_11.fa run/chunk_11.fa.psl
17 : 1600440618.800 332.427 0 0 0 11 blat -noHead -noHead -fastMap -tileSize=12 -ooc=/home/ubuntu/efs/wheat/4461_12.ooc -minScore=100 -minIdentity=98 run/source.fa run/chunk_15.fa run/chunk_15.fa.psl
18 : 1600440618.802 332.598 0 34 0 6 blat -noHead -noHead -fastMap -tileSize=12 -ooc=/home/ubuntu/efs/wheat/4461_12.ooc -minScore=100 -minIdentity=98 run/source.fa run/chunk_19.fa run/chunk_19.fa.psl
16 : 1600440618.798 333.617 0 35 0 6 blat -noHead -noHead -fastMap -tileSize=12 -ooc=/home/ubuntu/efs/wheat/4461_12.ooc -minScore=100 -minIdentity=98 run/source.fa run/chunk_06.fa run/chunk_06.fa.psl
1 : 1600440618.774 341.095 0 34 0 6 blat -noHead -noHead -fastMap -tileSize=12 -ooc=/home/ubuntu/efs/wheat/4461_12.ooc -minScore=100 -minIdentity=98 run/source.fa run/chunk_12.fa run/chunk_12.fa.psl
3 : 1600440618.777 345.338 0 34 0 6 blat -noHead -noHead -fastMap -tileSize=12 -ooc=/home/ubuntu/efs/wheat/4461_12.ooc -minScore=100 -minIdentity=98 run/source.fa run/chunk_16.fa run/chunk_16.fa.psl
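The Signal column of the joblog is the quickest way to see how each blat job died. A minimal Python sketch of tallying it follows, assuming whitespace-separated columns in the order shown in the header row and the usual Linux signal numbers (6 = SIGABRT, 9 = SIGKILL, 11 = SIGSEGV). The function name `summarize_signals` and the three-row sample are illustrative only; the rows are copied from the joblog above.

```python
from collections import Counter

# Linux signal numbers seen in this thread; anything else is shown as a number.
SIGNAL_NAMES = {0: "none", 6: "SIGABRT", 9: "SIGKILL", 11: "SIGSEGV"}

def summarize_signals(joblog_text):
    """Count jobs per terminating signal in a GNU Parallel joblog dump."""
    counts = Counter()
    lines = joblog_text.strip().splitlines()
    for line in lines[1:]:  # skip the header row
        # Columns: Seq Host Starttime JobRuntime Send Receive Exitval Signal Command
        # The Command column contains spaces, so split at most 8 times.
        fields = line.split(None, 8)
        if len(fields) < 9:
            continue  # ignore malformed or empty rows
        sig = int(fields[7])
        counts[SIGNAL_NAMES.get(sig, str(sig))] += 1
    return counts

# Three rows taken from the joblog in this thread.
BLAT = ("blat -noHead -noHead -fastMap -tileSize=12 "
        "-ooc=/home/ubuntu/efs/wheat/4461_12.ooc -minScore=100 "
        "-minIdentity=98 run/source.fa run/chunk_{n}.fa run/chunk_{n}.fa.psl")
SAMPLE = "\n".join([
    "Seq Host Starttime JobRuntime Send Receive Exitval Signal Command",
    "6 : 1600440618.782 60.508 0 0 0 9 " + BLAT.format(n="05"),
    "10 : 1600440618.788 312.954 0 0 0 11 " + BLAT.format(n="10"),
    "2 : 1600440618.775 312.980 0 41 0 6 " + BLAT.format(n="03"),
])

print(summarize_signals(SAMPLE))
```

Counting by hand over the full joblog above gives 5 jobs ended by signal 9, 3 by signal 11, and 13 by signal 6. SIGKILL is what the Linux out-of-memory killer sends, so the five early kills could point to memory pressure, but nothing in the thread confirms that. The 13 aborts match the 13 glibc heap-corruption messages in the original error output.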

pan-genome avatar Sep 20 '20 16:09 pan-genome

I was wondering what would be the best way to update parallel. Do I install a new version or update the one in /ext/parallel-20150722? if installed new one in different folder, I then need to point all the parallel in flo to the new src.

pan-genome avatar Sep 20 '20 16:09 pan-genome

if installed new one in different folder, I then need to point all the parallel in flo to the new src

Best to install in new folder. You can tell flo about the new folder using :add_to_path: key in the config file.

yeban avatar Sep 20 '20 16:09 yeban

I changed to an r5.16xlarge instance, used a newer parallel, and lowered the number of parallel processes from 21 to 10, but I still get the same error. Any suggestions? Thanks! The blat joblog looks like this:

run$ cat joblog.blat
Seq Host Starttime JobRuntime Send Receive Exitval Signal Command
5 : 1600620779.307 255.175 0 0 0 6 blat -noHead -noHead -fastMap -tileSize=12 -ooc=/home/ubuntu/efs/wheat/4461_12.ooc -minScore=100 -minIdentity=98 run/source.fa run/chunk_01.fa run/chunk_01.fa.psl
7 : 1600620779.310 255.858 0 0 0 6 blat -noHead -noHead -fastMap -tileSize=12 -ooc=/home/ubuntu/efs/wheat/4461_12.ooc -minScore=100 -minIdentity=98 run/source.fa run/chunk_08.fa run/chunk_08.fa.psl
1 : 1600620779.302 256.565 0 0 0 6 blat -noHead -noHead -fastMap -tileSize=12 -ooc=/home/ubuntu/efs/wheat/4461_12.ooc -minScore=100 -minIdentity=98 run/source.fa run/chunk_03.fa run/chunk_03.fa.psl
8 : 1600620779.311 256.630 0 0 0 6 blat -noHead -noHead -fastMap -tileSize=12 -ooc=/home/ubuntu/efs/wheat/4461_12.ooc -minScore=100 -minIdentity=98 run/source.fa run/chunk_00.fa run/chunk_00.fa.psl
10 : 1600620779.314 256.855 0 0 0 6 blat -noHead -noHead -fastMap -tileSize=12 -ooc=/home/ubuntu/efs/wheat/4461_12.ooc -minScore=100 -minIdentity=98 run/source.fa run/chunk_02.fa run/chunk_02.fa.psl
2 : 1600620779.303 257.506 0 0 0 11 blat -noHead -noHead -fastMap -tileSize=12 -ooc=/home/ubuntu/efs/wheat/4461_12.ooc -minScore=100 -minIdentity=98 run/source.fa run/chunk_07.fa run/chunk_07.fa.psl
4 : 1600620779.306 257.615 0 0 0 6 blat -noHead -noHead -fastMap -tileSize=12 -ooc=/home/ubuntu/efs/wheat/4461_12.ooc -minScore=100 -minIdentity=98 run/source.fa run/chunk_09.fa run/chunk_09.fa.psl
6 : 1600620779.308 257.718 0 0 0 6 blat -noHead -noHead -fastMap -tileSize=12 -ooc=/home/ubuntu/efs/wheat/4461_12.ooc -minScore=100 -minIdentity=98 run/source.fa run/chunk_04.fa run/chunk_04.fa.psl
9 : 1600620779.312 258.359 0 0 0 6 blat -noHead -noHead -fastMap -tileSize=12 -ooc=/home/ubuntu/efs/wheat/4461_12.ooc -minScore=100 -minIdentity=98 run/source.fa run/chunk_06.fa run/chunk_06.fa.psl
3 : 1600620779.304 258.777 0 0 0 11 blat -noHead -noHead -fastMap -tileSize=12 -ooc=/home/ubuntu/efs/wheat/4461_12.ooc -minScore=100 -minIdentity=98 run/source.fa run/chunk_05.fa run/chunk_05.fa.psl

pan-genome avatar Sep 20 '20 17:09 pan-genome

Sorry, I am not quite sure what is happening here. I have not encountered this error before. From the information we have in this thread, it might well be a bug in blat. It might be worth running the blat commands listed in joblst.blat one by one to check whether all the chunks fail with the above error, or only one in particular. With an isolated example, it might then be worth asking on blat's mailing list.
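Running the commands one at a time can be scripted. The sketch below is one possible way to do it in Python, assuming joblst.blat holds one plain command per line with no shell redirection or pipes (the joblog's Command column suggests this, but it is an assumption). The helper names `load_joblist` and `run_jobs_serially` are made up for illustration.

```python
import shlex
import subprocess

def load_joblist(path):
    """Read one command per non-empty line, as in a GNU Parallel -a file."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]

def run_jobs_serially(commands):
    """Run each command on its own and record how it ended."""
    results = []
    for cmd in commands:
        # shlex.split avoids a shell, so a crash is reported for the
        # program itself rather than for an intermediate shell process.
        proc = subprocess.run(shlex.split(cmd), capture_output=True, text=True)
        # On POSIX, subprocess reports death by signal N as returncode -N.
        killed_by = -proc.returncode if proc.returncode < 0 else 0
        stderr_lines = proc.stderr.strip().splitlines()
        results.append({
            "command": cmd,
            "exit": proc.returncode if proc.returncode >= 0 else None,
            "signal": killed_by,
            "last_stderr": stderr_lines[-1] if stderr_lines else "",
        })
    return results

# Example use (not executed here, since it needs the flo run directory):
# for r in run_jobs_serially(load_joblist("run/joblst.blat")):
#     print(r["signal"], r["exit"], r["last_stderr"], r["command"], sep="\t")
```

Because the jobs run one after another, each blat process has the whole machine's memory to itself, which helps separate out-of-memory kills from crashes that happen regardless of load.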

Just to be sure, is it possible that the ooc file you constructed uses a different tileSize from the one you are passing to blat? I guess not, because you have the _12 suffix on the ooc file.

Did you compile blat yourself, or did you download a pre-compiled executable (e.g., using the install script)? It is possible that glibc differs between your instance and the host on which blat was compiled, in which case compiling blat yourself can help. But this is the kind of issue where you would be better off getting help on blat's mailing list.

I used flo on a ~400 Mb genome, split into 40 chunks, so 10 Mb per chunk. I wonder if increasing the number of processes, so that each chunk is smaller, helps.
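To put the two setups side by side, here is the same per-chunk arithmetic as a small Python sketch. It assumes the target assembly is about 14 Gb, as stated at the top of the issue, and it only gives a mean: the faSplit call in the log splits on whole sequences, so real chunks will be uneven.

```python
def mean_chunk_size_mb(genome_size_mb, n_chunks):
    """Average chunk size in Mb when a genome is split into n_chunks files."""
    return genome_size_mb / n_chunks

# Maintainer's run: ~400 Mb genome in 40 chunks.
small = mean_chunk_size_mb(400, 40)

# This issue: ~14 Gb genome (assumed 14,000 Mb) in 21 chunks.
large = mean_chunk_size_mb(14_000, 21)

print(f"{small:.1f} Mb vs {large:.1f} Mb per chunk, ratio {large / small:.1f}x")
```

Under these assumptions, each chunk in this issue is roughly 67 times larger than in the run that worked.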

Lastly, I would quickly check the fasta and psl files for each chunk, just to make sure we are not missing something too obvious.

yeban avatar Sep 26 '20 13:09 yeban

Hi, how can you split the work into more processes than the number of chromosomes/scaffolds? The information page says: "Number of CPU cores to use (required - not auto detected). This cannot be greater than the number of scaffolds in the target assembly." I have 21 chromosomes, so 21 processes is the maximum I can use. It looks like a memory issue in blat, and each chunk is still too big for blat to handle.

pan-genome avatar Sep 29 '20 20:09 pan-genome

Here is what happened when I ran blat on one chunk:

blat -noHead -fastMap -tileSize=12 -ooc=4461n_12.ooc -minScore=100 -minIdentity=98 source.fa chunk_08.fa chunk_08.fa.psl
Loaded 14547261565 letters in 22 sequences
free(): invalid next size (normal)
Aborted (core dumped)
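The "Loaded 14547261565 letters" line refers to source.fa, which blat takes as its database (the first positional argument), so the whole source genome is loaded no matter how small the query chunk is. One hypothesis, not confirmed anywhere in this thread, is that this total is too large for 32-bit offsets. It would be consistent with the earlier observation that genomes under 4 Gb run normally. The Python sketch below only checks the arithmetic. Whether blat actually uses 32-bit offsets here is an assumption to verify on blat's mailing list.

```python
LETTERS_LOADED = 14_547_261_565  # from blat's "Loaded ... letters" line
UINT32_RANGE = 2 ** 32           # number of values an unsigned 32-bit int can hold

def fits_in_uint32(n):
    """True if n can be stored in an unsigned 32-bit integer."""
    return 0 <= n < UINT32_RANGE

def wrapped_uint32(n):
    """Value n would take if silently truncated to 32 bits."""
    return n % UINT32_RANGE

print(fits_in_uint32(LETTERS_LOADED))            # does the total fit?
print(LETTERS_LOADED / UINT32_RANGE)             # how many times over the range
print(wrapped_uint32(LETTERS_LOADED))            # truncated value, if it wrapped
```

If the hypothesis holds, shrinking the query chunks would not help, because the database size stays the same. The source assembly itself would need to be searched in pieces below that limit.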

pan-genome avatar Sep 30 '20 17:09 pan-genome

Hello! I am facing a similar issue while running flo with a large genome of ~16 Gb. Can you please advise if there is a workaround or solution for this issue? Thanks.

akshaya-v avatar Mar 16 '22 15:03 akshaya-v