Range violation
Hello, I have a rather difficult problem. I am aligning several wgs samples, each about 100gb filesize, and for one of them bwa-mem2 avx just stops at 17gb without any error. bwa-mem2 sse4.1 stops at the same position with the following error.
[0000] 3. Calling kt_for - worker_sam
[0000][ M::mem_process_seqs] Processed 5868668 reads in 2703.295 CPU sec, 32.876 real sec
[0000] read_chunk: 880000000, work_chunk_size: 880000219, nseq: 5868866
[0000][ M::kt_pipeline] read 5868866 sequences (880000219 bp)...
[0000] Calling mem_process_seqs.., task: 34
[0000] 1. Calling kt_for - worker_bwt
core.exception.RangeError@BioD/bio/std/hts/bam/read.d(226): Range violation
Unfortunately I cannot provide a minimum example. If I reduce the fastq files to just a million around this "critical point" it runs perfectly fine and produces the expected values. I am willing to debug something, if you would like me to.
Best Christo
ps: bwa-mem (the normal old version) is able to process the sample without any errors.
The error message is raised by the BioD package, bwa itself doesn't read or write bam files. Although you'd expect the surrounding tools to behave the same are you sure the issue is with bwa-mem2 itself and not something feeding in to or reading from bwa?
I will rerun and pipe to /dev/null to see what happens
If you have sufficient space I'd recommend the following while determining if BWA is the source of the issue
bwa-mem .... | gzip -c -2 > sam.gz
Then you can also see if the issue remains in the rest of the flow via:
zcat sam.gz | <other tools>
This may also help identify if an individual read is malformed by slicing up that file later. Possibly also worth determining if the remainder of the flow works with the samtools implementations of the downstream processing (if possible).
I used pigz to speed up the compression process. The command was
(bwa-mem2 mem -t 88 -R '@RG\tID:1875-0_Blood\tSM:1875-0_Blood\tPL:ILLUMINA' resources/genome.fasta results/merged/1875-0_Blood_R1.fastq.gz results/merged/1875-0_Blood_R2.fastq.gz | pigz -c > 1875-0_Blood.sam.gz) 2> 1875-0_Blood.log
I counted
zcat 1875-0_Blood.sam.gz | wc -l
and got 195,759,052 lines.
A count of the bwa aligned file
sambamba view -c 1875-0_Blood.bwa.bam
shows 2,239,533,143 aligned reads (including split map secondary reads).
The 1875-0_Blood.log tail:
[0000] Calling mem_process_seqs.., task: 33
[0000] 1. Calling kt_for - worker_bwt
[0000] 2. Calling kt_for - worker_aln
[0000] Inferring insert size distribution of PE reads from data, l_pac: 3099750718, n: 5868668
[0000][PE] # candidate unique pairs for (FF, FR, RF, RR): (205, 2675344, 7, 190)
[0000][PE] analyzing insert size distribution for orientation FF...
[0000][PE] (25, 50, 75) percentile: (617, 933, 1604)
[0000][PE] low and high boundaries for computing mean and std.dev: (1, 3578)
[0000][PE] mean and std.dev: (1075.27, 744.14)
[0000][PE] low and high boundaries for proper pairs: (1, 4565)
[0000][PE] analyzing insert size distribution for orientation FR...
[0000][PE] (25, 50, 75) percentile: (284, 349, 421)
[0000][PE] low and high boundaries for computing mean and std.dev: (10, 695)
[0000][PE] mean and std.dev: (353.79, 106.57)
[0000][PE] low and high boundaries for proper pairs: (1, 832)
[0000][PE] skip orientation RF as there are not enough pairs
[0000][PE] analyzing insert size distribution for orientation RR...
[0000][PE] (25, 50, 75) percentile: (775, 1103, 1737)
[0000][PE] low and high boundaries for computing mean and std.dev: (1, 3661)
[0000][PE] mean and std.dev: (1194.15, 693.38)
[0000][PE] low and high boundaries for proper pairs: (1, 4623)
[0000][PE] skip orientation FF
[0000][PE] skip orientation RR
[0000] 3. Calling kt_for - worker_sam
[0000][ M::mem_process_seqs] Processed 5868668 reads in 1847.586 CPU sec, 22.723 real sec
[0000] read_chunk: 880000000, work_chunk_size: 880000219, nseq: 5868866
[0000][ M::kt_pipeline] read 5868866 sequences (880000219 bp)...
[0000] Calling mem_process_seqs.., task: 34
[0000] 1. Calling kt_for - worker_bwt
and tail of the sam.gz:
H3NGLALXX:5:23:4769918:0 99 2 25903692 60 150M = 25904068 502 TTTCCTTATTAAAAGTTCCAAATATAGGCCAGGCGCGGTGGCTCATGCCTATAATGCCAGCACTTTGGGAGGTCAAGGCAGATGGATCACGAGGTGAGCTGTTCGAGACCAGCCTGGCTAACATGATGAAATCCTATCTCTACTAAAACT AAAFFKKFKAAKKKKKKKKKKKKKKKKKKKKKK<AKKKKAKAKKKKKKFK<FK<FKKFKKFKKA<FKKK<FFAAAAKFKKKAFKKFKFFKAAF<A,<7FAK,,,<<<K7A<7<,7AFFAFA<77AA,A,,,A<A7,,,,,,,,,<<KA,, NM:i:2 MD:Z:95C52A1 MC:Z:24S126M AS:i:143 XS:i:54 RG:Z:1875-0_Blood
H3NGLALXX:5:23:4769918:0 147 2 25904068 60 24S126M = 25903692 -502 AAAGCAGTACAATGGGCGAGCCGGGGTGGCTCACGCCTATAATCCCACCGCTTTGGGAGGTCAAGGCAGGGGGATCACCTGAGGTCAGGAGTTCAAGACCAGCCTGGCCAACCCATCTCTACTAAAAATACAAAAATTAGCCACGTGTGG ,,,,,7,,<,,,(,,,,(,(,,,(7,<FAFKAA<,,,,,,AA7,<(,,,,,,,<(,<A,,,,,,(77,(,,AAKAFKFAA<KF<,KF7AFKF,FKKFKFA7,AFKAKAA<,7,,FF77A,KA,KKKFKF<FKKKKF7<7A7F7,,,,A,< NM:i:5 MD:Z:11T2C10A20T72G6 MC:Z:150M AS:i:101 XS:i:76 RG:Z:1875-0_Blood
H3NGLALXX:5:7:3873948:0 99 2 25903858 60 150M = 25904071 353 GAATGGTGGTGCACGCCTGTAATCTCAGCTACTCAGGAGGCTGAGGCAGGAGAATCACCTGAACCCAGGAGGTGGAGGTTGCAGTGGGCCGAGACTGTACCACTGCACTCCAGCCTGGACAACAGAGTGAGACTCCGTCTCAAAAAAAAA <AFFFAAFKK<,FKK<AKKFKFKKKKKKFKKKFFKKKKKFFFKKFF7A7FFA<FKKK7FKKKKKKKAAAKKKAAFA,AAFFF,AFAAKKFFAAFFKK<FKKFK7FKKKKKKFKKFFAF<F<KKKF<F<F<F<A77FAAKAKKK<AK<<FF NM:i:1 MD:Z:66G83 MC:Z:10S140M AS:i:145 XS:i:78 RG:Z:1875-0_Blood
H3NGLALXX:5:7:3873948:0 147 2 25904071 60 10S140M = 25903858 -353 CGAGGCTTGGGGCTCACGCCTCGAATCCCGCCACTTTGGGAGGTCAAGGCAGGTGGATCACCGGAGGTGAGGAGTTCTAGACCAGCCTGGCCAACCGATCTCTGCTAAAAATACAAAAATTAGCCAGGTGTGGTGGCATGCGCCTGTAGT ,,7<,,,(((,A<,,,(,,,,,,,,,<7(,A7,<,,,FFAAFAKA,,<A,F7<7<7A<AA<<,,,<,,,AFA<FFA,,7A,7FAA<(7FFA7,7A7,,7KKA<,FF,<7,F<<AAK7FKA7AF7FFAA,KKKKKKKF7((7A<A,7,,,, NM:i:8 MD:Z:8T3T6A32T5C8A18C6A46 MC:Z:150M AS:i:100 XS:i:64 RG:Z:1875-0_Blood
H3NGLALXX:1:23:4404940:0 83 2 25904072 60 150M = 25903867 -355 GCTCACGTCTCTAATCCCACCACTTTGGGAGGTCAAGGCAGGTGGATCACCTGAGGTCAGGAGTTCAAGACCAGCCTGGCCAACCCATCTCTACTAAAAATACAAAAATTAGCCAGGTGTGGTGGCATGCGCCTGTAGTCCTAGCTACTC KKKFA<FAFFFKAKAFKKKKF<FK<<<KKKAKKKFKFKKKKKF<AKK<AKKKKKKKKKKKKKKKKKKKK<KKKKKKKKKKKKKKKKKFKKKKKKKKKKKKKFKKKKKKKKKAKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKK
Please note, that the second read of H3NGLALXX:1:23:4404940:0 is missing. You probably ask yourself how the second read looks like. Here the read plus the read before and after.
@H3NGLALXX:5:7:3873948:0/2
ACTACAGGCGCATGCCACCACACCTGGCTAATTTTTGTATTTTTAGCAGAGATCGGTTGGCCAGGCTGGTCTAGAACTCCTCACCTCCGGTGATCCACCTGCCTTGACCTCCCAAAGTGGCGGGATTCGAGGCGTGAGCCCCAAGCCTCG
+
,,,,7,A<A7((7FKKKKKKK,AAFF7FA7AKF7KAA<<F,7<,FF,<AKK7,,7A7,7AFF7(<AAF7,A7,,AFF<AFA,,,<,,,<<AA<A7<7<7F,A<,,AKAFAAFF,,,<,7A,(7<,,,,,,,,,(,,,<A,(((,,,<7,,
@H3NGLALXX:1:23:4404940:0/2
TGCACGCCTGTAATCTCAGCTACTCAGGAGGCTGAGGCAGGAGAATCACCTGAACCCAGGAGGTGGAGGTTGCAGTGGGCCGAGACTGTACCACTGCACTCCAGCCTGGACAACAGAGTGAGACTCCGTCTCAAAAAAAAAAAAAAAAAA
+
AAFFFKKKKKKKKKKKKKKKKKKKKKKKFKKKKKKKKKKKKKKFKKKKKKKKKKKKKKKKKKKFKKFKKKKKKKKKKKKKKF<FAFKKKKKKKKKKKFK<FKFKKFKAF<FKAFFKKKAAAFAFAFFK,,7A,AA<<77<,<A77F7A7A
@H3NGLALXX:4:2:1918254:0/2
TTAGCCGGGAATGGTGGTGCACGCCTGTAATCTCAGCTACTCAGGAGGCTGAGGCAGGAGAATCACCTGAACCCAGGAGGTGGAGGTTGCAGTGGGCCGAGACTGTACCACTGCACTCCAGCCTGGACAACAGAGTGAGACTCCGTCTCA
+
A<,FF7FFK7AFKKKKKKKKFKKFFKKKK7FFAKKKKKKKKA7FKFKKKKKKKKKKKKKK7AAFAKKKFKKFKKFF<AAFAKKFFK,AFFAF7AAKFFFFFF<<KFKK<FA<KF7<AFFKKKF<<7KFK<KKKFK<AAFKK,AAKA7,A,
Can you run with -v 4:
-v INT verbose level: 1=error, 2=warning, 3=message, 4+=debugging [3]
The last read of the sam.gz file is missing the AUX tags, a possible place to investigate for the devs.
(/dev/null for bwa stdout)
Running, but it will take a lot of time. The debug mode is very slow.
Another way you can check for undefined behaviour is to compile with -fsanitize=address. This is a bit faster but may catch if the issue is caused by a bad memory read, you can breakpoint on __asan_on_error@plt to catch the error before it writes over your stack. The final alternative is valgrind which should work on sse41 but is rather slow.
I dont even know what that means OO
here is tail -n from the debug log 1875-0_Blood.v4.tail.log
@christopher-schroeder are you still facing the problem? What is the memory capacity of this system? can you try and check with smaller number of threads?
yes, the error remains. I had no choice and switched back to bwa mem for the project and everything was mapped and downstream analysis are currently running. The memory capacity of the system is 756gb, that should be no issue. Sure, what number do you have in mind @yuk12? 44?
The available memory is sufficient to run mem2. You can start with 16 threads. The log shows that it is hanged in the seeding stage. Also, you can try the latest commit, where index size is reduced significantly, and see.
I tried 16 cores with the newest commit and it worked. I increase it to 32 now.
with 32 threads a get the following error:
bwa-mem2: src/bwamem.cpp:904: int mem_kernel1_core(FMI_search*, const mem_opt_t*, bseq1_t*, int, mem_chain_v*, mem_seed_t*, int64_t, mem_cache*, int): Assertion `num_smem < *wsize_mem' failed.
@christopher-schroeder Do you have any updates for this issue? I am also running into the same problem.
I dont know the technical background but I now used -K 500000000 as parameter for a few month on 100+ whole genomes and never got this error again. So I guess this is at least a workaround. @skchronicles
@christopher-schroeder Thank you for the quick reply! I appreciate. I will let you know if this also fixes our issue too.
@skchronicles Did it work?