bwa bwasw crashed with the error '[vfprintf(stdout)] Value too large for defined data type'

bwa bwasw crashed with the error '[vfprintf(stdout)] Value too large for defined data type'. Version: 0.7.15-r1142

I used 'bwa bwasw' to compare two assembly references. The contigs may be too large for bwasw, but it should not crash (core dump).

+ bwa bwasw -t 144 /ssd/biowrk/TAIR/canu.eval/ref.fa /ssd/biowrk/TAIR/canu.min4000/tair.contigs.fasta
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[bsw2_aln] read 85 sequences/pairs (121314957 bp) ...
[vfprintf(stdout)] Value too large for defined data type

Mar 13 '17 04:03 wangyugui

I got the same error. I wonder if you managed to figure out why this happens?

Apr 10 '17 12:04 aallahyar

I got the same error. I also wonder if you managed to figure out why this happens...

Oct 30 '20 01:10 dexon9109

Hey @dexon9109, it has been more than 3 years, so I only have a vague memory of this. It could have been because I was trying to map a very long sequence (>10 kb) and the system (or BWA itself) could not handle it. Please check whether that applies to you and report back. It might be of use to somebody else in the future.
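One way to check whether very long sequences are involved is to list the sequence lengths in the input. The following is a minimal sketch, not part of BWA: the helper names `fasta_lengths` and `long_records` are hypothetical, and the 10,000 bp default threshold simply mirrors the ">10kb" figure mentioned above rather than any documented limit.

```python
def fasta_lengths(lines):
    """Return {record_name: sequence_length} for FASTA-formatted lines."""
    lengths = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the name.
            parts = line[1:].split()
            name = parts[0] if parts else ""
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def long_records(lengths, threshold=10_000):
    """Return (name, length) pairs longer than threshold, longest first.

    The threshold is an assumption for illustration, not a known BWA limit.
    """
    hits = [(n, l) for n, l in lengths.items() if l > threshold]
    return sorted(hits, key=lambda item: -item[1])
```

For a file on disk, something like `long_records(fasta_lengths(open("query.fa")))` would list the candidates, where `query.fa` is a placeholder path.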

Oct 30 '20 06:10 aallahyar

Hello @aallahyar, I did match a longer contig to nanopore subreads. The error did not report a specific sequence or position, so I could not locate the offending sequence. The command I ran is as follows:

bwa bwasw -b 5 -q 2 -r 1 -T 15 -z 10 -t 30 rg.fa nanopore.pass.fq.gz -f test.bam

Oct 30 '20 06:10 dexon9109

I am a bit confused, as Nanopore does not have subreads; PacBio does.

In case the long read is the problem, I can also share that my read had very low complexity, i.e. a skewed ratio between the observed frequencies of A, T, G and C. Unfortunately, I am not the developer, so I cannot help further. My suggestion is to chop your reads into smaller pieces step by step and see at what point you can map them successfully. I would be surprised if you really need such a lengthy read to find a location in the genome. Of course, that is not a perfect solution, sorry about that.
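The chopping suggestion could be sketched roughly as below. This is an illustrative helper only: the name `chop_read`, the chunk naming scheme, and the chunk size are assumptions, and nothing here comes from BWA itself.

```python
def chop_read(name, seq, chunk_size, qual=None):
    """Yield (chunk_name, chunk_seq, chunk_qual) pieces of at most chunk_size bases.

    chunk_qual is None when no quality string is supplied (e.g. FASTA input).
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for index, start in enumerate(range(0, len(seq), chunk_size)):
        end = min(start + chunk_size, len(seq))
        chunk_qual = qual[start:end] if qual is not None else None
        # Encode the original coordinates in the name so hits can be traced back.
        yield f"{name}_chunk{index}_{start}_{end}", seq[start:end], chunk_qual
```

Starting with a large `chunk_size` and halving it until the mapping succeeds would follow the "step by step" idea; keeping the original coordinates in the chunk name makes it possible to place each hit back on the full read.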

Good luck!

Oct 30 '20 07:10 aallahyar

Sorry @aallahyar about the word "subreads". I thought long reads could be called that, but I won't be calling them that anymore. In fact, this lengthy read contains multiple contigs, and I wanted to locate those contigs through this read. So it's a long read made of enzymatically connected fragments. Your method is worth trying. I also ran minimap2 : )

Oct 30 '20 07:10 dexon9109

OK, I have a better picture now. So you are saying that the read itself is a concatenation of multiple "fragments" that are produced through enzyme digestion? Then you can scan the read and split it wherever you find the enzyme recognition sequence, and map each individual fragment separately. Due to Nanopore sequencing errors, this is not perfect, but it works much better than mapping the entire read at once. I hope it helps.
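A minimal sketch of that splitting step, under stated assumptions: the helper `split_at_site` is hypothetical, the recognition sequence is whatever enzyme was actually used ("GATC" in the usage note is only a placeholder), and cuts are made at the start of each exact match. Real enzymes cut at a specific offset within the site, and exact matching will miss sites that contain sequencing errors.

```python
def split_at_site(seq, site):
    """Split seq just before every exact occurrence of site.

    Returns a list of (start_offset, fragment) pairs covering the whole read.
    """
    if not site:
        raise ValueError("site must be a non-empty string")
    # Collect the start position of every (possibly overlapping) exact match.
    cut_points = []
    pos = seq.find(site)
    while pos != -1:
        cut_points.append(pos)
        pos = seq.find(site, pos + 1)
    fragments = []
    prev = 0
    for cut in cut_points:
        if cut > prev:
            fragments.append((prev, seq[prev:cut]))
        prev = cut
    if prev < len(seq):
        fragments.append((prev, seq[prev:]))
    return fragments
```

For example, `split_at_site("AAAGATCCCCGATCTT", "GATC")` yields three fragments starting at offsets 0, 3 and 10. Each fragment can then be written out as its own record and mapped separately, with the offset kept to relate hits back to the original read.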

Oct 30 '20 09:10 aallahyar
