minimap2
A corrupt FASTQ causes minimap2 to create a bogus SAM
A FLAIR user hit a confusing invalid UTF-8 decode error while processing a SAM file produced by minimap2. It turned out that the file used as FASTQ input was actually a gzipped tar file containing the FASTQ. I could only reproduce this with the first few hundred bytes of the user's FASTQ; attempts to create a similar test case didn't produce the same results.
To reproduce with the attached data:
% samtools faidx target.fa
% minimap2 -a -t 64 -N 4 --MD target.fa IBWF_barcode20.fastq.gz >broken.sam
% python -c 'for l in open("broken.sam"): pass'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "<frozen codecs>", line 322, in decode
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x82 in position 360: invalid start byte
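For anyone debugging a similar file, here is a minimal sketch of how to locate the offending lines without triggering the decode error. It reads the SAM in binary mode and tries to decode each line separately. The function name `find_undecodable_lines` and its `limit` parameter are hypothetical helpers for illustration, not part of FLAIR, samtools, or minimap2.

```python
def find_undecodable_lines(path, encoding="utf-8", limit=5):
    """Return up to `limit` tuples of (line number, offset of the bad byte
    within the line, first 60 raw bytes) for lines that fail to decode."""
    bad = []
    # Binary mode: Python hands back raw bytes, so iteration never raises.
    with open(path, "rb") as fh:
        for lineno, raw in enumerate(fh, start=1):
            try:
                raw.decode(encoding)
            except UnicodeDecodeError as exc:
                # exc.start is the index of the first undecodable byte.
                bad.append((lineno, exc.start, raw[:60]))
                if len(bad) >= limit:
                    break
    return bad
```

Calling `find_undecodable_lines("broken.sam")` should show which SAM records carry the binary bytes, which makes it easier to see that they came from the input rather than from the alignment itself.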
In general, it would be safer for minimap2 to report an error rather than continue with an invalid FASTQ.
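Until then, a pre-flight check on the caller's side could catch this particular mistake. The sketch below is not minimap2 behavior. It is a hypothetical helper (`sniff_fastq`) that decompresses the first 512 bytes of the input and looks for the `ustar` magic that POSIX-style tar headers carry at byte offset 257. It assumes a ustar, pax, or GNU tar archive; very old v7 tar files lack that magic and would be reported as `unknown`.

```python
import zlib

def sniff_fastq(path):
    """Classify the start of a possibly gzipped file as 'fastq', 'tar', or 'unknown'."""
    with open(path, "rb") as fh:
        raw = fh.read(65536)
    if raw[:2] == b"\x1f\x8b":  # gzip magic number
        try:
            # wbits=31 selects gzip framing; the second argument caps the output
            # at 512 bytes. A decompressobj tolerates a truncated stream.
            head = zlib.decompressobj(wbits=31).decompress(raw, 512)
        except zlib.error:
            return "unknown"
    else:
        head = raw[:512]
    # ustar, pax, and GNU tar headers all store "ustar" at offset 257.
    if head[257:262] == b"ustar":
        return "tar"
    # A FASTQ record begins with an '@' header line.
    if head.startswith(b"@"):
        return "fastq"
    return "unknown"
```

A wrapper such as FLAIR could call `sniff_fastq("IBWF_barcode20.fastq.gz")` before invoking minimap2 and refuse to continue unless the result is `fastq`.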