A corrupt FASTQ that causes minimap2 to create bogus SAM

Open diekhans opened this issue 3 months ago • 0 comments

A FLAIR user hit a confusing invalid UTF-8 decode error while processing a SAM file produced by minimap2. It turned out that the file passed as FASTQ input was actually a gzipped tar archive containing the FASTQ. I could only reproduce this with the first few hundred bytes of the user's FASTQ. Attempts to create a similar test case didn't produce the same results.

To reproduce with the attached data:

% samtools faidx target.fa 
% minimap2 -a -t 64 -N 4 --MD target.fa IBWF_barcode20.fastq.gz >broken.sam

% python -c 'for l in open("broken.sam"): pass'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "<frozen codecs>", line 322, in decode
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x82 in position 360: invalid start byte
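
To see where the bad bytes sit in the output, the SAM can be read in binary mode so that decoding failures are reported per line instead of aborting the loop. This is only a minimal sketch: `find_undecodable_lines` is a hypothetical helper name, not part of minimap2, samtools, or FLAIR.

```python
def find_undecodable_lines(path, limit=5):
    """Return up to `limit` tuples (line_number, offset_in_line, byte_value)
    for lines in `path` that are not valid UTF-8.

    Hypothetical helper for inspecting a suspicious SAM file.
    """
    bad = []
    # Binary mode: no implicit decoding, so iteration itself cannot fail.
    with open(path, "rb") as fh:
        for lineno, raw in enumerate(fh, start=1):
            try:
                raw.decode("utf-8")
            except UnicodeDecodeError as err:
                # err.start is the index of the first offending byte in this line.
                bad.append((lineno, err.start, raw[err.start]))
                if len(bad) >= limit:
                    break
    return bad
```

For example, `find_undecodable_lines("broken.sam")` lists the first few offending lines, which helps tell whether the garbage is in read names, sequences, or elsewhere.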

In general, it would be safer for minimap2 to generate an error than to continue with an invalid FASTQ.
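
Until something like that exists, a caller can do a rough pre-flight check before invoking minimap2. The sketch below is an assumption about what such a check might look like on the wrapper side, not how minimap2 validates input: `looks_like_fastq` is a hypothetical name, and the test is deliberately shallow (gzip magic bytes, the `ustar` tar magic at offset 257, and a leading `@`).

```python
import gzip


def looks_like_fastq(path):
    """Heuristic: True if `path` (plain or gzipped) starts like a FASTQ file.

    Hypothetical pre-flight check; it only inspects the first 512 bytes.
    """
    with open(path, "rb") as fh:
        is_gzip = fh.read(2) == b"\x1f\x8b"  # gzip magic number
    opener = gzip.open if is_gzip else open
    try:
        with opener(path, "rb") as fh:
            head = fh.read(512)
    except (EOFError, OSError):
        # Truncated or corrupt gzip stream: treat as not a usable FASTQ.
        return False
    # POSIX and GNU tar headers both carry "ustar" at byte offset 257.
    if head[257:262] == b"ustar":
        return False
    # A FASTQ record begins with '@' followed by the read name.
    return head.startswith(b"@")
```

A wrapper could call `looks_like_fastq("IBWF_barcode20.fastq.gz")` and refuse to run the aligner when it returns `False`. The check cannot prove a file is valid FASTQ; it only catches gross mix-ups such as a tar archive under a `.fastq.gz` name.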

mm2-bogus-fastq.tar.gz

diekhans, Sep 06 '25 00:09