NanoSim
Error when FASTQ read headers are too long
This is not a NanoSim bug as such, but it is very relevant for Nanopore data. Samtools limits read names (the SAM QNAME field) to 254 characters (https://github.com/samtools/samtools/issues/10810). NanoSim (v3.2.2) doesn't seem to check that the call to minimap2 in "read_analysis.py" completes without error. So when the input read file has read headers longer than 254 characters, samtools fails silently, NanoSim keeps running, and it eventually throws the following error:
./output/test_genome_alnm.bam
Traceback (most recent call last):
  File "/home/lshas17/miniforge3/envs/nanosimENV/bin/read_analysis.py", line 896, in <module>
    main()
  File "/home/lshas17/miniforge3/envs/nanosimENV/bin/read_analysis.py", line 606, in main
    alnm_ext, unaligned_length, strandness, unaligned_base_qualities = align_genome(in_fasta, prefix, aligner,
  File "/home/lshas17/miniforge3/envs/nanosimENV/bin/read_analysis.py", line 199, in align_genome
    unaligned_length, strandness, unaligned_base_quals = get_primary_sam.primary_and_unaligned(g_alnm, prefix, quantification, fastq=fastq)
  File "/home/lshas17/miniforge3/envs/nanosimENV/bin/get_primary_sam.py", line 188, in primary_and_unaligned
    strandness = float(pos_strand) / num_aligned
ZeroDivisionError: float division by zero
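One way to catch the problem before running read_analysis.py is a pre-flight scan of the input FASTQ. The Python sketch below reports records whose read name is longer than the 254-character SAM QNAME limit. The function `long_read_names` and its `whole_header` flag are hypothetical and are not part of NanoSim. The sketch assumes unwrapped 4-line FASTQ records. By default it treats the read name as the first whitespace-delimited token after `@`, which is what minimap2 normally writes to QNAME. If your preprocessing keeps the entire header line as the name, set `whole_header=True`.

```python
from typing import Iterable, List, Tuple

# SAM specification: QNAME may be at most 254 characters long.
MAX_QNAME_LEN = 254


def long_read_names(
    lines: Iterable[str],
    limit: int = MAX_QNAME_LEN,
    whole_header: bool = False,
) -> List[Tuple[int, str]]:
    """Return (record_number, name) for every FASTQ record whose name exceeds `limit`.

    Hypothetical helper. Assumes plain 4-line FASTQ records (no wrapped
    sequence/quality lines). Record numbers are 1-based.
    """
    offenders: List[Tuple[int, str]] = []
    for i, line in enumerate(lines):
        if i % 4 != 0:
            continue  # only the first line of each 4-line record is a header
        header = line.rstrip("\r\n")
        if not header.startswith("@"):
            raise ValueError(f"line {i + 1} is not a FASTQ header: {header[:40]!r}")
        if whole_header:
            name = header[1:]  # everything after '@', comments included
        else:
            tokens = header[1:].split(maxsplit=1)
            name = tokens[0] if tokens else ""  # first whitespace-delimited token
        if len(name) > limit:
            offenders.append((i // 4 + 1, name))
    return offenders
```

For example, `with open("reads.fastq") as fh: print(long_read_names(fh))` lists the offending records. For gzipped input, open the file with `gzip.open(path, "rt")` instead. The function accepts any iterable of lines, so it works the same on an open file handle or an in-memory list.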
Adding a return-code check to the minimap2 calls (for example, on line 171 of "read_analysis.py") would help users identify and fix the problem with their data.
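A return-code check of the kind suggested above might look like the following sketch. It assumes the aligner's output is piped into samtools. `run_two_stage` and `PipelineError` are hypothetical names and are not NanoSim's API. The commands NanoSim actually builds around line 171 may also differ. Each stage runs under its own `subprocess.Popen`, so both exit codes can be inspected. A samtools failure then stops the run immediately instead of surfacing later as a `ZeroDivisionError`.

```python
import subprocess
import sys
from typing import List


class PipelineError(RuntimeError):
    """Raised when any stage of an external pipeline exits non-zero (hypothetical)."""


def run_two_stage(producer: List[str], consumer: List[str]) -> None:
    """Run `producer | consumer` and raise if EITHER stage exits non-zero.

    Hypothetical helper: e.g. producer = a minimap2 command, consumer = a samtools command.
    """
    p1 = subprocess.Popen(producer, stdout=subprocess.PIPE)
    p2 = subprocess.Popen(consumer, stdin=p1.stdout)
    # Close the parent's copy of the pipe so that, if the consumer exits early,
    # the producer gets a broken-pipe error instead of blocking on a full pipe.
    p1.stdout.close()
    rc_consumer = p2.wait()
    rc_producer = p1.wait()
    failures = [
        (cmd, rc)
        for cmd, rc in ((producer, rc_producer), (consumer, rc_consumer))
        if rc != 0
    ]
    if failures:
        details = "; ".join(f"{' '.join(cmd)!r} exited with {rc}" for cmd, rc in failures)
        raise PipelineError(f"alignment pipeline failed: {details}")
```

A call might look like `run_two_stage(["minimap2", "-ax", "map-ont", "ref.fa", "reads.fa"], ["samtools", "view", "-b", "-o", "out.bam", "-"])`, where the file names are placeholders. Checking only the last stage, as a plain shell pipeline does without `pipefail`, would miss an aligner failure. This is why the sketch reports both exit codes.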
Hi @AntonS-bio
Thanks for flagging this and for the detailed explanation; much appreciated. Your proposed solution is a smart and practical workaround. We'll make sure to address this more cleanly in a future release.