
Error when FASTQ read headers are too long

Open AntonS-bio opened this issue 6 months ago • 1 comment

Not a NanoSim bug as such, but very relevant for Nanopore data. Samtools limits the read header to 254 characters (https://github.com/samtools/samtools/issues/10810). NanoSim (v3.2.2) doesn't seem to check that the call to minimap2 in "read_analysis.py" completes without error. So when the input read file has read headers longer than 254 characters, samtools fails silently, NanoSim continues running, and it then throws this error:

./output/test_genome_alnm.bam
Traceback (most recent call last):
  File "/home/lshas17/miniforge3/envs/nanosimENV/bin/read_analysis.py", line 896, in <module>
    main()
  File "/home/lshas17/miniforge3/envs/nanosimENV/bin/read_analysis.py", line 606, in main
    alnm_ext, unaligned_length, strandness, unaligned_base_qualities = align_genome(in_fasta, prefix, aligner,
  File "/home/lshas17/miniforge3/envs/nanosimENV/bin/read_analysis.py", line 199, in align_genome
    unaligned_length, strandness, unaligned_base_quals = get_primary_sam.primary_and_unaligned(g_alnm, prefix, quantification, fastq=fastq)
  File "/home/lshas17/miniforge3/envs/nanosimENV/bin/get_primary_sam.py", line 188, in primary_and_unaligned
    strandness = float(pos_strand) / num_aligned
ZeroDivisionError: float division by zero
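
For anyone hitting the same traceback, here is a rough sketch for checking whether an input file is affected before running NanoSim. It is not part of NanoSim: `find_long_read_names` is a hypothetical helper, it assumes a plain 4-line-per-record FASTQ (optionally gzipped), and it measures the whole header line after the `@`, which may be stricter than necessary if only the read name before the first whitespace counts towards the limit.

```python
import gzip

MAX_HEADER_LEN = 254  # limit quoted in this issue; adjust if your tools differ


def find_long_read_names(fastq_path, max_len=MAX_HEADER_LEN):
    """Return (line_number, header_length) for every header longer than max_len.

    Assumes each record occupies exactly four lines (no wrapped sequences).
    """
    opener = gzip.open if str(fastq_path).endswith(".gz") else open
    too_long = []
    with opener(fastq_path, "rt") as handle:
        for index, line in enumerate(handle):
            if index % 4 == 0:  # first line of each record is the header
                header = line.rstrip("\r\n")[1:]  # drop the leading '@'
                if len(header) > max_len:
                    too_long.append((index + 1, len(header)))
    return too_long
```

An empty list means no header exceeds the limit; otherwise the line numbers point at the records that need shorter names.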

Adding a check of the return code to the calls to minimap2 (for example on line 171 in "read_analysis.py") would help users fix the problem with their data.
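
A minimal sketch of what such a check could look like, using only the standard library. This is not NanoSim's actual code: `run_checked` is a hypothetical wrapper, and how the aligner command is really built in "read_analysis.py" is not shown here.

```python
import shlex
import subprocess
import sys


def run_checked(cmd):
    """Run an external command (given as a list of arguments) and stop on failure.

    Raises RuntimeError with the exit code and captured stderr, so the user
    sees the real cause instead of a later, unrelated exception.
    """
    result = subprocess.run(cmd, stderr=subprocess.PIPE, text=True)
    if result.returncode != 0:
        raise RuntimeError(
            f"Command failed with exit code {result.returncode}: "
            f"{shlex.join(cmd)}\n{result.stderr}"
        )
    return result


if __name__ == "__main__":
    # Demo with the Python interpreter standing in for an aligner call.
    run_checked([sys.executable, "-c", "print('ok')"])
```

If the aligner and samtools are chained in a single shell pipeline, the shell normally reports only the exit status of the last command, so each step would need to be run and checked separately (or the pipeline run with `set -o pipefail` in bash) for a check like this to catch the failure.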

AntonS-bio avatar May 06 '25 14:05 AntonS-bio

Hi @AntonS-bio

Thanks for flagging this and for the detailed explanation — much appreciated. Your proposed solution is a smart and practical workaround. We'll make sure to address this more cleanly in a future release.

saberhq avatar May 13 '25 19:05 saberhq