minimap2 icon indicating copy to clipboard operation
minimap2 copied to clipboard

Error about 'write_sam_cigar'

Open slbai01 opened this issue 5 years ago • 11 comments

Hi,

I am using minimap2 to map PacBio long reads to genome.

Here is the command I ran:

/share/programs/minimap2-2.16_x64-linux/minimap2 --split-prefix prefix -t 9 -a ../01.reference_file/reference.fa ../00.raw_data/m54217_190219_070744.subreads.fasta.gz > m54217_190219_070744.subreads.fasta.gz.sam

This is error log file

[M::mm_idx_gen::12.231*1.67] collected minimizers
[M::mm_idx_gen::13.555*2.35] sorted minimizers
[M::main::13.556*2.35] loaded/built the index for 308 target sequence(s)
[M::mm_mapopt_update::14.270*2.28] mid_occ = 666
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 308
[M::mm_idx_stat::14.740*2.24] distinct minimizers: 35241018 (68.48% are singletons); average occurrences: 2.924; average spacing: 5.284
[M::worker_pipeline::122.180*7.93] mapped 59901 sequences
[M::worker_pipeline::222.485*8.43] mapped 60923 sequences
[M::worker_pipeline::316.092*8.61] mapped 61295 sequences
[M::worker_pipeline::412.197*8.71] mapped 60819 sequences
[M::worker_pipeline::506.022*8.77] mapped 60319 sequences
[M::worker_pipeline::599.978*8.81] mapped 60680 sequences
[M::worker_pipeline::694.081*8.84] mapped 61635 sequences
[M::worker_pipeline::791.363*8.84] mapped 60901 sequences
[M::worker_pipeline::868.383*8.85] mapped 50235 sequences
minimap2: format.c:378: write_sam_cigar: Assertion `clip_len[0] < qlen && clip_len[1] < qlen' failed.
/share/home/.lsbatch/1560592702.456838: line 8: 19388 Aborted                 (core dumped) /share/home/programs/minimap2-2.16_x64-linux/minimap2 --split-prefix prefix -t 9 -a ../01.reference_file/reference.fa ../00.raw_data/m54217_190219_070744.subreads.fasta.gz > m54217_190219_070744.subreads.fasta.gz.sam

I found code clip_len[0] < qlen && clip_len[1] < qlen in https://github.com/lh3/minimap2/blob/master/format.c, But my C language ability is weak and I can't locate the error correctly.

Thank you!

Shenglong

slbai01 avatar Jun 15 '19 13:06 slbai01

In fact, when I alignment the same files with blasr, I also encounter errors.

error message is:

[INFO] 2019-06-14T22:52:31 [blasr] started.
terminate called after throwing an instance of 'std::length_error'
  what():  vector::_M_default_append
/share/home/.lsbatch/1560523950.454114: line 8: 18138 Aborted                 (core dumped) blasr ../00.raw_data/m54217_190219_070744.subreads.bam ../01.reference/reference.fa --bam --out m54217_190219_070744.bam --nproc 9 --sa ../01.reference/reference.fa.sa

I suspect it's a matter of input files. Although I did not edit any of the original sequencing files.

slbai01 avatar Jun 15 '19 13:06 slbai01

This is almost certain that your input file is corrupted. File transfer or disk issues can all cause file corruption. Minimap2 is admittedly weak on error checking.

lh3 avatar Jun 15 '19 13:06 lh3

Thank you for your reply.

Oddly enough, I just changed the parameters and got the results in the same file without any error.

/share/home/programs/minimap2-2.16_x64-linux/minimap2 -t 9 -ax map-pb ../01.reference_file/reference.fa ../00.raw_data/m54217_190219_070744.subreads.fasta.gz | samtools sort -@ 10 -o m54217_190219_070744.subreads.fasta.gz.sort.bam

I contacted the computer administrator and he said that there was no abnormal warning on the disk.

I also checked the MD5 value of the file. No problem.

slbai01 avatar Jun 15 '19 14:06 slbai01

I got the same error and the input files seem fine. Unfortunately, this is on a fairly large reference DB (48GB), so I cannot easily post an example. I can see this on both r2.16 and r2.17.

luispedro avatar Sep 09 '19 06:09 luispedro

On our input, PAF output works, SAM triggers the above mentioned error.

luispedro avatar Sep 11 '19 08:09 luispedro

@lh3 Same problem here, with a SAM output on simulated Illumina data. However if I don't use the --split-prefix flag, it works.

Example:

$ minimap2 -d index.mmi genomes/Agrobacterium_tumefaciens.fa
  • This doesn't work, and generate the Assertion failed: (clip_len[0] < qlen && clip_len[1] < qlen), function write_sam_cigar, file format.c, line 378.
$ minimap2 -a index.mmi --split-prefix prefix metagenome.1.trimmed.fastq metagenome.2.trimmed.fastq > test.sam
[M::main::0.101*0.99] loaded/built the index for 60 target sequence(s)
[M::mm_mapopt_update::0.117*0.99] mid_occ = 6
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 60
[M::mm_idx_stat::0.129*0.99] distinct minimizers: 1081087 (96.88% are singletons); average occurrences: 1.037; average spacing: 5.355
[M::worker_pipeline::0.134*1.01] mapped 743 sequences
[M::worker_pipeline::0.138*1.03] mapped 743 sequences
Assertion failed: (clip_len[0] < qlen && clip_len[1] < qlen), function write_sam_cigar, file format.c, line 378.
[1]    43653 abort      minimap2 -a index.mmi --split-prefix prefix metagenome.1.trimmed.fastq  >
  • This works:
$ minimap2 -a index.mmi metagenome.1.trimmed.fastq metagenome.2.trimmed.fastq > test.sam
[M::main::0.120*0.98] loaded/built the index for 60 target sequence(s)
[M::mm_mapopt_update::0.140*0.98] mid_occ = 6
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 60
[M::mm_idx_stat::0.153*0.98] distinct minimizers: 1081087 (96.88% are singletons); average occurrences: 1.037; average spacing: 5.355
[M::worker_pipeline::0.159*1.00] mapped 743 sequences
[M::worker_pipeline::0.163*1.02] mapped 743 sequences
[M::main] Version: 2.17-r941
[M::main] CMD: minimap2 -a index.mmi metagenome.1.trimmed.fastq metagenome.2.trimmed.fastq
[M::main] Real time: 0.172 sec; CPU: 0.176 sec; Peak RSS: 0.042 GB
  • With --split-prefix and a .paf output, it works
$ minimap2 index.mmi --split-prefix prefix metagenome.1.trimmed.fastq metagenome.2.trimmed.fastq > test.paf
[M::main::0.100*1.01] loaded/built the index for 60 target sequence(s)
[M::mm_mapopt_update::0.116*1.00] mid_occ = 6
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 60
[M::mm_idx_stat::0.127*1.00] distinct minimizers: 1081087 (96.88% are singletons); average occurrences: 1.037; average spacing: 5.355
[M::worker_pipeline::0.130*1.01] mapped 743 sequences
[M::worker_pipeline::0.132*1.02] mapped 743 sequences
[M::worker_pipeline::0.145*1.01] mapped 1486 sequences
[M::main] Version: 2.17-r941
[M::main] CMD: minimap2 --split-prefix prefix index.mmi metagenome.1.trimmed.fastq metagenome.2.trimmed.fastq
[M::main] Real time: 0.145 sec; CPU: 0.147 sec; Peak RSS: 0.042 GB

minimap2 version: 2.17-r941 (installed via conda)

Test data:
test_data.zip

maxibor avatar Jun 17 '20 15:06 maxibor

Also tagging @hasindu2008

maxibor avatar Jun 19 '20 11:06 maxibor

@maxibor In your case, you are providing two FASTQ files as inputs. Are they paired-end reads?

As you do not provide -x sr when building the index, the two FASTQ files are mapped one after the other (which used to be an undocumented feature in minimap2) here. But, the --split-prefix option is not implemented for that undocumented feature and that is why the error. If you use -x sr, (and if it is the intended use-case), it does not fail. See below:

$ minimap2 -x sr genomes/Agrobacterium_tumefaciens.fa -d index.mmi
[M::mm_idx_gen::0.253*1.05] collected minimizers
[M::mm_idx_gen::0.301*1.35] sorted minimizers
[M::main::0.356*1.27] loaded/built the index for 60 target sequence(s)
[M::mm_idx_stat] kmer size: 21; skip: 11; is_hpc: 0; #seq: 60
[M::mm_idx_stat::0.367*1.28] distinct minimizers: 995979 (99.72% are singletons); average occurrences: 1.005; average spacing: 6.000
[M::main] Version: 2.12-r827
[M::main] CMD: minimap2 -x sr -d index.mmi genomes/Agrobacterium_tumefaciens.fa
[M::main] Real time: 0.376 sec; CPU: 0.469 sec

$ minimap2 -x sr -a index.mmi --split-prefix prefix metagenome.1.trimmed.fastq metagenome.2.trimmed.fastq > test.sam
[M::main::0.107*1.03] loaded/built the index for 60 target sequence(s)
[M::mm_mapopt_update::0.107*1.02] mid_occ = 1000
[M::mm_idx_stat] kmer size: 21; skip: 11; is_hpc: 0; #seq: 60
[M::mm_idx_stat::0.118*1.06] distinct minimizers: 995979 (99.72% are singletons); average occurrences: 1.005; average spacing: 6.000
[M::worker_pipeline::0.126*0.99] mapped 1486 sequences
[M::worker_pipeline::0.139*1.01] mapped 1486 sequences
[M::main] Version: 2.12-r827
[M::main] CMD: minimap2 -x sr -a --split-prefix prefix index.mmi metagenome.1.trimmed.fastq metagenome.2.trimmed.fastq
[M::main] Real time: 0.141 sec; CPU: 0.141 sec

@lh3 Probably around here we can put a graceful exit with an error message if opt.split_prefix is set and the number of FASTQ is more than one unless MM_F_FRAG_MODE (or MM_F_SR) is set.

hasindu2008 avatar Jun 19 '20 14:06 hasindu2008

Hi all,

I just encountered this same error message as OP. But from this thread I do not see what I need to change.

My use case is on HiFi data (726Gbp of it, in 36 fastq.gz files) and a 15Gb reference genome. So I do use the following command.

minimap2 -ax map-hifi --split-prefix My_prefix -t 32 My_reference.fasta *.fastq.gz | samtools sort -@ 32 - > Alignment.bam
minimap2 --version
2.20-r1061

Any suggestions to still finish the mapping?

RNieuwenhuis avatar Jun 16 '21 10:06 RNieuwenhuis

@RNieuwenhuis

Try running minimap2 on just one fastq.gz file out of those 15. If that works you can concatenate all fastq.gz and run minimap2 as follows:

zcat *.fastq.gz  > reads_all.fastq
minimap2 -ax map-hifi --split-prefix My_prefix -t 32 My_reference.fasta reads_all.fastq | samtools sort -@ 32 - > Alignment.bam

hasindu2008 avatar Jun 16 '21 12:06 hasindu2008

hi,I met some very strange problems.When I ran minimap2, it worked normally at first, but after I successfully processed several files, it had such a problem.

minimap2 -ax sr --split-prefix temp_name -I 20G -t -d ${db} fastp/${filename}_1_fastp.fastq fastp/${filename}_2_fastp.fastq > minimap2/${filename}_minimap2.sam
[M::mm_idx_gen::132.900*0.88] collected minimizers
[M::mm_idx_gen::173.097*0.91] sorted minimizers
[M::main::173.098*0.91] loaded/built the index for 525 target sequence(s)
[M::mm_mapopt_update::173.098*0.91] mid_occ = 1000
[M::mm_idx_stat] kmer size: 21; skip: 11; is_hpc: 0; #seq: 525
[M::mm_idx_stat::175.717*0.91] distinct minimizers: 381665452 (93.30% are singletons); average occurrences: 1.345; average spacing: 6.297
[M::worker_pipeline::193.315*0.93] mapped 980478 sequences
[M::worker_pipeline::209.299*0.94] mapped 980474 sequences
[M::worker_pipeline::225.957*0.95] mapped 980476 sequences
[M::worker_pipeline::241.763*0.96] mapped 980470 sequences
[M::worker_pipeline::257.198*0.97] mapped 980476 sequences
minimap2: format.c:378: write_sam_cigar: Assertion `clip_len[0] < qlen && clip_len[1] < qlen' failed.

Then this problem appeared again.

$ minimap2 -ax sr --split-prefix temp_name -t -d ${db} fastp/${filename}_1_fastp.fastq fastp/${filename}_2_fastp.fastq > minimap2/${filename}_minimap2.sam
Segmentation fault (core dumped)
minimap2 -ax sr ${db} output.fastq.gz > output_minimap2.sam
Segmentation fault (core dumped)
minimap2 -ax sr -K 80M -B 5 -w 5 --split-prefix temp_name -I 20G -t -d ${db} ${filename}_1_fastp.fastq ${filename}_2_fastp.fastq > ${filename}_minimap2.sam
Segmentation fault (core dumped)

These two problems often appear. Even the same reference genome and the same data can run normally one second, and then it will be an error if you try again. These errors appear very randomly, and now they become more and more frequent. The server I use has enough memory. I hope you can help me. Thank you very much.

Maxwell-codesome avatar Jul 06 '23 12:07 Maxwell-codesome