Too few arguments for '--mm2-opts'
Issue Report
Please describe the issue:
When running dorado basecaller, I encountered the following error.
Steps to reproduce the issue:
My dorado basecaller code:
${DoradoDir}/dorado basecaller -v sup,inosine_m6A,pseU,m5C --min-qscore 10 --verbose --emit-moves -b 64 --chunksize 9216 --mm2-opts "-k 15 -w 10 --secondary=no" --estimate-poly-a --reference ${indexDir}/gencode.v43.normal.transcripts.fa -x cuda:0 ${pod5path}/HEK293T_1.pod5 --resume-from ${ModDir}/PAQ17395_1_sup.pass.m6A_pseU_m5C_inosine.mod.pass.bam > ${ModDir}/PAQ17395_1_sup.pass.m6A_pseU_m5C_inosine.mod.pass.complete.bam
Run environment:
- Dorado version: V0.8.0
- Operating system: Linux
- Hardware (CPUs, Memory, GPUs): GTX 3090Ti
- Source data type (e.g., pod5 or fast5 - please note we always recommend converting to pod5 for optimal basecalling performance): pod5
- Source data location (on device or networked drive - NFS, etc.): on device
- Details about data (flow cell, kit, read lengths, number of reads, total dataset size in MB/GB/TB): SQK-RNA004
Logs
[2024-09-29 10:34:28.139] [info] - BAM format does not support U, so RNA output files will include T instead of U for all file types.
[2024-09-29 10:34:32.453] [debug] TxEncoderStack: use_koi_tiled false.
[2024-09-29 10:34:33.304] [debug] cuda:0 memory available: 16.97GB
[2024-09-29 10:34:33.304] [debug] cuda:0 memory limit 12.65GB
[2024-09-29 10:34:33.304] [debug] cuda:0 maximum safe estimated batch size at chunk size 9216 is 192
[2024-09-29 10:34:33.304] [debug] cuda:0 maximum safe estimated batch size at chunk size 4608 is 416
[2024-09-29 10:34:33.304] [info] cuda:0 using chunk size 9216, batch size 64
[2024-09-29 10:34:33.304] [debug] cuda:0 Model memory 3.43GB
[2024-09-29 10:34:33.304] [debug] cuda:0 Decode memory 0.42GB
[2024-09-29 10:34:33.510] [info] cuda:0 using chunk size 4608, batch size 64
[2024-09-29 10:34:33.510] [debug] cuda:0 Model memory 1.71GB
[2024-09-29 10:34:33.510] [debug] cuda:0 Decode memory 0.21GB
[2024-09-29 10:34:59.199] [debug] Loaded index with 252913 target seqs
[2024-09-29 10:34:59.630] [debug] BasecallerNode chunk size 9216
[2024-09-29 10:34:59.630] [debug] BasecallerNode chunk size 4608
[2024-09-29 10:35:00.269] [info] > Inspecting resume file...
[2024-09-29 10:35:00.603] [error] finalise() not called on a HtsFile.
[2024-09-29 10:35:00.651] [error] Too few arguments for '--mm2-opts'.
[2024-09-29 10:35:00.651] [trace] Deleting temporary model path: /public2/hjliang/ONT_data/script/.temp_dorado_model-4dca3eb0a456c074
[2024-09-29 10:35:00.673] [trace] Deleting temporary model path: /public2/hjliang/ONT_data/script/.temp_dorado_model-5e2608ef88664109
Best wishes, Kirito
Addendum: If I omit the --resume-from parameter, dorado basecaller runs normally and does not output the '--mm2-opts' error.
Can you share the BAM header of the file passed to --resume-from (${ModDir}/PAQ17395_1_sup.pass.m6A_pseU_m5C_inosine.mod.pass.bam)?
Dorado stores the original command in that header, and the error may originate there.
Best regards, Rich
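For anyone else checking this, a minimal sketch of pulling the stored command out of a header. It assumes the header is piped in from samtools view -H and that fields are tab-separated as in the SAM format; dorado_cl is a hypothetical helper name, not part of dorado or samtools.

```shell
# Print the CL (command line) field of dorado's @PG record.
# Reads a SAM header on stdin; dorado_cl is a hypothetical helper name.
dorado_cl() {
    awk -F'\t' '$1 == "@PG" {
        is_dorado = 0; cl = ""
        for (i = 2; i <= NF; i++) {
            if ($i == "PN:dorado") is_dorado = 1      # record written by dorado
            if ($i ~ /^CL:/) cl = substr($i, 4)       # strip the "CL:" tag
        }
        if (is_dorado && cl != "") print cl
    }'
}

# Usage (requires samtools):
#   samtools view -H resume.bam | dorado_cl
```

If the printed command shows --mm2-opts followed by unquoted minimap2 options, the resume file is affected by this issue.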
Hi @HalfPhoton, here is the header.
@HD VN:1.6 SO:unknown
@PG ID:basecaller PN:dorado VN:0.8.0+acec121 CL:dorado basecaller -v sup,inosine_m6A,pseU,m5C --min-qscore 10 --verbose --emit-moves -b 64 --chunksize 9216 --mm2-opts -k 15 -w 10 --secondary=no --estimate-poly-a --reference /home/hjliang/genomes/hg38/gencode.v43.normal.transcripts.fa -x cuda:0 /public3/guowb/RNA004/WT/pod5/HEK293T_1.pod5 DS:gpu:NVIDIA GeForce RTX 3090
@PG ID:samtools PN:samtools PP:basecaller VN:1.17 CL:samtools view -H PAQ17395_1_sup.pass.m6A_pseU_m5C_inosine.mod.pass.bam
@RG ID:68b19cb40ce8cb6e3e194899954bdb9e5586ceba_rna004_130bps_sup@v5.1.0 PU:PAQ17395 PM:PC48A044 DT:2023-11-17T08:27:39.852+00:00 PL:ONT DS:basecall_model=rna004_130bps_sup@v5.1.0 modbase_models=rna004_130bps_sup@v5.1.0_inosine_m6A@v1,rna004_130bps_sup@v5.1.0_pseU@v1,rna004_130bps_sup@v5.1.0_m5C@v1 runid=68b19cb40ce8cb6e3e194899954bdb9e5586ceba LB:20231117-NPL2300672-P6-PAQ17395-hac SM:20231117-NPL2300672-P6-PAQ17395-hac
@SQ SN:ENST00000456328.2|ENSG00000290825.1 LN:1657
@SQ SN:ENST00000450305.2|ENSG00000223972.6 LN:632
...
I experienced the same bug. Basecalling stopped after a GPU out-of-memory error, and trying to resume from the incomplete BAM gave the same error. The problem was indeed in the BAM file header, where the argument of --mm2-opts is not quoted (even though it was quoted in the original command).
I solved this by exporting the header (samtools view -H $incomplete.bam > header.sam), quoting the --mm2-opts argument in a text editor, then replacing the header of the incomplete BAM (samtools reheader header.sam $incomplete.bam > $incomplete_corrected.bam). Hope this solves the issue for you as well, @kir1to455.
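The same workaround can be scripted instead of edited by hand. This is only a sketch under assumptions: it expects the --mm2-opts value to run up to " --estimate-poly-a", as in the header posted above, so the sed pattern needs adjusting for other commands. quote_mm2_opts and fix_resume_bam are hypothetical helper names; the samtools calls are the ones described above.

```shell
# Quote the --mm2-opts value in a SAM header read from stdin.
# Assumption: the value ends right before " --estimate-poly-a", as in the
# header from this thread. Headers that are already quoted pass unchanged.
quote_mm2_opts() {
    sed -E 's/--mm2-opts (-[^"]*) --estimate-poly-a/--mm2-opts "\1" --estimate-poly-a/'
}

# Hypothetical wrapper: fix_resume_bam incomplete.bam corrected.bam
fix_resume_bam() {
    local in_bam=$1 out_bam=$2 header
    header=$(mktemp)
    # Export the header and quote the --mm2-opts argument
    samtools view -H "$in_bam" | quote_mm2_opts > "$header"
    # Write a copy of the BAM that carries the corrected header
    samtools reheader "$header" "$in_bam" > "$out_bam"
    rm -f "$header"
}
```

The corrected BAM is then the file to pass to --resume-from. It is worth checking the new header with samtools view -H before resuming.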
Yes, @tramelliwe you've identified the issue. Thank you.
@kir1to455, we can see that in the SAM header (from which dorado re-uses arguments when resuming), --mm2-opts -k 15 -w 10 --secondary=no is indeed missing the required quotation marks.
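To illustrate why the missing quotes matter, here is a small sketch of how a shell splits the two forms into arguments. count_args is a hypothetical helper, and how dorado tokenizes the stored command internally is an assumption here, not something confirmed in this thread.

```shell
# Print how many separate arguments were received.
count_args() { echo "$#"; }

# Quoted, as in the original command: the option plus one value
count_args --mm2-opts "-k 15 -w 10 --secondary=no"   # prints 2

# Unquoted, as stored in the header: the value falls apart into five tokens,
# so --mm2-opts is immediately followed by something that looks like another option
count_args --mm2-opts -k 15 -w 10 --secondary=no     # prints 6
```

With the unquoted form, --mm2-opts is followed by -k rather than a single value, which would be consistent with the "Too few arguments" error.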
This has been fixed