Dorado aligner gives error code -3 when merging temporary files (step needed before running dorado polish)
New issue checks
- [x] I have read the Dorado Documentation.
- [x] I did not find an existing issue.
Dorado version
1.1.1
Dorado subcommand
Aligner
The issue
When aligning a BAM file of ONT reads against a draft hifiasm assembly (FASTA), dorado aligner fails with error code -3 while merging its temporary files:
[2025-10-21 15:10:19.906] [debug] Processed 6850000 reads
[2025-10-21 15:10:48.028] [debug] Processed 6900000 reads
[2025-10-21 15:11:35.695] [debug] Total reads processed: 6934056
[2025-10-21 15:12:02.666] [info] > finished alignment
[2025-10-21 15:12:02.666] [info] > merging temporary BAM files
[2025-10-21 15:55:29.082] [error] Error reading record from file hifi_ul/ONT.merged.bam.456.tmp, error code -3
[2025-10-21 15:55:29.434] [error] Merging of temporary files failed.
[2025-10-21 15:55:29.434] [info] > Finished in (ms): 13437037
[2025-10-21 15:55:29.434] [info] > Reads written: 5035372
[2025-10-21 15:55:29.434] [info] > total/primary/unmapped 105499354/4995496/39876
After this, the merged.bam that is left behind cannot be read by samtools, so it cannot be indexed.
This is the command I used (the --mm2-opts does not matter, the same error occurs without it).
dorado aligner final_assembly.clean.fasta ONT.merged.bam --output-dir out --mm2-opts "-I 150M -x map-ont" -v
System specifications
Docker: nvidia/cuda:12.4.0-devel-ubuntu22.04
Memory: 2Tb; CPUs: 128
GPU: Nvidia H100 or A80
Cluster: PBS
No solution, just curious: why are you using dorado for alignment (bases are already called). Using minimap2 directly gives you full control and reduces the risk of a failure to just one component, minimap2. My 2p.
I tried your suggestion, got this error message: [2025-10-22 12:12:04.679] [error] Input BAM file was not aligned using Dorado.
What did you try? What was the exact command? Which software gives you the error with the BAM file? What are you actually trying to achieve? 🤔
My goal is to run dorado polish. I tried aligning the reads against the draft genome with minimap2 instead of dorado aligner to generate the BAM file that dorado polish needs, but dorado polish does not accept it. It says the input BAM file must be generated by dorado aligner.
dorado polish assembly.aligned.minimap2.bam assembly.clean.fasta > assembly.polished.fasta
How did you run the minimap2 aligner?
samtools fastq -T '*' reads.merged.bam | \
minimap2 -ay -x map-ont -t 16 --MD assembly.clean.fasta - | \
samtools sort -@ 8 -o assembly.aligned.minimap2.bam
Hi @mmudado,
Can you try this with the latest version of Dorado, v1.2.0? I noticed you ran it with Dorado v1.1.1.
There was a bug in versions <1.2.0 which happened when realigning a BAM file which was previously aligned to another target.
If a read was not mapped, the output record would match the input one (copied verbatim), including the target ID, instead of being reset to unmapped.
In your case, I suspect that your ONT.merged.bam was already aligned, likely to a reference genome?
Let me know if this works.
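One way to look for the symptom described above is to count records that are flagged unmapped but still name a target. The sketch below is only an illustration: `count_unmapped_with_target` is a hypothetical helper, not part of Dorado or samtools, and it reads plain SAM text on stdin. A non-zero count is not proof of the bug, since the SAM format allows an unmapped read to carry a target name in some cases (for example an unmapped mate in paired-end data), but for single-end ONT reads it may be worth a closer look.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Dorado or samtools).
# Reads SAM text on stdin and counts records that have FLAG bit 0x4
# (unmapped) set while the RNAME column (field 3) is not "*".
count_unmapped_with_target() {
  awk -F '\t' '
    /^@/ { next }                                   # skip SAM header lines
    int($2 / 4) % 2 == 1 && $3 != "*" { n++ }       # bit 0x4 set, target present
    END { print n + 0 }                             # print 0 when nothing matched
  '
}

# Example use, assuming samtools is on PATH and aligned.bam is a placeholder:
#   samtools view aligned.bam | count_unmapped_with_target
```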
Hi @svc-jstone ,
The reads.merged.bam file was created by running Dorado basecaller with the alignment option on each individual flow cell, then merging the results with samtools.
I've tried two approaches with dorado 1.2.0: i) running dorado aligner with the "--output-dir" flag; and ii) redirecting the output to a .tmp.bam file with ">" and then running samtools and dorado polish.
Approach i) finished but produced two "unknown" directories with two distinct .bam files and no consolidated .bam file at the end, so I did not know which one to use:
[2025-10-29 16:48:47.146] [info] Running: "aligner" "hifi_ul.final_assembly.clean.fasta" "reads.merged.bam" "--output-dir" "hifi_ul" "--mm2-opts" "-I 150M -x map-ont" "-v"
[2025-10-29 16:48:47.149] [info] num input files: 1
[2025-10-29 16:48:47.149] [debug] > aligner threads 116, writer threads 12
[2025-10-29 16:48:47.149] [info] > loading index hifi_ul.final_assembly.clean.fasta
[M::mm_idx_gen::1761756530.7210.00] collected minimizers
[M::mm_idx_gen::1761756530.9820.00] sorted minimizers
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 2
[M::mm_idx_stat::1761756531.2610.00] distinct minimizers: 13573108 (79.42% are singletons); average occurrences: 2.160; average spacing: 5.328; total length: 156238722
[2025-10-29 16:48:51.261] [debug] Loaded index with 2 target seqs
[M::mm_mapopt_update::1761756531.6920.00] mid_occ = 354
[M::mm_idx_gen::1761756536.5090.00] collected minimizers
[M::mm_idx_gen::1761756536.8160.00] sorted minimizers
[2025-10-29 16:48:56.816] [debug] Loaded next index chunk with 3 target seqs
[M::mm_idx_gen::1761756539.8020.00] collected minimizers
[M::mm_idx_gen::1761756539.9680.00] sorted minimizers
[2025-10-29 16:48:59.967] [debug] Loaded next index chunk with 3 target seqs
[M::mm_idx_gen::1761756543.2120.00] collected minimizers
[M::mm_idx_gen::1761756543.3640.00] sorted minimizers
[2025-10-29 16:49:03.364] [debug] Loaded next index chunk with 4 target seqs
[M::mm_idx_gen::1761756545.0570.00] collected minimizers
[M::mm_idx_gen::1761756545.1580.00] sorted minimizers
[2025-10-29 16:49:05.157] [debug] Loaded next index chunk with 18 target seqs
[2025-10-29 16:49:05.192] [info] > starting alignment
[2025-10-29 16:49:05.192] [info] processing 'reads.merged.bam'
[2025-10-29 16:49:05.192] [debug] > input:'reads.merged.bam' fmt:'BAM version 1 compressed sequence data' aligned:'true'
[2025-10-29 16:49:05.206] [debug] Creating output folder: 'hifi_ul/unknown/20250617_2126_0_PBA84238_1946ed5c/bam_pass'. Length:96
[2025-10-29 16:49:05.244] [debug] Creating output folder: 'hifi_ul/unknown/20250623_2025_0_PBA52732_26660309/bam_pass'. Length:96
[2025-10-29 16:50:59.008] [debug] Processed 50000 reads
[2025-10-29 16:59:05.539] [debug] Processed 100000 reads
[2025-10-29 17:06:06.766] [debug] Processed 150000 reads
[2025-10-29 17:08:37.136] [debug] Processed 200000 reads
[2025-10-29 17:15:36.463] [debug] Processed 250000 reads
[2025-10-29 17:23:37.963] [debug] Processed 300000 reads
[2025-10-29 17:26:02.403] [debug] Processed 350000 reads
[2025-10-29 17:31:52.655] [debug] Processed 400000 reads
[2025-10-29 17:38:20.848] [debug] Processed 450000 reads
[2025-10-29 17:45:52.715] [debug] Processed 500000 reads
[2025-10-29 17:47:53.209] [debug] Processed 550000 reads
[2025-10-29 17:49:30.173] [debug] Processed 600000 reads
[2025-10-29 17:57:27.155] [debug] Processed 650000 reads
[2025-10-29 18:03:25.838] [debug] Processed 700000 reads
[2025-10-29 18:06:59.446] [debug] Processed 750000 reads
[2025-10-29 18:16:32.262] [debug] Processed 800000 reads
[2025-10-29 18:24:05.144] [debug] Processed 850000 reads
[2025-10-29 18:30:29.701] [debug] Processed 900000 reads
[2025-10-29 18:39:41.223] [debug] Processed 950000 reads
[2025-10-29 18:42:30.175] [debug] Processed 1000000 reads
[2025-10-29 18:51:07.174] [debug] Processed 1050000 reads
[2025-10-29 18:59:34.988] [debug] Processed 1100000 reads
[2025-10-29 19:04:44.363] [debug] Processed 1150000 reads
[2025-10-29 19:14:21.439] [debug] Processed 1200000 reads
[2025-10-29 19:19:13.947] [debug] Processed 1250000 reads
[2025-10-29 19:28:46.688] [debug] Processed 1300000 reads
[2025-10-29 19:34:23.584] [debug] Processed 1350000 reads
[2025-10-29 19:40:23.904] [debug] Processed 1400000 reads
[2025-10-29 19:48:58.766] [debug] Processed 1450000 reads
[2025-10-29 19:51:03.426] [debug] Processed 1500000 reads
[2025-10-29 19:52:02.747] [debug] Processed 1550000 reads
[2025-10-29 19:52:04.631] [debug] Total reads processed: 1551254
[ ] 0% [00:00s<00:00s] Finalising outputs
[2025-10-29 21:47:23.729] [info] > finished alignment
[2025-10-29 21:47:23.729] [info] > Finished in (ms): 17898564
[2025-10-29 21:47:23.729] [info] > Reads written: 1551134
[2025-10-29 21:47:23.729] [info] > total/primary/unmapped 57315720/1551134/44417
[2025-10-29 21:47:23.729] [debug] > secondary/supplementary 54732283/1032303
Approach ii) ended the same way as with dorado 1.1.1: the resulting .bam file appears to be corrupted and cannot be read by samtools:
[2025-10-29 23:54:43.976] [info] Running: "aligner" "hifi_ul.final_assembly.clean.fasta" "reads.merged.bam"
[2025-10-29 23:54:43.979] [info] num input files: 1
[2025-10-29 23:54:43.979] [info] > loading index hifi_ul.final_assembly.clean.fasta
[2025-10-29 23:54:53.519] [info] > starting alignment
[2025-10-29 23:54:53.519] [info] processing 'reads.merged.bam'
[ ] 0% [00:00s<00:00s] Finalising output
[2025-10-30 00:27:46.309] [info] > finished alignment
[2025-10-30 00:27:46.309] [info] > Finished in (ms): 1972813
[2025-10-30 00:27:46.309] [info] > Reads written: 1551254
[2025-10-30 00:27:46.309] [info] > total/primary/unmapped 3707282/1551254/63529
hifi_ul.aligned_reads.tmp.bam could not be opened for reading.
[E::hts_hopen] Failed to open file hifi_ul.aligned_reads.tmp.bam
[E::hts_open_format] Failed to open file "hifi_ul.aligned_reads.tmp.bam" : Exec format error
samtools sort: can't open "hifi_ul.aligned_reads.tmp.bam": Exec format error
[E::hts_hopen] Failed to open file hifi_ul.aligned_reads.tmp.bam
[E::hts_open_format] Failed to open file "hifi_ul.aligned_reads.tmp.bam" : Exec format error
samtools index: failed to open "hifi_ul.aligned_reads.tmp.bam": Exec format error
[2025-10-30 00:28:15.380] [info] Running: "polish" "hifi_ul/hifi_ul.aligned_reads.tmp.bam" "hifi_ul.final_assembly.clean.fasta"
[2025-10-30 00:28:15.400] [error] Input draft file hifi_ul/hifi_ul.aligned_reads.tmp.bam does not exist!
Hi @mmudado , thanks for following up!
Can you copy your exact commands for case ii)? (The Dorado Aligner one is obvious from the log, but I'd like to see everything if possible.)
If Aligner is generating corrupt BAM files, we need to figure it out.
Hi @svc-jstone ,
I'm running from a docker container, which has "WORKDIR /data" set.
The commands I used were:
dorado aligner hifi_ul.final_assembly.clean.fasta reads.merged.bam > data/hifi_ul.aligned_reads.tmp.bam
samtools quickcheck -v hifi_ul.aligned_reads.tmp.bam
samtools sort -@ 64 -m 20G hifi_ul.aligned_reads.tmp.bam > data/hifi_ul.aligned_reads.bam
samtools index -@ 16 hifi_ul.aligned_reads.bam
# this time I tried to run dorado polish with the .tmp.bam directly
dorado polish hifi_ul.aligned_reads.tmp.bam hifi_ul.final_assembly.clean.fasta > data/hifi_ul.polished_assembly.fasta
This time I ran dorado polish directly on aligned_reads.tmp.bam, the output of dorado aligner, without sorting or indexing it. It did not complete, but the output differs from before, in case it helps:
[2025-10-31 11:16:29.715] [info] Running: "aligner" "hifi_ul.final_assembly.clean.fasta" "reads.merged.bam"
[2025-10-31 11:16:29.717] [info] num input files: 1
[2025-10-31 11:16:29.717] [info] > loading index hifi_ul.final_assembly.clean.fasta
[2025-10-31 11:16:39.856] [info] > starting alignment
[2025-10-31 11:16:39.856] [info] processing 'reads.merged.bam'
[ ] 0% [00:00s<00:00s] Finalising output
[2025-10-31 11:49:50.581] [info] > finished alignment
[2025-10-31 11:49:50.581] [info] > Finished in (ms): 1990746
[2025-10-31 11:49:50.581] [info] > Reads written: 1551254
[2025-10-31 11:49:50.581] [info] > total/primary/unmapped 3707282/1551254/63529
hifi_ul.aligned_reads.tmp.bam could not be opened for reading.
[E::hts_hopen] Failed to open file hifi_ul.aligned_reads.tmp.bam
[E::hts_open_format] Failed to open file "hifi_ul.aligned_reads.tmp.bam" : Exec format error
samtools sort: can't open "hifi_ul.aligned_reads.tmp.bam": Exec format error
[E::hts_hopen] Failed to open file hifi_ul.aligned_reads.tmp.bam
[E::hts_open_format] Failed to open file "hifi_ul.aligned_reads.tmp.bam" : Exec format error
samtools index: "hifi_ul.aligned_reads.bam" is in a format that cannot be usefully indexed
[2025-10-31 13:27:42.243] [info] Running: "polish" "hifi_ul.aligned_reads.tmp.bam" "hifi_ul.final_assembly.clean.fasta"
[2025-10-31 13:27:42.643] [info] Using CUDA devices:
[2025-10-31 13:27:42.643] [info] cuda:0 - NVIDIA H100 NVL
[E::hts_hopen] Failed to open file hifi_ul.aligned_reads.tmp.bam
[E::hts_open_format] Failed to open file "hifi_ul.aligned_reads.tmp.bam" : Exec format error
[2025-10-31 13:27:42.644] [error] Uncaught signal from:
frame #0: dorado() [0x10ce358]
frame #1: dorado() [0x6063bc]
frame #2:
That's odd.
How big is your hifi_ul.aligned_reads.tmp.bam?
Is it possible that you ran out of disk space so your data ended up truncated?
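A quick way to test the truncation hypothesis without any extra tools is to look at the first and last bytes of the file. BAM files are BGZF-compressed: they start with the bytes 1f 8b 08 04 and, when written completely, end with a fixed 28-byte empty block that acts as an EOF marker. The sketch below checks both; `bgzf_check` is a hypothetical helper name, and this is a rough stand-in for what `samtools quickcheck` already does, not a replacement for it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints one of
#   ok | missing-or-empty | not-bgzf | no-eof-marker
# and returns non-zero unless the file looks like a complete BGZF file.
bgzf_check() {
  local f=$1 magic eof
  # The file must exist and be non-empty.
  [ -s "$f" ] || { echo "missing-or-empty"; return 1; }
  # First 4 bytes and last 28 bytes as lowercase hex without separators.
  magic=$(head -c 4 "$f" | od -An -v -tx1 | tr -d ' \n')
  eof=$(tail -c 28 "$f" | od -An -v -tx1 | tr -d ' \n')
  # BGZF blocks start with the gzip magic plus the "extra field" flag.
  [ "$magic" = "1f8b0804" ] || { echo "not-bgzf"; return 1; }
  # The 28-byte BGZF EOF marker defined in the SAM/BAM specification.
  [ "$eof" = "1f8b08040000000000ff0600424302001b0003000000000000000000" ] \
    || { echo "no-eof-marker"; return 1; }
  echo "ok"
}

# Example use (the file name is the one from this thread):
#   bgzf_check hifi_ul.aligned_reads.tmp.bam
```

A result of "no-eof-marker" would be consistent with a truncated or still-open file, while "not-bgzf" would suggest the file is not a BAM at all.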
> This time I tried to run dorado polish using the aligned_reads.tmp.bam
Dorado Polish won't work without a sorted and indexed BAM input as it's producing a pileup to construct features.
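Putting the steps in that order might look like the sketch below. It only reuses commands that already appear in this thread (dorado aligner, samtools quickcheck, samtools sort, samtools index, dorado polish). The function name, the output file names, and the thread count are hypothetical placeholders. Each path is held in a single variable so that the file one step writes is the file the next step reads.

```shell
#!/usr/bin/env bash
# Sketch only: align, verify, sort, index, then polish.
# Arguments: draft FASTA, unaligned or previously aligned reads BAM, output dir.
align_sort_index_polish() {
  local draft=$1 reads=$2 outdir=$3
  local unsorted="$outdir/aligned.unsorted.bam"   # placeholder name
  local sorted="$outdir/aligned.sorted.bam"       # placeholder name

  mkdir -p "$outdir" || return 1
  # Align reads to the draft; dorado aligner writes BAM to stdout here.
  dorado aligner "$draft" "$reads" > "$unsorted" || return 1
  # Stop early if the BAM is truncated or unreadable.
  samtools quickcheck -v "$unsorted" || return 1
  # Coordinate-sort and index, since polish builds a pileup from the BAM.
  samtools sort -@ 8 -o "$sorted" "$unsorted" || return 1
  samtools index "$sorted" || return 1
  # Polish the draft using the sorted, indexed alignments.
  dorado polish "$sorted" "$draft" > "$outdir/polished.fasta"
}

# Example use with the file names from this thread:
#   align_sort_index_polish hifi_ul.final_assembly.clean.fasta reads.merged.bam hifi_ul
```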
> The i) approach finished but resulted into two "unknown" directories with two distinct .bam files, but no consolidated .bam file at the end. I didn't know which ones to use:
The "unknown" part is strange.
The usual output path looks something like:
<output-dir>/{experiment_id}/{sample_id}/{timestamp}_{position_id}_{flowcell_id}_{run_}/
We suggest starting from scratch on a single flowcell, without aligning in the first instance.
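In the meantime, if the per-run BAM files under the output directory turn out to be valid, one option is to list them and merge them with samtools, as was done for the original reads. The sketch below is an assumption-laden illustration, not a documented Dorado workflow: `merge_run_bams` is a hypothetical helper, and it assumes every BAM found is already coordinate-sorted against the same reference, which `samtools merge` requires. The merged file should be written outside the directory being searched so that a rerun does not pick it up as an input.

```shell
#!/usr/bin/env bash
# Hypothetical helper: find every *.bam under an output directory,
# print the list, then merge and index them with samtools.
# Assumes all inputs are coordinate-sorted against the same reference.
merge_run_bams() {
  local outdir=$1 merged=$2 f
  local bams=()
  # Collect BAM paths in a stable order.
  while IFS= read -r f; do
    bams+=("$f")
  done < <(find "$outdir" -type f -name '*.bam' | sort)

  if [ "${#bams[@]}" -eq 0 ]; then
    echo "no BAM files under $outdir" >&2
    return 1
  fi

  # Show which files will be merged.
  printf '%s\n' "${bams[@]}"
  samtools merge "$merged" "${bams[@]}" && samtools index "$merged"
}

# Example use; the merged file name is a placeholder:
#   merge_run_bams hifi_ul hifi_ul.aligned_reads.merged.bam
```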
> That's odd. How big is your hifi_ul.aligned_reads.tmp.bam? Is it possible that you ran out of disk space so your data ended up truncated?
63G Oct 31 08:49 hifi_ul.aligned_reads.tmp.bam
It cannot have run out of disk space: 3.1T is currently available.
> This time I tried to run dorado polish using the aligned_reads.tmp.bam
> Dorado Polish won't work without a sorted and indexed BAM input as it's producing a pileup to construct features.
Ok, good to know.
> The i) approach finished but resulted into two "unknown" directories with two distinct .bam files, but no consolidated .bam file at the end. I didn't know which ones to use:
> The "unknown" part is strange.
> The usual output path looks something like:
> <output-dir>/{experiment_id}/{sample_id}/{timestamp}_{position_id}_{flowcell_id}_{run_}/
> We suggest starting from scratch on a single flowcell, without aligning in the first instance.
This is how the basecalling was performed:
${DORADO}/bin/dorado basecaller \
  --modified-bases 5mC_5hmC \
  --output-dir ${DATADIR}/${Readset}.mods.doradoV1.0.0.SUPv5.2.0_aligned/ \
  --reference ${FA_PATH} \
  tools/dorado-1.0.0-linux-x64/models/[email protected] \
  ${POD5_PATH}