Dorado polish index error
I aligned reads to a hifiasm reference (one haplotype only) using dorado aligner (v0.9.1) and output to a directory so that the index and summary files were created.
./dorado-0.9.1-linux-x64/bin/dorado aligner --output-dir ONThap1Align1 --emit-summary --threads 32 BCNH.Q15.ont.hap1.fasta BCNH/bam/BCNH.all.bam
These were transferred to a different cluster (that had available GPUs) to run polish.
I have provided the batch command below as well as the contents of the error file. I'm not familiar with the error language - can you tell me how this failed? Thanks.
Geoff
#SBATCH -J dorado1 #Job name
#SBATCH --output=dorado_correct_a100_%j.std
#SBATCH --error=dorado_correct_a100_%j.err
#SBATCH -N 1
#SBATCH --ntasks-per-node=1
#SBATCH --mem=64GB
#SBATCH -b now
#SBATCH -t 48:00:00
#SBATCH -p gpu-a100-mig7
#SBATCH --gres=gpu:a100_1g.10gb:1
date
./dorado-0.9.1-linux-x64/bin/dorado polish -o BCNH.Q15.ont.hap1.polish.fasta ONThap1Align1/BCNH.all.bam BCNH.Q15.ont.hap1.fasta
date
[geoff]$ cat dorado_correct_a100_16786687.err
[2025-01-22 15:58:34.414] [info] Running: "polish" "-o" "BCNH.Q15.ont.hap1.polish.fasta" "ONThap1Align1/BCNH.all.bam" "BCNH.Q15.ont.hap1.fasta"
[2025-01-22 15:58:35.985] [info] Input data does not contain move tables.
[2025-01-22 15:58:35.985] [info] Auto resolving the model.
[2025-01-22 15:58:35.985] [info] Downloading model: '[email protected]_polish_rl'
[2025-01-22 15:58:36.030] [warning] Unknown certs location for current distribution. If you hit download issues, use the envvar `SSL_CERT_FILE` to specify the location manually.
[2025-01-22 15:58:36.311] [info] - downloading [email protected]_polish_rl with httplib
[2025-01-22 15:59:56.458] [error] Failed to download [email protected]_polish_rl: Connection timed out
[2025-01-22 15:59:56.463] [info] - downloading [email protected]_polish_rl with curl
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 17.1M 100 17.1M 0 0 37.9M 0 --:--:-- --:--:-- --:--:-- 37.9M
[2025-01-22 15:59:57.057] [info] Parsing the model config: /project/blue_catfish_genome/geoff/.temp_dorado_model-1baaa4b8f1465bd5/[email protected]_polish_rl/config.toml
[2025-01-22 15:59:57.106] [info] Initializing the devices.
[2025-01-22 16:00:01.072] [info] Loaded model to device 0: cuda:0
[2025-01-22 16:00:01.072] [info] Creating the encoder.
[2025-01-22 16:00:01.077] [info] Creating the decoder.
[2025-01-22 16:00:01.078] [info] Creating 128 BAM handles.
[2025-01-22 16:00:02.776] [info] Threads: 128, inference threads: 1, number of devices: 1
terminate called after throwing an instance of 'std::runtime_error'
what(): Caught exception from merge-samples task: index 98 is out of bounds for dimension 1 with size 98
Exception raised from applySelect at /pytorch/pyold/aten/src/ATen/TensorIndexing.h:245 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x7fcfb5dc99b7 in /project/blue_catfish_genome/geoff/dorado-0.9.1-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #1: <unknown function> + 0x39f962a (0x7fcfaed8a62a in /project/blue_catfish_genome/geoff/dorado-0.9.1-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #2: <unknown function> + 0x45bb7f6 (0x7fcfaf94c7f6 in /project/blue_catfish_genome/geoff/dorado-0.9.1-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #3: <unknown function> + 0x45bdd03 (0x7fcfaf94ed03 in /project/blue_catfish_genome/geoff/dorado-0.9.1-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #4: at::Tensor::index_put_(c10::ArrayRef<at::indexing::TensorIndex>, at::Tensor const&) + 0x13a (0x7fcfaf95045a in /project/blue_catfish_genome/geoff/dorado-0.9.1-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #5: /project/blue_catfish_genome/geoff/dorado-0.9.1-linux-x64/bin/dorado() [0xa1b786]
frame #6: /project/blue_catfish_genome/geoff/dorado-0.9.1-linux-x64/bin/dorado() [0xa1cf33]
frame #7: /project/blue_catfish_genome/geoff/dorado-0.9.1-linux-x64/bin/dorado() [0xa1f34f]
frame #8: /project/blue_catfish_genome/geoff/dorado-0.9.1-linux-x64/bin/dorado() [0xa1fd4c]
frame #9: /project/blue_catfish_genome/geoff/dorado-0.9.1-linux-x64/bin/dorado() [0x9f8af3]
frame #10: /project/blue_catfish_genome/geoff/dorado-0.9.1-linux-x64/bin/dorado() [0x9f957b]
frame #11: /project/blue_catfish_genome/geoff/dorado-0.9.1-linux-x64/bin/dorado() [0x869d1b]
frame #12: <unknown function> + 0xa4a38 (0x7fcfaa344a38 in /usr/lib64/libc.so.6)
frame #13: /project/blue_catfish_genome/geoff/dorado-0.9.1-linux-x64/bin/dorado() [0x9ea1ba]
frame #14: /project/blue_catfish_genome/geoff/dorado-0.9.1-linux-x64/bin/dorado() [0x8770e0]
frame #15: <unknown function> + 0x1196e380 (0x7fcfbccff380 in /project/blue_catfish_genome/geoff/dorado-0.9.1-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #16: <unknown function> + 0x9f802 (0x7fcfaa33f802 in /usr/lib64/libc.so.6)
frame #17: <unknown function> + 0x3f450 (0x7fcfaa2df450 in /usr/lib64/libc.so.6)
/var/spool/slurmd/job16786687/slurm_script: line 19: 646539 (core dumped) /project/blue_catfish_genome/geoff/dorado-0.9.1-linux-x64/bin/dorado polish -o BCNH.Q15.ont.hap1.polish.fasta ONThap1Align1/BCNH.all.bam BCNH.Q15.ont.hap1.fasta
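(As an aside, the certificate warning earlier in the log can usually be addressed as the log itself suggests, by pointing dorado at the system CA bundle before it runs. This is only a sketch; the bundle path below differs between distributions and is an assumption here.)
# Set before invoking dorado if model downloads fail with certificate errors.
# The path below is typical for RHEL-like systems; Debian/Ubuntu commonly use
# /etc/ssl/certs/ca-certificates.crt instead.
export SSL_CERT_FILE=/etc/ssl/certs/ca-bundle.crt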
Hi @gwaldbieser, that's a strange error. We're looking into it. Thanks for reporting this!
Hi @gwaldbieser, is this issue reproducible with any of the newer versions of Dorado? The latest one, v1.0.1, was just released: https://github.com/nanoporetech/dorado/releases
Thank you in advance!
Hi, I got a similar error to this one, which also occurred when using dorado v1.0.2. Is there already a solution for this?
Thank you in advance!
[2025-07-16 12:59:42.544] [info] Running: "polish" "/nesi/nobackup/brins03581/rootstock_2025/dorado_aligner/1103P_hap1_dorado_aligned_reads.bam" "/nesi/nobackup/brins03581/rootstock_2025/dgenies/purged_dup_filtered_dgenies/1103P_simplex_hap1_purged_dgenies.fa" "--ignore-read-groups"
[2025-07-16 12:59:43.147] [info] Using CUDA devices:
[2025-07-16 12:59:43.147] [info] cuda:0 - NVIDIA H100 NVL
[2025-07-16 12:59:43.187] [info] Input data does not contain move tables.
[2025-07-16 12:59:43.187] [info] Auto resolving the model.
[2025-07-16 12:59:43.187] [info] Downloading model: '[email protected]_polish_rl'
[2025-07-16 12:59:43.209] [warning] Unknown certs location for current distribution. If you hit download issues, use the envvar `SSL_CERT_FILE` to specify the location manually.
[2025-07-16 12:59:43.211] [info] - downloading [email protected]_polish_rl with httplib
[2025-07-16 12:59:43.785] [info] Parsing the model config: /nesi/nobackup/brins03581/rootstock_2025/script/.temp_dorado_model-cbf3d68e2f6e0000/[email protected]_polish_rl/config.toml
[2025-07-16 12:59:43.786] [info] Initializing the devices.
[2025-07-16 12:59:44.094] [info] Loaded model to device 0: cuda:0
[2025-07-16 12:59:44.094] [info] Loaded model to device 0: cuda:0
[2025-07-16 12:59:44.094] [info] Creating 336 encoders.
[2025-07-16 12:59:47.022] [info] Creating the decoder.
[2025-07-16 12:59:47.022] [info] Threads: 336, inference threads: 2, number of devices: 1
[2025-07-16 12:59:59.309] [error] (merge-samples) Caught exception in merge_and_split_bam_regions: 'index 67 is out of bounds for dimension 1 with size 67
Exception raised from applySelect at /builds/machine-learning/torch-builds/pytorch/aten/src/ATen/TensorIndexing.h:257 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0xb0 (0x147e7af0d390 in /opt/nesi/zen3/Dorado/1.0.2/lib/libdorado_torch_lib.so)
frame #1: <unknown function> + 0x21fc089 (0x147e71e40089 in /opt/nesi/zen3/Dorado/1.0.2/lib/libdorado_torch_lib.so)
frame #2: <unknown function> + 0x21fda9a (0x147e71e41a9a in /opt/nesi/zen3/Dorado/1.0.2/lib/libdorado_torch_lib.so)
frame #3: <unknown function> + 0x21ff8dc (0x147e71e438dc in /opt/nesi/zen3/Dorado/1.0.2/lib/libdorado_torch_lib.so)
frame #4: at::Tensor::index_put_(c10::ArrayRef<at::indexing::TensorIndex>, at::Tensor const&) + 0xb0 (0x147e71e3cca0 in /opt/nesi/zen3/Dorado/1.0.2/lib/libdorado_torch_lib.so)
frame #5: dorado() [0x10f36cc]
frame #6: dorado() [0x10f57f6]
frame #7: dorado() [0x10f7009]
frame #8: dorado() [0x10f7377]
frame #9: dorado() [0xf01ee1]
frame #10: dorado() [0xf03142]
frame #11: dorado() [0xd9db1b]
frame #12: <unknown function> + 0x8ee18 (0x147e6f68ee18 in /lib64/libc.so.6)
frame #13: dorado() [0xef8531]
frame #14: dorado() [0xda4a9b]
frame #15: <unknown function> + 0xdcb93 (0x147e6fac1b93 in /opt/nesi/CS400_centos7_bdw/GCCcore/12.3.0/lib64/libstdc++.so.6)
frame #16: <unknown function> + 0x89c02 (0x147e6f689c02 in /lib64/libc.so.6)
frame #17: <unknown function> + 0x10ec40 (0x147e6f70ec40 in /lib64/libc.so.6)
Hi @yusmiatiliau ,
I need to reproduce this issue locally to fix it. Is there a chance you can share some of your data? If you can localize the issue to a region of your draft, could you extract only the alignments for that region and share them?
If that is possible, please do the following:
- Run the command with the following options: --draft-batchsize 1 -v. This will polish draft sequences one by one and output a verbose debug log to stderr.
- When it crashes, scroll up to find the nearest lines which begin with [debug] [run_polishing] region_batch i = .... This is the draft sequence where the crash happened.
- If the draft contig is relatively small, can you send that draft contig and all accompanying BAM alignments? (e.g. samtools view -hb in.bam "contig_7:1-1200000" > contig_7.bam)
- Alternatively, if the draft is relatively big, can you rerun Polish over that contig in regions of size 10 Mbp with a large overlap (a scanning sketch follows this list)? Example:
dorado polish --regions contig_7:1-10000000 ... # -> Doesn't crash
dorado polish --regions contig_7:5000000-15000000 ... # Doesn't crash
dorado polish --regions contig_7:10000000-20000000 ... # Crashes, send this region please
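For convenience, a scan like the one above can be scripted. The following is only a sketch: the contig name, contig length, and file names are placeholders, and the options simply mirror the ones suggested above.
# Sketch: walk a contig in overlapping 10 Mbp windows to localize the crashing region.
# CONTIG, LEN, in.bam and draft.fasta are placeholders for your own data.
CONTIG=contig_7
LEN=60000000      # total contig length
STEP=5000000      # 5 Mbp step -> 50% overlap between consecutive 10 Mbp windows
for START in $(seq 1 $STEP $LEN); do
    END=$((START + 10000000 - 1))
    echo "Testing ${CONTIG}:${START}-${END}" >&2
    dorado polish --regions "${CONTIG}:${START}-${END}" --draft-batchsize 1 -v \
        in.bam draft.fasta > /dev/null \
        || { echo "Crash in ${CONTIG}:${START}-${END}" >&2; break; }
done
The first window that fails is the one to extract with samtools and share.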
Looking forward to fixing this issue! Thanks for reporting it.
Also, please test the latest release version only (v1.0.2) for this.
Hi @svc-jstone,
Thank you for your response. I will do that and hopefully can share some data to you. Thanks again
Hi @svc-jstone,
Sorry it took a while, but I have run it per region as per your suggestion. I made the regions quite small (10K each). I uploaded the alignment, contig, and error files here: https://github.com/yusmiatiliau/Dorado_polish_data_share
The region 10000-20000 was polished successfully, while 20000-30000 came back with the same error as previously. These are the commands I ran:
dorado polish $DIR/dorado_aligner/1103P_hap1_dorado_aligned_reads.bam $DIR/dgenies/purged_dup_filtered_dgenies/1103P_simplex_hap1_purged_dgenies.fa --ignore-read-groups --draft-batchsize 1 -v --regions h1tg000001l_1:100000-200000 > $DIR/dorado_polish/1103P_hap1_polish_dorado_region_a.fa
dorado polish $DIR/dorado_aligner/1103P_hap1_dorado_aligned_reads.bam $DIR/dgenies/purged_dup_filtered_dgenies/1103P_simplex_hap1_purged_dgenies.fa --ignore-read-groups --draft-batchsize 1 -v --regions h1tg000001l_1:200000-300000 > $DIR/dorado_polish/1103P_hap1_polish_dorado_region_b.fa
Looking forward to hearing from you.
Thanks
Thank you for doing this! I managed to reproduce your issue locally using the data you shared. Will investigate.
Hi @yusmiatiliau,
Thanks to your input data, this should now be fixed on our end and available in the next release.
In the meantime, the root cause for this edge case was that your input BAM file has duplicate identical primary alignments.
If you remove them, polish should work fine.
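For illustration, one possible way to drop such duplicates is sketched below. This is an assumption about a workable approach rather than an official recommendation: it keeps the first occurrence of each identical record (same read name, flag, reference, position, and CIGAR), and in.bam / dedup.bam are placeholder names.
# Sketch: remove duplicate identical alignment records from a coordinate-sorted BAM.
# Header lines pass through; for alignment lines, only the first occurrence of each
# (QNAME, FLAG, RNAME, POS, CIGAR) combination is kept.
samtools view -h in.bam \
  | awk '!/^@/ { key = $1 FS $2 FS $3 FS $4 FS $6; if (seen[key]++) next } { print }' \
  | samtools view -b -o dedup.bam -
samtools index dedup.bam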
Here are the first few lines of the BAM file as an example:
Hope this helps. Best regards.
Dorado v1.2.0 is now released with a fix for this issue, so I'm closing it as it's considered resolved.