std::runtime_error (core dumped) with Dorado 0.7.2
The issue:
I have basecalled my data (short-read amplicon data) with several versions of Dorado and keep getting the same (or a similar) error:
terminate called after throwing an instance of 'std::runtime_error'
what(): Empty sequence and qstring provided for read id 5e0ec593-0bf6-4fd2-9fc8-61a20ad2be70
Aborted (core dumped)
It runs to about 30-40% before the error occurs. The read ID reported in the error is the same with Dorado 0.7.0 and 0.7.2, but differs with Dorado 0.5.3:
terminate called after throwing an instance of 'std::runtime_error'
what(): Empty sequence and qstring provided for read id f76d9321-0dbb-4c6f-88ed-8acae5609c91
Aborted (core dumped)
Steps to reproduce the issue:
My command was as follows:
./dorado basecaller [email protected] /path/to/pod5 --recursive --trim adapters --barcode-sequences /path/to/Sequence_file_1_96_fw.fa --barcode-arrangement /path/to/Dorado_arrangement_file_barcode_01_96.toml --min-qscore 10 > /path/to/Dorado_basecalled.bam
Run environment:
- Dorado version: dorado-0.7.2-linux-x64
- Dorado command: ./dorado basecaller [email protected]
- Operating system: Linux
- Hardware (CPUs, Memory, GPUs): CPU i9, 32 cores, Memory: 2 TB SSD, GPU: NVIDIA RTX 4080
- Source data type (e.g., pod5 or fast5 - please note we always recommend converting to pod5 for optimal basecalling performance): pod5
- Source data location (on device or networked drive - NFS, etc.): on device
- Details about data (flow cell, kit, read lengths, number of reads, total dataset size in MB/GB/TB): 50 GB, short reads (~200 bp), R10.4.1, LSK114, custom barcodes
- Dataset to reproduce, if applicable (small subset of data to share as a pod5 to reproduce the issue):
Logs
- Please provide output trace of dorado (run dorado with -v, or -vv on a small subset)
Hi @karlijn-doorenspleet, this looks like an internal error while handling short reads.
Are you able to find a few reads which demonstrate this issue so that we can look into it?
Kind regards, Rich
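For anyone trying to isolate the offending reads, a minimal Python sketch along these lines could pull the read IDs out of a saved Dorado stderr log. It assumes the error text matches the message quoted in this thread and that read IDs are UUID-formatted; the function name `failing_read_ids` and the sample log are hypothetical, not part of Dorado.

```python
import re

# Pattern based on the error message quoted in this thread.
# Assumption: read IDs are standard 8-4-4-4-12 hexadecimal UUIDs.
READ_ID_RE = re.compile(
    r"Empty sequence and qstring provided for read id "
    r"([0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-"
    r"[0-9a-fA-F]{4}-[0-9a-fA-F]{12})"
)


def failing_read_ids(log_text):
    """Return the unique read IDs named in the error, in order of first appearance."""
    seen = []
    for match in READ_ID_RE.finditer(log_text):
        read_id = match.group(1)
        if read_id not in seen:
            seen.append(read_id)
    return seen


# Sample log assembled from the two error messages reported above.
log = """terminate called after throwing an instance of 'std::runtime_error'
what(): Empty sequence and qstring provided for read id 5e0ec593-0bf6-4fd2-9fc8-61a20ad2be70
Aborted (core dumped)
terminate called after throwing an instance of 'std::runtime_error'
what(): Empty sequence and qstring provided for read id f76d9321-0dbb-4c6f-88ed-8acae5609c91
Aborted (core dumped)
"""

# One read ID per line, ready to save as a plain-text ID list.
print("\n".join(failing_read_ids(log)))
```

The resulting ID list could then be used to subset the original pod5 files into a small reproduction dataset to attach to the issue.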
Hi Rich,
Thanks!
Yeah I have found the .pod5 file that has been causing an issue:
I'm hitting this as well with Dorado 0.7.3+6e6c45cd on macOS. I have restarted with --no-trim instead of --trim adapters and will follow up here on whether that fixes the issue. Do you need another repro POD5?
[Update] Indeed, with --no-trim my basecalling is finally working reliably.
This should be resolved in dorado 0.8.1 onwards.