
std::runtime_error (core dumped) with Dorado 0.7.2

Open karlijn-doorenspleet opened this issue 1 year ago • 3 comments

Issue Report: std::runtime_error (core dumped) with Dorado 0.7.2

The issue:

I have basecalled my data (short-read amplicon data) with several versions of Dorado and keep getting the same (or a similar) error:

terminate called after throwing an instance of 'std::runtime_error'
  what(): Empty sequence and qstring provided for read id 5e0ec593-0bf6-4fd2-9fc8-61a20ad2be70
Aborted (core dumped)

The run gets through roughly 30-40% of the data before it fails. The read ID of the empty read (the one listed in the error) is the same with Dorado 0.7.0 and 0.7.2, but different with Dorado 0.5.3, which reports:

terminate called after throwing an instance of 'std::runtime_error'
  what(): Empty sequence and qstring provided for read id f76d9321-0dbb-4c6f-88ed-8acae5609c91
Aborted (core dumped)
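One way to collect the offending read ID for a reproducer is to capture dorado's stderr and pull the ID out of the error text. This is a minimal sketch, assuming the message format shown above and a `grep` that supports `-o` and `-E`; the function name `failing_read_ids` and the log file name `dorado.log` are hypothetical.

```shell
#!/usr/bin/env bash
# Print each distinct read ID reported in "Empty sequence and qstring" errors.
# Assumes dorado's stderr was saved to a file, e.g. (hypothetical file names):
#   ./dorado basecaller ... > calls.bam 2> dorado.log
failing_read_ids() {
  local log_file="$1"
  # Read IDs are 36-character UUIDs (8-4-4-4-12 lowercase hex groups).
  grep -oE 'Empty sequence and qstring provided for read id [0-9a-f-]{36}' "$log_file" \
    | awk '{ print $NF }' \
    | sort -u
}
```

For example, `failing_read_ids dorado.log > bad_ids.txt` writes one ID per line. Since dorado aborts on the first such read, each run would typically yield a single ID; the list could then be used to cut a small reproducer POD5 out of the input with the pod5 tools (check their documentation for the exact subsetting options).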

Steps to reproduce the issue:

My command was:

./dorado basecaller [email protected] /path/to/pod5 --recursive \
  --trim adapters \
  --barcode-sequences /path/to/Sequence_file_1_96_fw.fa \
  --barcode-arrangement /path/to/Dorado_arrangement_file_barcode_01_96.toml \
  --min-qscore 10 > /path/to/Dorado_basecalled.bam

Run environment:

  • Dorado version: dorado-0.7.2-linux-x64
  • Dorado command: ./dorado basecaller [email protected]
  • Operating system: Linux
  • Hardware (CPUs, Memory, GPUs): CPU: i9, 32 cores; Memory: 2 TB SSD; GPU: NVIDIA RTX 4080
  • Source data type (e.g., pod5 or fast5 - please note we always recommend converting to pod5 for optimal basecalling performance): pod5
  • Source data location (on device or networked drive - NFS, etc.): on device
  • Details about data (flow cell, kit, read lengths, number of reads, total dataset size in MB/GB/TB): 50 GB, short reads (~200 bp), R10.4.1, LSK114, custom barcodes
  • Dataset to reproduce, if applicable (small subset of data to share as a pod5 to reproduce the issue):

Logs

  • Please provide output trace of dorado (run dorado with -v, or -vv on a small subset)

karlijn-doorenspleet, Jul 03 '24

Hi @karlijn-doorenspleet, this looks like an internal error in the handling of short reads.

Are you able to find a few reads which demonstrate this issue so that we can look into it?

Kind regards, Rich

HalfPhoton, Jul 04 '24

Hi Rich,

Thanks!

Yes, I have found the .pod5 file that triggers the error:

FAY62814_pass_9d451205_cda359de_792.pod5.zip

karlijn-doorenspleet, Jul 04 '24

I'm hitting this as well with Dorado 0.7.3+6e6c45cd on macOS. I have restarted with --no-trim instead of --trim adapters and will follow up here on whether that fixes the issue. Do you need another repro POD5?

[Update] Indeed, with --no-trim my basecalling is finally working reliably.

RByers, Sep 05 '24

This should be resolved from dorado 0.8.1 onwards.

malton-ont, Dec 11 '24
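Because the fix is reported for dorado 0.8.1 onwards and --no-trim was the workaround on earlier versions, a wrapper script that has to run on several installs could pick the trimming flags from the version string. This is a minimal sketch, assuming version strings of the form `0.7.3+6e6c45cd` (as quoted in this thread) and a `sort` that supports `-V` (version sort); the function name `trim_flags_for_version` is hypothetical.

```shell
#!/usr/bin/env bash
# Choose dorado trimming flags based on the dorado version string.
# Assumption: versions look like "0.7.3+6e6c45cd" (semver plus optional "+<commit>").
trim_flags_for_version() {
  local version="${1%%+*}"   # drop the "+<commit hash>" suffix, if any
  local fixed="0.8.1"        # first release reported as fixed in this thread
  # sort -V compares versions component by component; if "fixed" sorts
  # first (or ties), then version >= fixed and trimming is considered safe.
  if [ "$(printf '%s\n%s\n' "$fixed" "$version" | sort -V | head -n 1)" = "$fixed" ]; then
    printf '%s\n' "--trim adapters"
  else
    printf '%s\n' "--no-trim"
  fi
}
```

Usage: `./dorado basecaller ... $(trim_flags_for_version 0.7.3+6e6c45cd) ...`, left unquoted so that `--trim adapters` splits into two arguments. Note that --no-trim turns trimming off, so output from older versions would still need adapter handling downstream.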