Dorado basecaller running slowly

While running dorado v0.8 I got this warning: "Unable to find chunk benchmarks for GPU "Quadro RTX 4000 with Max-Q Design", model [email protected] and chunk size 1728. Full benchmarking will run for this device, which may take some time."

The program is running very slowly. Please help.

Run environment:

Dorado version: 0.8.0
Dorado command: ./dorado basecaller [email protected] "/media/leek/Dimen SSD/MP_Ind_LibA1/no_sample/20231201_0053_MC-110688_FBA12784_6baacc8b/pod5_pass" -x "cuda:all" -r --emit-fastq > "/media/leek/Dimen SSD/MP_Ind_LibA1/MP_Ind_LibA1.fastq"
Operating system: Ubuntu 20
Hardware (CPUs, Memory, GPUs): Intel® Core™ i9-10980HK CPU @ 2.40GHz × 16; 125.5 GiB; NVIDIA Corporation TU104GLM [Quadro RTX 4000 Mobile / Max-Q]
Source data type (e.g., pod5 or fast5 - please note we always recommend converting to pod5 for optimal basecalling performance): pod5
Source data location (on device or networked drive - NFS, etc.): on device
Details about data (flow cell, kit, read lengths, number of reads, total dataset size in MB/GB/TB):
Dataset to reproduce, if applicable (small subset of data to share as a pod5 to reproduce the issue):

Logs

Please provide output trace of dorado (run dorado with -v, or -vv on a small subset)

Sep 27 '24 17:09 anilchauhanhp9

Hi @anilchauhanhp9, can you add some logging and more information about the commands you've used? Also, a chunk size of 1728 is unusual.

What are you comparing the performance to? Do you have existing benchmarks and see a regression in 0.8.0?
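For reference, here is a minimal sketch of capturing a verbose log on a small subset. The subset directory `./pod5_subset`, the output file names, and the read limit of 1000 are placeholders rather than values from this report, and `sup` stands in for the full model name. `-v` and `--max-reads` are existing `dorado basecaller` options.

```shell
# Placeholder values (assumptions): adjust to your own data.
MODEL="sup"                  # model selector; stands in for the full model name
POD5_DIR="./pod5_subset"     # hypothetical directory holding a few pod5 files
MAX_READS=1000               # arbitrary cap to keep the test run short
LOG="dorado_verbose.log"     # hypothetical log file name

if command -v dorado >/dev/null 2>&1 && [ -d "$POD5_DIR" ]; then
    # -v enables verbose logging; dorado writes its log to stderr,
    # which is redirected to $LOG while the reads go to subset.fastq.
    dorado basecaller "$MODEL" "$POD5_DIR" -x cuda:all -v \
        --max-reads "$MAX_READS" --emit-fastq > subset.fastq 2> "$LOG"
else
    echo "dorado or $POD5_DIR not found; nothing was run" >&2
fi
```

The resulting log file can then be attached to the issue.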

Best regards, Rich

Sep 28 '24 15:09 HalfPhoton

@anilchauhanhp9,

The "Unable to find chunk benchmarks" warning is just a warning: dorado 0.8.0 now includes pre-calculated benchmarks for a limited range of hardware, which allow us to select an optimal batch size without running a time-consuming set of tests on start-up. Since your GPU is not in the known set of hardware, we have to run these tests. Previous versions of dorado also ran them, but did so silently because no pre-calculated data was available.
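If the start-up tests themselves are a concern, one option is to pass an explicit batch size, since the automatic selection is used when `--batchsize` is left at its default of 0. The sketch below is illustrative only: the batch size of 64, the paths, and the `sup` model selector are assumptions, and a value that is too large for the GPU can fail with an out-of-memory error while one that is too small will reduce throughput.

```shell
# Illustrative values (assumptions): tune BATCH for your own GPU.
MODEL="sup"               # model selector; stands in for the full model name
POD5_DIR="./pod5_pass"    # hypothetical input directory
OUT="calls.fastq"         # hypothetical output file
BATCH=64                  # hypothetical batch size, not a recommended value

if command -v dorado >/dev/null 2>&1 && [ -d "$POD5_DIR" ]; then
    # A non-zero --batchsize is used as given instead of an auto-selected one.
    dorado basecaller "$MODEL" "$POD5_DIR" -x cuda:all -r \
        --batchsize "$BATCH" --emit-fastq > "$OUT"
else
    echo "dorado or $POD5_DIR not found; nothing was run" >&2
fi
```

This only affects how the batch size is chosen; it does not change the runtime cost of the model itself.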

Note that the sup model began using the v5 transformer architecture as of dorado 0.7.0. The v5 models have been shown to be more accurate than the v4.3 models, at the expense of a longer runtime. We are actively working on improving this. If runtime is more important to you, you can manually select the earlier models.
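As a rough sketch of selecting an earlier model, the version can be pinned in the model argument. This assumes a v4.3.0 sup model exists for your chemistry and sample rate, which is not confirmed by the details in this thread; the paths are placeholders.

```shell
# Assumption: a v4.3.0 sup model is available for this data's chemistry.
MODEL="sup@v4.3.0"        # pins the pre-transformer sup model version
POD5_DIR="./pod5_pass"    # hypothetical input directory
OUT="calls_v4.3.fastq"    # hypothetical output file

if command -v dorado >/dev/null 2>&1 && [ -d "$POD5_DIR" ]; then
    # Same options as the original command, with only the model changed.
    dorado basecaller "$MODEL" "$POD5_DIR" -x cuda:all -r \
        --emit-fastq > "$OUT"
else
    echo "dorado or $POD5_DIR not found; nothing was run" >&2
fi
```

Substituting `hac` for `sup` would trade further accuracy for speed.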

n.b. The 1728 chunk size used for these tests is actually expected. It was adjusted from the 1800 used in previous versions because the transformer model architecture introduced in dorado 0.7.0 requires slightly different chunk sizes.

Sep 30 '24 08:09 malton-ont

Thank you for the clarification.

Sep 30 '24 14:09 anilchauhanhp9