dorado
Delay in basecaller with mod in 0.7.3
Issue Report
Please describe the issue:
Mod calling is substantially slower in dorado 0.7.3 (model v5.0.0) than in dorado 0.5.2 (model v4.3.0):
Dorado Version | Model | Samples/s |
---|---|---|
0.5.2-linux-x64 | hac@v4.3.0 | 5.11e+07 |
0.5.2-linux-x64 | hac@v4.3.0,5mCG_5hmCG | 4.99e+07 |
0.7.3-linux-x64 | hac@v5.0.0 | 2.25e+07 |
0.7.3-linux-x64 | hac@v5.0.0,5mCG_5hmCG | 0.27e+07 |
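To put the numbers above in relative terms, here is a small sketch that computes how much slower each version gets once mod calling is added. The Samples/s values are copied from the table; the dictionary keys and the `mod_penalty` helper are just illustrative names, and the comparison is within each version only, since the two versions also use different model versions.

```python
# Samples/s values copied from the table in this report.
# Keys are (dorado version, model argument passed to `dorado basecaller`).
throughput = {
    ("0.5.2", "hac"): 5.11e7,
    ("0.5.2", "hac,5mCG_5hmCG"): 4.99e7,
    ("0.7.3", "hac"): 2.25e7,
    ("0.7.3", "hac,5mCG_5hmCG"): 0.27e7,
}


def mod_penalty(version: str) -> float:
    """Return how many times slower basecalling is with mod calling enabled."""
    plain = throughput[(version, "hac")]
    with_mods = throughput[(version, "hac,5mCG_5hmCG")]
    return plain / with_mods


for version in ("0.5.2", "0.7.3"):
    print(f"dorado {version}: {mod_penalty(version):.2f}x slower with mod calling")
# dorado 0.5.2: 1.02x slower with mod calling
# dorado 0.7.3: 8.33x slower with mod calling
```

So on this machine the cost of adding mod calling goes from roughly 2% in 0.5.2 to more than 8x in 0.7.3.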
I would like to update to 0.7.3 but cannot afford the slowdown when mod calling is added. Any assistance would be much appreciated, thank you!
The full benchmark is below.
Steps to reproduce the issue:
marchi@DESKTOP-4090:~/projects/dorado_test$ ~/dorado-0.5.2-linux-x64/bin/dorado basecaller hac pod5s/ > calls.bam --no-trim
[2024-08-05 15:10:27.995] [info] - downloading [email protected] with httplib
[2024-08-05 15:10:30.197] [info] > Creating basecall pipeline
[2024-08-05 15:13:45.871] [info] - set batch size for cuda:0 to 3328
[2024-08-05 15:15:17.102] [info] > Simplex reads basecalled: 78163
[2024-08-05 15:15:17.102] [info] > Simplex reads filtered: 4
[2024-08-05 15:15:17.102] [info] > Basecalled @ Samples/s: 5.108344e+07
[2024-08-05 15:15:17.129] [info] > Finished
marchi@DESKTOP-4090:~/projects/dorado_test$ ~/dorado-0.5.2-linux-x64/bin/dorado basecaller hac,5mCG_5hmCG pod5s/ > calls.bam --no-trim
[2024-08-05 15:16:02.658] [info] - downloading [email protected] with httplib
[2024-08-05 15:16:04.781] [info] - downloading [email protected]_5mCG_5hmCG@v1 with httplib
[2024-08-05 15:16:07.072] [info] > Creating basecall pipeline
[2024-08-05 15:19:36.366] [info] - set batch size for cuda:0 to 3328
[2024-08-05 15:21:09.727] [info] > Simplex reads basecalled: 78163
[2024-08-05 15:21:09.727] [info] > Simplex reads filtered: 4
[2024-08-05 15:21:09.727] [info] > Basecalled @ Samples/s: 4.991838e+07
[2024-08-05 15:21:09.781] [info] > Finished
marchi@DESKTOP-4090:~/projects/dorado_test$ ~/dorado-0.7.3-linux-x64/bin/dorado basecaller hac pod5s/ > calls.bam --no-trim
[2024-08-05 15:22:33.096] [info] Running: "basecaller" "hac" "pod5s/" "--no-trim"
[2024-08-05 15:22:33.111] [info] - downloading [email protected] with httplib
[2024-08-05 15:22:35.008] [info] Normalised: chunksize 10000 -> 9996
[2024-08-05 15:22:35.008] [info] Normalised: overlap 500 -> 498
[2024-08-05 15:22:35.008] [info] > Creating basecall pipeline
[2024-08-05 15:23:03.214] [info] cuda:0 using chunk size 9996, batch size 3328
[2024-08-05 15:23:04.212] [info] cuda:0 using chunk size 4998, batch size 5376
[2024-08-05 15:27:03.930] [info] > Simplex reads basecalled: 78161
[2024-08-05 15:27:03.930] [info] > Simplex reads filtered: 6
[2024-08-05 15:27:03.930] [info] > Basecalled @ Samples/s: 2.248256e+07
[2024-08-05 15:27:03.939] [info] > Finished
marchi@DESKTOP-4090:~/projects/dorado_test$ ~/dorado-0.7.3-linux-x64/bin/dorado basecaller hac,5mCG_5hmCG pod5s/ > calls.bam --no-trim
[2024-08-05 15:28:28.376] [info] Running: "basecaller" "hac,5mCG_5hmCG" "pod5s/" "--no-trim"
[2024-08-05 15:28:28.389] [info] - downloading [email protected] with httplib
[2024-08-05 15:28:31.690] [info] - downloading [email protected]_5mCG_5hmCG@v1 with httplib
[2024-08-05 15:28:34.403] [info] Normalised: chunksize 10000 -> 9996
[2024-08-05 15:28:34.403] [info] Normalised: overlap 500 -> 498
[2024-08-05 15:28:34.403] [info] > Creating basecall pipeline
[2024-08-05 15:29:00.902] [info] cuda:0 using chunk size 9996, batch size 3328
[2024-08-05 15:29:01.887] [info] cuda:0 using chunk size 4998, batch size 5376
[2024-08-05 15:58:04.334] [info] > Simplex reads basecalled: 78161
[2024-08-05 15:58:04.334] [info] > Simplex reads filtered: 6
[2024-08-05 15:58:04.334] [info] > Basecalled @ Samples/s: 2.724463e+06
[2024-08-05 15:58:04.345] [info] > Finished
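For anyone comparing runs, the wall-clock time and reported throughput can be pulled out of these logs with a short script. This is only a sketch: it assumes the log format shown above (timestamp, `[info]`, message), and `summarise` and `LINE_RE` are made-up names, not part of dorado. Note that the elapsed time measured this way runs from "Creating basecall pipeline" to "Finished", so it also includes batch size selection.

```python
import re
from datetime import datetime

# Matches lines such as:
# [2024-08-05 15:58:04.345] [info] > Finished
LINE_RE = re.compile(
    r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\] \[info\] (.*)$"
)


def summarise(log_text):
    """Return (elapsed_seconds, samples_per_second) for one dorado run.

    Either value is None if the corresponding log lines are missing.
    """
    start = end = rate = None
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip the shell prompt and any non-log lines
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")
        message = match.group(2)
        if "Creating basecall pipeline" in message:
            start = stamp
        elif "Basecalled @ Samples/s:" in message:
            rate = float(message.rsplit(":", 1)[1])
        elif message.strip() == "> Finished":
            end = stamp
    elapsed = (end - start).total_seconds() if start and end else None
    return elapsed, rate


# Three lines copied from the 0.7.3 run with mod calling above.
SAMPLE = """\
[2024-08-05 15:28:34.403] [info] > Creating basecall pipeline
[2024-08-05 15:58:04.334] [info] > Basecalled @ Samples/s: 2.724463e+06
[2024-08-05 15:58:04.345] [info] > Finished
"""

elapsed, rate = summarise(SAMPLE)
print(f"elapsed: {elapsed / 60:.1f} min, throughput: {rate:.3e} samples/s")
```

On the 0.7.3 run with mod calling this gives about 29.5 minutes, which matches the timestamps in the log.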
Run environment:
- Dorado version: 0.7.3 and 0.5.2
- Dorado command: written above
- Operating system: WSL2 (Ubuntu)
- Hardware (CPUs, Memory, GPUs): i9, 64GB, RTX4090 24GB
- Source data type (e.g., pod5 or fast5 - please note we always recommend converting to pod5 for optimal basecalling performance): pod5
- Source data location (on device or networked drive - NFS, etc.): local tower