
Significant slowdown when adding mA modification model

Open simondrue opened this issue 1 year ago • 3 comments

Hi Nanopore team,

I noticed a significant (~10x) slowdown of the Dorado basecaller when adding the 6mA modification model. This is notable since adding the 5mCG_5hmCG modification model has almost no impact on basecalling speed.

Is this expected behavior? If so, are there any plans to optimize the speed of the 6mA model?

Results from a small benchmark of basecalling speeds (Samples/s, as reported by Dorado):

  • 5mCG_5hmCG@v1 + 6mA@v2: 2.875147e+06
  • 5mCG_5hmCG@v1: 1.816510e+07
  • 6mA@v2: 1.598966e+06
  • 6mA@v1: 1.628905e+06
  • No mods: 1.869901e+07

Thanks for a great tool. Looking forward to seeing where the project is going 🚀

Run environment:

  • Dorado version: v0.6.0
  • Dorado command: basecaller
  • Operating system: Linux
  • Hardware:
    • Intel Xeon ("Skylake") Gold 6140 CPU @ 2.30GHz, 18 cores/CPU
    • NVIDIA V100 16GB GPU
  • Source data type (e.g., pod5 or fast5 - please note we always recommend converting to pod5 for optimal basecalling performance):

Logs

5mCG_5hmCG@v1 + 6mA@v2

[2024-04-10 13:40:56.130] [info] Running: "basecaller" "--no-trim" "--modified-bases-models" "/faststorage/project/MomaReference/BACKUP/nanopore/models/dorado_models/[email protected]_5mCG_5hmCG@v1,/faststorage/project/MomaReference/BACKUP/nanopore/models/dorado_models/[email protected]_6mA@v2" "/faststorage/project/MomaReference/BACKUP/nanopore/models/dorado_models/[email protected]" "/dev/shm/pod5s/"
[2024-04-10 13:40:56.292] [info] > Creating basecall pipeline
[2024-04-10 13:41:42.105] [info] cuda:0 using chunk size 9996, batch size 2304
[2024-04-10 13:41:43.233] [info] cuda:0 using chunk size 4998, batch size 3328
[2024-04-10 13:46:59.681] [info] > Simplex reads basecalled: 19997
[2024-04-10 13:46:59.681] [info] > Simplex reads filtered: 3
[2024-04-10 13:46:59.681] [info] > Basecalled @ Samples/s: 2.875147e+06
[2024-04-10 13:46:59.694] [info] > Finished

5mCG_5hmCG@v1 only

[2024-04-10 13:22:37.532] [info] Running: "basecaller" "--no-trim" "--modified-bases-models" "/faststorage/project/MomaReference/BACKUP/nanopore/models/dorado_models/[email protected]_5mCG_5hmCG@v1" "/faststorage/project/MomaReference/BACKUP/nanopore/models/dorado_models/[email protected]" "/dev/shm/pod5s/"
[2024-04-10 13:22:37.595] [info] > Creating basecall pipeline
[2024-04-10 13:23:07.830] [info] cuda:0 using chunk size 9996, batch size 2304
[2024-04-10 13:23:09.229] [info] cuda:0 using chunk size 4998, batch size 3328
[2024-04-10 13:24:00.319] [info] > Simplex reads basecalled: 19997
[2024-04-10 13:24:00.319] [info] > Simplex reads filtered: 3
[2024-04-10 13:24:00.319] [info] > Basecalled @ Samples/s: 1.816510e+07
[2024-04-10 13:24:00.326] [info] > Finished

6mA@v2 only

[2024-04-10 13:29:43.608] [info] Running: "basecaller" "--no-trim" "--modified-bases-models" "/faststorage/project/MomaReference/BACKUP/nanopore/models/dorado_models/[email protected]_6mA@v2" "/faststorage/project/MomaReference/BACKUP/nanopore/models/dorado_models/[email protected]" "/dev/shm/pod5s/"
[2024-04-10 13:29:43.651] [info] > Creating basecall pipeline
[2024-04-10 13:30:13.930] [info] cuda:0 using chunk size 9996, batch size 2304
[2024-04-10 13:30:14.863] [info] cuda:0 using chunk size 4998, batch size 3328
[2024-04-10 13:39:42.926] [info] > Simplex reads basecalled: 19997
[2024-04-10 13:39:42.926] [info] > Simplex reads filtered: 3
[2024-04-10 13:39:42.926] [info] > Basecalled @ Samples/s: 1.598966e+06
[2024-04-10 13:39:42.932] [info] > Finished

6mA@v1 only

[2024-04-10 13:58:07.035] [info] Running: "basecaller" "--no-trim" "--modified-bases-models" "/faststorage/project/MomaReference/BACKUP/nanopore/models/dorado_models/[email protected]_6mA@v1" "/faststorage/project/MomaReference/BACKUP/nanopore/models/dorado_models/[email protected]" "/dev/shm/pod5s/"
[2024-04-10 13:58:07.096] [info] > Creating basecall pipeline
[2024-04-10 13:58:42.809] [info] cuda:0 using chunk size 9996, batch size 2304
[2024-04-10 13:58:43.992] [info] cuda:0 using chunk size 4998, batch size 3328
[2024-04-10 14:08:01.642] [info] > Simplex reads basecalled: 19997
[2024-04-10 14:08:01.642] [info] > Simplex reads filtered: 3
[2024-04-10 14:08:01.642] [info] > Basecalled @ Samples/s: 1.628905e+06
[2024-04-10 14:08:01.649] [info] > Finished

No mods

[2024-04-10 10:03:54.462] [info] Running: "basecaller" "--no-trim" "/faststorage/project/MomaReference/BACKUP/nanopore/models/dorado_models/[email protected]" "/dev/shm/pod5s/"
[2024-04-10 10:03:54.573] [info] > Creating basecall pipeline
[2024-04-10 10:04:28.276] [info] cuda:0 using chunk size 9996, batch size 2304
[2024-04-10 10:04:29.366] [info] cuda:0 using chunk size 4998, batch size 3264
[2024-04-10 10:05:19.109] [info] > Simplex reads basecalled: 19997
[2024-04-10 10:05:19.112] [info] > Simplex reads filtered: 3
[2024-04-10 10:05:19.115] [info] > Basecalled @ Samples/s: 1.869901e+07
[2024-04-10 10:05:19.121] [info] > Finished

simondrue avatar Apr 10 '24 11:04 simondrue

Hi @simondrue - your benchmark showing that 5mCG_5hmCG@v1 + 6mA@v2 is faster than 6mA@v2 only is a bit surprising - could you repeat this a few times and verify that your benchmarks are not noisy?

vellamike avatar Apr 10 '24 14:04 vellamike

I created a small dataset with four pod5s of about 2GB each and ran it on 4xA100 with the 4.3.0 sup model.

  • 5mCG_5hmCG only: 3m49.244s
  • 5mC_5hmC only: 6m21.693s
  • 6mA only: 9m33.237s
  • 5mCG_5hmCG + 6mA: 9m49.589s
  • 5mC_5hmC + 6mA: 11m13.113s
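Converting these elapsed times to relative runtimes makes the comparison easier (a small sketch over the times listed above):

```python
def to_seconds(t: str) -> float:
    """Parse an elapsed time like '3m49.244s' into seconds."""
    minutes, seconds = t.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)

times = {
    "5mCG_5hmCG": "3m49.244s",
    "5mC_5hmC": "6m21.693s",
    "6mA": "9m33.237s",
    "5mCG_5hmCG + 6mA": "9m49.589s",
    "5mC_5hmC + 6mA": "11m13.113s",
}

base = to_seconds(times["5mCG_5hmCG"])
for name, t in times.items():
    # Runtime relative to the fastest (5mCG_5hmCG-only) run.
    print(f"{name}: {to_seconds(t) / base:.2f}x the 5mCG_5hmCG runtime")
```

On these numbers the 6mA-only run takes about 2.5x as long as 5mCG_5hmCG-only.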

My times seem quite normal. Is this within expectation?

ymcki avatar Apr 11 '24 02:04 ymcki

Hi,

I expanded my benchmark and used Dorado v0.7.0 with the new v5 models for both HAC and SUP, all available modifications (one at a time - no combinations) and 5 replicates with --max-reads 150000. The system is the same as stated above and the data is from a cfDNA sample.

I still see the significant slowdown for the 6mA model, even compared to the other all-context models. Just to verify that the sample is not enriched for A, its base composition is:

  • A: 10,143,093 bases (27.38%)
  • T: 11,088,922 bases (29.94%)
  • C: 7,874,313 bases (21.26%)
  • G: 7,937,969 bases (21.42%)

[Plots: basecalling speed per modification model, HAC and SUP]

The data behind the plots: speed_data.csv

Sorry for the late reply

/Simon

simondrue avatar Jun 04 '24 07:06 simondrue

This performance is in line with expectations: the 6mA model is all-context and significantly larger (more computationally intensive), which is why throughput drops so much. The CG-context models are smaller (more efficient) and make modification calls at significantly fewer positions, hence the better performance.
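The difference in the number of candidate positions can be illustrated with a toy count (this is a hypothetical sketch, not Dorado's actual site-selection code):

```python
# An all-context 6mA model must score every A in a read, while a
# CG-context 5mC model scores only the C of each CG dinucleotide,
# which is far fewer positions in typical DNA.
def candidate_sites(seq: str) -> dict:
    return {
        "6mA (every A)": seq.count("A"),
        "5mCG (C in CG)": seq.count("CG"),
    }

print(candidate_sites("ACGATTACGGAAT"))
# → {'6mA (every A)': 5, '5mCG (C in CG)': 2}
```

With ~27% A content (as in the sample above) versus the comparatively rare CG dinucleotide, the all-context model runs its modification head at many more positions per read, on top of being a larger network.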

We're always working on improving basecalling and mod basecalling performance and we'll release updates when they're available.

Kind regards, Rich

HalfPhoton avatar Sep 17 '24 10:09 HalfPhoton

Thanks for your explanation :)

Closing the issue

simondrue avatar Sep 17 '24 12:09 simondrue