dorado
Multi A100 expected CPU/GPU utilization in correction inference stage
I am trying to utilize a p4.24xl instance (8×A100 and plenty of CPUs), but when running this step I see barely any CPU or GPU utilization. A couple of GB of memory are allocated on each GPU and on the host, but not much else is going on.
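For context, this is roughly how I am sampling utilization while the job runs (plain nvidia-smi and top; the 2-second interval is arbitrary):

$ watch -n 2 nvidia-smi
$ nvidia-smi dmon -s um
$ top

The dmon output shows per-GPU SM and memory utilization once per second; both sit near zero for long stretches.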
I am using version 0.9.0+9dc15a8 from the upstream Docker Hub container.
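For completeness, the container is started roughly like this (<dorado-image> is a placeholder for the upstream image tag; the relevant parts are that all eight GPUs are passed through with --gpus all and that /shared is bind-mounted):

$ docker run --rm -it --gpus all \
    -v /shared:/shared \
    <dorado-image> bash

The command below is then run inside that shell: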
$ dorado correct /shared/data.fq \
--from-paf /shared/data_overlaps.paf \
> /shared/data_corrected_reads.fasta
Output:
[2024-12-19 10:17:19.679] [info] Running: "correct" "/shared/data.fq" "--from-paf" "/shared/data_overlaps.paf"
[2024-12-19 10:17:19.736] [info] - downloading herro-v1 with httplib
[2024-12-19 10:17:22.201] [info] Using batch size 12 on device cuda:0 in inference thread 0.
[2024-12-19 10:17:22.201] [info] Using batch size 12 on device cuda:0 in inference thread 1.
[2024-12-19 10:17:22.374] [info] Using batch size 12 on device cuda:1 in inference thread 0.
[2024-12-19 10:17:22.374] [info] Using batch size 12 on device cuda:1 in inference thread 1.
[2024-12-19 10:17:22.581] [info] Using batch size 12 on device cuda:2 in inference thread 0.
[2024-12-19 10:17:22.587] [info] Using batch size 12 on device cuda:2 in inference thread 1.
[2024-12-19 10:17:22.824] [info] Using batch size 12 on device cuda:3 in inference thread 0.
[2024-12-19 10:17:22.824] [info] Using batch size 12 on device cuda:3 in inference thread 1.
[2024-12-19 10:17:23.007] [info] Using batch size 12 on device cuda:4 in inference thread 0.
[2024-12-19 10:17:23.007] [info] Using batch size 12 on device cuda:4 in inference thread 1.
[2024-12-19 10:17:23.190] [info] Using batch size 12 on device cuda:5 in inference thread 0.
[2024-12-19 10:17:23.190] [info] Using batch size 12 on device cuda:5 in inference thread 1.
[2024-12-19 10:17:23.379] [info] Using batch size 12 on device cuda:6 in inference thread 0.
[2024-12-19 10:17:23.380] [info] Using batch size 12 on device cuda:6 in inference thread 1.
[2024-12-19 10:17:23.569] [info] Using batch size 12 on device cuda:7 in inference thread 0.
[2024-12-19 10:17:23.570] [info] Using batch size 12 on device cuda:7 in inference thread 1.
How can I improve the utilization of the box?
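For reference, these are the knobs I have found so far. I am assuming --threads, --infer-threads, and --batch-size behave as described, based on dorado correct --help and the log above (which shows the defaults of 2 inference threads and batch size 12 per device); the values here are just examples, not tuned recommendations:

$ dorado correct /shared/data.fq \
    --from-paf /shared/data_overlaps.paf \
    --threads 96 \
    --infer-threads 4 \
    --batch-size 32 \
    > /shared/data_corrected_reads.fasta

Is raising --infer-threads and --batch-size the intended way to keep eight A100s busy, or is the bottleneck expected to be elsewhere (e.g. the CPU-side feature generation when reading overlaps via --from-paf)?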