Multiple GPU usage
Hello, we have been trying to use multiple GPUs to basecall and have discovered some interesting behavior. When specifying --device cuda:0, it runs on our CUDA0 device as intended. When we specify --device cuda:1, it leaks onto both GPUs, as seen in the image below. Is this intended behavior? The guppy nomenclature for multiple GPUs is --device "cuda:0 cuda:1". Is bonito basecalling intended for multi-GPU usage?

Hey @wharvey31
Bonito doesn't currently support multi-GPU calling, and the behaviour you are seeing is the default for frameworks like PyTorch and TensorFlow. You can manage which devices are available to the process with CUDA_VISIBLE_DEVICES.
$ CUDA_VISIBLE_DEVICES=0 bonito basecaller dna_r9.4.1 data_set_1 > calls_1.fasta
$ CUDA_VISIBLE_DEVICES=1 bonito basecaller dna_r9.4.1 data_set_2 > calls_2.fasta
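If you want to drive both cards from a single shell session, the same approach can be scripted; a minimal sketch in bash, reusing the data_set_1/data_set_2 folder names from above:
# Launch one bonito process per GPU in the background, each seeing only its own card,
# then wait for both jobs to finish. Folder/output names follow the example above.
for gpu in 0 1; do
    CUDA_VISIBLE_DEVICES=$gpu bonito basecaller dna_r9.4.1 data_set_$((gpu + 1)) > calls_$((gpu + 1)).fasta &
done
wait  # returns once both basecalling jobs have completed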
Hi there, I too was trying to run bonito on a multi-GPU machine. I split a folder of fast5s into subfolders once I started to see numbers suggesting it would take roughly 20 days to basecall my flowcell (4 reads/s). Then I tried a simple run with these commands:
CUDA_VISIBLE_DEVICE=0 /opt/bonito-0.3.2/bin/bonito basecaller dna_r9.4.1 1 > 1.fastq
CUDA_VISIBLE_DEVICE=1 /opt/bonito-0.3.2/bin/bonito basecaller dna_r9.4.1 2 > 2.fastq
...
After some reads are processed I get the following error:
CUDA_VISIBLE_DEVICE=1 /opt/bonito-0.3.2/bin/bonito basecaller dna_r9.4.1 2 > 2.fastq
> loading model
> calling: 188 reads [02:58, 2.01 reads/s]Exception in thread Thread-2:
Traceback (most recent call last):
  File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/opt/bonito-0.3.2/lib/python3.6/site-packages/bonito/multiprocessing.py", line 202, in run
    for (k, v) in self.iterator:
  File "/opt/bonito-0.3.2/lib/python3.6/site-packages/bonito/crf/basecall.py", line 105, in <genexpr>
    stitched = ((read, _stitch(x)) for (read, x) in unbatchify(batches))
  File "/opt/bonito-0.3.2/lib/python3.6/site-packages/bonito/util.py", line 209, in <genexpr>
    for k, group in groupby(batches, itemgetter(0))
  File "/opt/bonito-0.3.2/lib/python3.6/site-packages/bonito/util.py", line 208, in <listcomp>
    (k, concat([v for (k, v) in group], dim))
  File "/opt/bonito-0.3.2/lib/python3.6/site-packages/bonito/util.py", line 203, in <genexpr>
    (k, select_range(v, start, end, dim))
  File "/opt/bonito-0.3.2/lib/python3.6/site-packages/bonito/crf/basecall.py", line 103, in <genexpr>
    for read, batch in thread_iter(batchify(chunks, batchsize=batchsize))
  File "/opt/bonito-0.3.2/lib/python3.6/site-packages/bonito/crf/basecall.py", line 38, in compute_scores
    betas = model.seqdist.backward_scores(scores.to(torch.float32))
RuntimeError: CUDA out of memory. Tried to allocate 450.00 MiB (GPU 0; 10.76 GiB total capacity; 2.61 GiB already allocated; 66.12 MiB free; 3.74 GiB reserved in total by PyTorch)
Is the above approach still recommended for multi-GPU basecalling? I see a --devices parameter that I'm not using; perhaps I need to provide something there?
Despite using different CUDA_VISIBLE_DEVICE values, my separate runs were all being pushed to the same GPU (which explains @RichardCorbett's issue). Using
CUDA_VISIBLE_DEVICES=0 /opt/bonito-0.3.2/bin/bonito basecaller dna_r9.4.1 1 > 1.fastq
fixed it (note the plural variable name: CUDA_VISIBLE_DEVICES).
Sorry @noncodo @RichardCorbett, it should be CUDA_VISIBLE_DEVICES.
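If it helps anyone else, a quick way to confirm the variable is taking effect is to ask PyTorch how many devices the process can see (this assumes the same Python environment bonito runs in):
# With CUDA_VISIBLE_DEVICES set, the process should report exactly one GPU,
# and that GPU is always addressed as cuda:0 from inside the process.
$ CUDA_VISIBLE_DEVICES=1 python3 -c "import torch; print(torch.cuda.device_count(), torch.cuda.get_device_name(0))"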
Thanks Martin (@noncodo), that saved me a headache. I owe you a beer.
Hi~ I run bonito on multiple GPUs via --device cuda:x. For example:
bonito basecaller --device cuda:0 modelDirectory fast5Directory0
bonito basecaller --device cuda:1 modelDirectory fast5Directory1
bonito basecaller --device cuda:2 modelDirectory fast5Directory2
bonito basecaller --device cuda:3 modelDirectory fast5Directory3
But the first process, the one running on cuda:0, seems to get blocked.
For performant multi-GPU inference, see https://github.com/nanoporetech/dorado.
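For completeness, a hedged sketch of what a single multi-GPU dorado invocation can look like; the model name and reads directory are placeholders, so check the dorado README for current model names and options:
# Hypothetical sketch: one dorado process driving all visible GPUs via --device cuda:all.
$ dorado basecaller <model> <reads_dir> --device cuda:all > calls.bam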