
Multiple GPU usage

wharvey31 opened this issue on Nov 03 '20 · 6 comments

Hello, we have been trying to use multiple GPUs to basecall and have discovered some interesting behavior. When specifying --device cuda:0 it runs on our CUDA0 device as intended, but when we specify --device cuda:1 it leaks onto both GPUs, as seen in the image below. Is this intended behavior? With guppy, the nomenclature for multiple GPUs is --device "cuda:0 cuda:1". Is bonito basecalling intended for multi-GPU usage?

[Screenshot: Screen Shot 2020-11-03 at 3 19 46 PM]

wharvey31 · Nov 03 '20

Hey @wharvey31

Bonito doesn't currently support multi-GPU calling; the behaviour you are seeing is the default for frameworks like PyTorch and TensorFlow. You can manage which devices are available to each process with CUDA_VISIBLE_DEVICES:


$ CUDA_VISIBLE_DEVICES=0 bonito basecaller dna_r9.4.1 data_set_1 > calls_1.fasta
$ CUDA_VISIBLE_DEVICES=1 bonito basecaller dna_r9.4.1 data_set_2 > calls_2.fasta
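
To run both GPUs at once, launch the two processes in parallel, one per device (a minimal sketch reusing the dataset and output names above):

$ CUDA_VISIBLE_DEVICES=0 bonito basecaller dna_r9.4.1 data_set_1 > calls_1.fasta &  # sees GPU 0 only
$ CUDA_VISIBLE_DEVICES=1 bonito basecaller dna_r9.4.1 data_set_2 > calls_2.fasta &  # sees GPU 1 only
$ wait  # block until both basecalling jobs finish

Each backgrounded process sees only the single device named in its environment, so neither can spill onto the other GPU.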

iiSeymour · Nov 04 '20

Hi there, I too was trying to run bonito on a multi-GPU machine. I split a folder of fast5s into subfolders once I started to see throughput numbers (about 4 reads/s) suggesting it would take roughly 20 days to basecall my flowcell. Then I tried a simple run with these commands:

CUDA_VISIBLE_DEVICE=0 /opt/bonito-0.3.2/bin/bonito basecaller dna_r9.4.1  1 > 1.fastq
CUDA_VISIBLE_DEVICE=1 /opt/bonito-0.3.2/bin/bonito basecaller dna_r9.4.1  2 > 2.fastq
...

After some reads are processed I get the following error:

CUDA_VISIBLE_DEVICE=1 /opt/bonito-0.3.2/bin/bonito basecaller dna_r9.4.1  2 > 2.fastq
> loading model
> calling: 188 reads [02:58,  2.01 reads/s]Exception in thread Thread-2:
Traceback (most recent call last):
  File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/opt/bonito-0.3.2/lib/python3.6/site-packages/bonito/multiprocessing.py", line 202, in run
    for (k, v) in self.iterator:
  File "/opt/bonito-0.3.2/lib/python3.6/site-packages/bonito/crf/basecall.py", line 105, in <genexpr>
    stitched = ((read, _stitch(x)) for (read, x) in unbatchify(batches))
  File "/opt/bonito-0.3.2/lib/python3.6/site-packages/bonito/util.py", line 209, in <genexpr>
    for k, group in groupby(batches, itemgetter(0))
  File "/opt/bonito-0.3.2/lib/python3.6/site-packages/bonito/util.py", line 208, in <listcomp>
    (k, concat([v for (k, v) in group], dim))
  File "/opt/bonito-0.3.2/lib/python3.6/site-packages/bonito/util.py", line 203, in <genexpr>
    (k, select_range(v, start, end, dim))
  File "/opt/bonito-0.3.2/lib/python3.6/site-packages/bonito/crf/basecall.py", line 103, in <genexpr>
    for read, batch in thread_iter(batchify(chunks, batchsize=batchsize))
  File "/opt/bonito-0.3.2/lib/python3.6/site-packages/bonito/crf/basecall.py", line 38, in compute_scores
    betas = model.seqdist.backward_scores(scores.to(torch.float32))
RuntimeError: CUDA out of memory. Tried to allocate 450.00 MiB (GPU 0; 10.76 GiB total capacity; 2.61 GiB already allocated; 66.12 MiB free; 3.74 GiB reserved in total by PyTorch)

Is the above approach still recommended for multi-GPU basecalling? I see a --devices parameter that I'm not using - perhaps I need to provide something there?

RichardCorbett · Dec 09 '20

Despite using different CUDA_VISIBLE_DEVICE values, I was getting different executions pushed to the same GPU (which explains @RichardCorbett's issue). Using CUDA_VISIBLE_DEVICES=0 /opt/bonito-0.3.2/bin/bonito basecaller dna_r9.4.1 1 > 1.fastq fixed it. Note the plural variable name CUDA_VISIBLE_DEVICES; the misspelled singular form is simply ignored, so every process falls back onto the same default GPU.
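
A quick way to confirm which devices a process will actually see is to query PyTorch under the same environment setting (a sketch, assuming python3 and torch from the bonito environment are on the PATH):

$ # with the mask set, a single device is visible and is indexed 0 inside the process
$ CUDA_VISIBLE_DEVICES=1 python3 -c "import torch; print(torch.cuda.device_count(), torch.cuda.get_device_name(0))"

With the misspelled CUDA_VISIBLE_DEVICE, the same command reports every GPU in the machine, which matches the behaviour above.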

noncodo · Jan 04 '21

Sorry @noncodo @RichardCorbett, it should be CUDA_VISIBLE_DEVICES.

iiSeymour · Jan 04 '21

Thanks Martin (@noncodo), that saved me a headache. I owe you a beer.

Psy-Fer · Mar 01 '21

Hi, I run bonito on multiple GPUs by launching one process per device with --device cuda:x. For example:

bonito basecaller --device cuda:0 modelDirectory fast5Directory0
bonito basecaller --device cuda:1 modelDirectory fast5Directory1
bonito basecaller --device cuda:2 modelDirectory fast5Directory2
bonito basecaller --device cuda:3 modelDirectory fast5Directory3

But the first process, the one running on cuda:0, seems to get blocked (see attached screenshot).

ttbond · Jun 28 '21

For performant multi-GPU inference see https://github.com/nanoporetech/dorado
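
For reference, dorado can drive several GPUs from a single process; a rough sketch of the invocation (the model and reads arguments are placeholders, and --device cuda:all is assumed from dorado's documented CLI):

$ dorado basecaller <model> <reads_dir> --device cuda:all > calls.bam  # one process, all visible GPUs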

iiSeymour · May 31 '23