
GPU/cuda available, but GPU not used?

Open jaggzh opened this issue 1 year ago • 3 comments

I'm on Linux, and I'm seeing lots of memory and CPU use but no GPU use at all (unless it suddenly uses it for a split second at the end). Inference takes a long time -- about 6 minutes for a 24-word sentence.

>>> print("Torch version:", torch.__version__)
Torch version: 2.2.1+cu121
>>> print("CUDA available:", torch.cuda.is_available())
CUDA available: True
>>> print("Number of GPUs:", torch.cuda.device_count())
Number of GPUs: 1
>>> print("GPU name:", torch.cuda.get_device_name(0))
GPU name: NVIDIA GeForce RTX 3090
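
(For reference: `from_pretrained` loads the weights onto the CPU by default, so CUDA being available doesn't mean the model is actually on the GPU. A minimal check, assuming the model is loaded the usual way:)

import torch
from transformers import BarkModel

model = BarkModel.from_pretrained("suno/bark")
# from_pretrained leaves the weights on the CPU unless they are moved explicitly
print(next(model.parameters()).device)  # prints "cpu"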
$ time ./bark
2024-04-11 21:48:29.666824: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
/home/.../python3.11/site-packages/torch/nn/utils/weight_norm.py:28: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:10000 for open-end generation.

real    5m57.385s
user    20m17.150s
sys     1m14.751s

jaggzh · Apr 12 '24 05:04

Update: Okay, I don't know why the GPU is not being selected, but I modified the transformers example code to move the model to the GPU. (Google AI Studio's Gemini 1.5 is what did this work.)

Unfortunately, it also required modifying the transformers/...bark code to get all tensors onto the GPU.

Nevertheless, my output is now fast (not sure why, but it varies between, say, 17s and 50s):

$ time ./bark
{annoying tensorflow messages}
Loading autoprocessor...
Loading bark model...
{torch.nn.utils.weight_norm deprecation warnings}
Processor()ing...
Generate()ing...
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:10000 for open-end generation.

real    0m20.381s
user    0m17.566s
sys     0m2.988s

The new code, forcing everything onto the GPU:

#!/usr/bin/env python3
from transformers import AutoProcessor, BarkModel
import scipy.io.wavfile  # import the submodule explicitly; `import scipy` alone may not expose scipy.io.wavfile
import torch

device = torch.device("cuda:0")
print("Loading autoprocessor...")
processor = AutoProcessor.from_pretrained("suno/bark")
print("Loading bark model...")
model = BarkModel.from_pretrained("suno/bark").to(device)

voice_preset = "v2/en_speaker_0"

print("Processor()ing...")
inputs = processor("Mazda alone is the adorable-most.",
                   voice_preset=voice_preset)
# Move every tensor the processor returned onto the GPU as well;
# the model and its inputs must live on the same device.
for key, value in inputs.items():
    if torch.is_tensor(value):
        inputs[key] = value.to(device)

print("Generate()ing...")
audio_array = model.generate(**inputs)
audio_array = audio_array.cpu().numpy().squeeze()

sample_rate = model.generation_config.sample_rate
scipy.io.wavfile.write("bark_out.wav", rate=sample_rate, data=audio_array)

And over in processing_bark.py:

        if voice_preset is not None:
            self._validate_voice_preset_dict(voice_preset, **kwargs)
-            voice_preset = BatchFeature(data=voice_preset, tensor_type=return_tensors)
+            import torch
+            device = torch.device("cuda:0")
+            voice_preset_tensors = {
+                key: torch.from_numpy(value).to(device)
+                for key, value in voice_preset.items()
+            }
+            voice_preset = BatchFeature(data=voice_preset_tensors, tensor_type=return_tensors)
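
(Worth noting: this edit hard-codes `cuda:0` inside the installed transformers sources, so it gets the voice-preset tensors onto the GPU but will break on CPU-only machines; it's a local workaround rather than a general fix.)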

jaggzh · Apr 12 '24 06:04

Thank you for this! 🙇 I used this to monkey-patch transformers on Paperspace so I could try some things with bark and actually use the GPUs. (Disclaimer for future readers: I have no idea what I'm doing, so it's not a good idea to follow in my footsteps.)
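
(For future readers, a monkey patch along these lines is one way to do it without editing the installed sources -- a rough sketch, not the exact code used here; it wraps `BarkProcessor.__call__` and recursively moves every returned tensor, including the nested voice-preset ones, onto `cuda:0`:)

import torch
from collections.abc import MutableMapping
from transformers import BarkProcessor

_original_call = BarkProcessor.__call__

def _move_to_device(obj, device):
    # Recurse: the processor nests the voice preset inside a mapping,
    # so a flat loop over the top-level items would miss those tensors.
    if torch.is_tensor(obj):
        return obj.to(device)
    if isinstance(obj, MutableMapping):  # BatchFeature is a UserDict
        for key, value in obj.items():
            obj[key] = _move_to_device(value, device)
    return obj

def _patched_call(self, *args, **kwargs):
    inputs = _original_call(self, *args, **kwargs)
    return _move_to_device(inputs, torch.device("cuda:0"))

BarkProcessor.__call__ = _patched_call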

njpearman · May 01 '24 13:05

These changes helped me:

from transformers import AutoProcessor, BarkModel
import torch
import scipy.io.wavfile  # import the submodule explicitly; `import scipy` alone may not expose scipy.io.wavfile


processor = AutoProcessor.from_pretrained("suno/bark")
model = BarkModel.from_pretrained("suno/bark").to('cuda')  ## to('cuda') added

voice_preset = "v2/en_speaker_0"
text = "Dinosaurs like to eat mushrooms"
inputs = processor(text, voice_preset=voice_preset)

inputs = inputs.to('cuda')  ## to('cuda') added

audio_array = model.generate(**inputs)
audio_array = audio_array.cpu().numpy().squeeze()

sample_rate = model.generation_config.sample_rate
scipy.io.wavfile.write("bark_test_out.wav", rate=sample_rate, data=audio_array)

saga111a · May 23 '24 19:05