Missing Audio

Open To1p5 opened this issue 1 year ago • 3 comments

I am trying to set up Bark locally. Audio files are now being generated, but they are not playable, and my media player says the files are corrupted. The system also doesn't seem to recognize my GPU. With this code:

    audio_array + audio_array.cpu().numpy().squeeze()

it creates the empty audio file, but if I use this instead:

    audio_array + audio_array.gpu().numpy().squeeze()

I get this error:

    File "C:\Users\fmsal\OneDrive\Documents\Bark\Audio_generation.py", line 13, in <module>
      audio_array + audio_array.gpu().numpy().squeeze()
    AttributeError: 'Tensor' object has no attribute 'gpu'. Did you mean: 'cpu'?

This is the error I get using the CPU:

    $ python Audio_generation.py
    The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's attention_mask to obtain reliable results.
    Setting pad_token_id to eos_token_id:10000 for open-end generation.
    Traceback (most recent call last):
      File "C:\Users\fmsal\OneDrive\Documents\Bark\Audio_generation.py", line 18, in <module>
        scipy.io.wavfile.write("love_is_death.wav", rate=sample_rate, data=audio_array)
      File "C:\Users\fmsal\anaconda3\envs\bark\lib\site-packages\scipy\io\wavfile.py", line 772, in write
        dkind = data.dtype.kind
    AttributeError: 'torch.dtype' object has no attribute 'kind'
    (bark)

To1p5 • Aug 28 '23 20:08

I deleted everything and started over, and I am now able to get audio to play. My GPU still isn't being recognized, though; I am trying to resolve that now.

To1p5 • Aug 28 '23 22:08

Try audio_array.to("cuda").
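A minimal sketch of the device handling, assuming PyTorch is installed. The zero tensor is only a stand-in for whatever model.generate() returns, and its shape is made up for illustration. Tensors expose .cpu(), .cuda() and .to(device), but there is no .gpu() method, which matches the AttributeError above. Also note that the line in the original post uses `+` rather than `=`, so the converted result would be discarded and audio_array would stay a torch tensor; that is consistent with the later 'torch.dtype' error from scipy, which expects a NumPy array.

```python
import torch

# Pick the GPU when PyTorch can see one, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Stand-in for the tensor returned by model.generate(); the shape is illustrative.
audio_tensor = torch.zeros(1, 24000)

# Move the tensor to the chosen device with .to(); there is no .gpu() method.
audio_tensor = audio_tensor.to(device)

# NumPy conversion only works on CPU tensors, so move back before .numpy().
# Assign the result with "=" so audio_array really becomes a NumPy array.
audio_array = audio_tensor.cpu().numpy().squeeze()
print(type(audio_array).__name__, audio_array.dtype, audio_array.shape)
```

If torch.cuda.is_available() returns False on a machine with an NVIDIA card, the installed PyTorch build may be CPU-only, so that is worth checking before changing any code.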

tongbaojia • Aug 31 '23 18:08

Try my code:

from transformers import AutoProcessor, BarkModel
from datetime import datetime
import scipy.io.wavfile  # import the submodule explicitly; it is used below to write the WAV file
import os
import torch
import time

# Start the clock
start_time = time.time()

# Check if GPU is available
if torch.cuda.is_available():
    device = torch.device("cuda")
    print("Using GPU:", torch.cuda.get_device_name())
else:
    device = torch.device("cpu")
    print("Using CPU")

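# Settings (if you need them). Note: these variables are read by the suno-ai/bark
# package; they may have no effect on the transformers BarkModel used here.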
os.environ["SUNO_OFFLOAD_CPU"] = "True"
os.environ["SUNO_USE_SMALL_MODELS"] = "True"

# Load processor and model
processor = AutoProcessor.from_pretrained("suno/bark")
model = BarkModel.from_pretrained("suno/bark")

# Move model to the device (CPU or GPU)
model.to(device)

voice_preset = "v2/en_speaker_6"

# Process text input
inputs = processor("The James Webb Space Telescope has captured stunning images of the Whirlpool spiral galaxy, located 27 million light-years away from Earth.", voice_preset=voice_preset)

# Move inputs to the same device as the model
for key in inputs.keys():
    inputs[key] = inputs[key].to(device)

# Generate audio
audio_array = model.generate(**inputs)
audio_array = audio_array.cpu().numpy().squeeze()

# Get the current date and time
now = datetime.now()

# Format the date and time as a string: YYYYMMDD_HHMMSS
timestamp_str = now.strftime("%Y%m%d_%H%M%S")

# Use the timestamp as part of the filename
filename = f"bark_out_{timestamp_str}.wav"

# Save as WAV file
sample_rate = model.generation_config.sample_rate
scipy.io.wavfile.write(filename, rate=sample_rate, data=audio_array)

# Stop the clock and print the elapsed time
end_time = time.time()
elapsed_time = end_time - start_time
print(f"The process took {elapsed_time:.2f} seconds.")
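If a player still refuses the file, one thing to try is 16-bit PCM output. scipy.io.wavfile.write stores float32 input as a 32-bit floating-point WAV, and some players may not open that format. The sketch below is a swapped-in alternative using only the standard library, not part of the script above; float_to_pcm16 and write_pcm16_wav are hypothetical helper names, and the input is assumed to be mono samples in the range [-1.0, 1.0].

```python
import struct
import wave


def float_to_pcm16(samples):
    """Convert floats in [-1.0, 1.0] to little-endian 16-bit PCM bytes."""
    # Clip out-of-range values so they do not overflow the int16 range.
    clipped = (max(-1.0, min(1.0, s)) for s in samples)
    ints = [int(round(s * 32767)) for s in clipped]
    return struct.pack(f"<{len(ints)}h", *ints)


def write_pcm16_wav(path, samples, sample_rate):
    """Write mono 16-bit PCM audio using only the standard library."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes per sample = 16-bit
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(float_to_pcm16(samples))
```

With the script above, this could be called as write_pcm16_wav(filename, audio_array.tolist(), sample_rate) in place of the scipy call.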

ussTom • Sep 01 '23 13:09