Is there any documentation? :D

Open yipy0005 opened this issue 2 years ago • 19 comments

yipy0005 avatar Apr 20 '23 15:04 yipy0005

working on putting something more comprehensive together. anything in particular i can help with in the meantime?

gkucsko avatar Apr 20 '23 15:04 gkucsko

working on putting something more comprehensive together. anything in particular i can help with in the meantime?

UserWarning: No audio backend is available.
No GPU being used. Careful, Inference might be extremely slow!

Some documentation on how to add audio backend or select gpu. (it defaults to cpu)

robbyz512 avatar Apr 20 '23 19:04 robbyz512

It would be nice to know how to generate the semantic histories for other voices (in npz format):

https://github.com/suno-ai/bark/blob/5dc6a4dca2755da4fde37123a4845dc1895798f3/bark/generation.py#L352

avaer avatar Apr 20 '23 20:04 avaer

UserWarning: No audio backend is available.

This warning can be fixed by installing ffmpeg and running:

pip install pysoundfile

and adding the following to the beginning of your python file:

import torchaudio
torchaudio.set_audio_backend("soundfile")

arturh85 avatar Apr 20 '23 20:04 arturh85
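The backend fix above can be wrapped in a defensive check. This is only a sketch: the helper name `select_soundfile_backend` is hypothetical (it is not part of Bark or torchaudio), and it assumes that some torchaudio releases may not expose `set_audio_backend`, so it looks the function up instead of calling it unconditionally.

```python
import importlib.util
import warnings


def select_soundfile_backend():
    """Try to select the "soundfile" backend; return a short status string.

    Hypothetical helper for illustration, not part of Bark or torchaudio.
    """
    if importlib.util.find_spec("soundfile") is None:
        return "soundfile is not installed"
    if importlib.util.find_spec("torchaudio") is None:
        return "torchaudio is not installed"

    import torchaudio

    # Assumption: the function may be missing in some torchaudio versions.
    setter = getattr(torchaudio, "set_audio_backend", None)
    if setter is None:
        return "set_audio_backend is not available in this torchaudio"

    try:
        with warnings.catch_warnings():
            warnings.simplefilter("ignore")  # hide deprecation noise, if any
            setter("soundfile")
    except Exception as exc:  # report the failure instead of crashing
        return f"could not select soundfile backend: {exc}"
    return "soundfile backend selected"


if __name__ == "__main__":
    print(select_soundfile_backend())
```

Running it before importing bark prints which of the prerequisites, if any, is missing.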

working on putting something more comprehensive together. anything in particular i can help with in the meantime?

UserWarning: No audio backend is available.
No GPU being used. Careful, Inference might be extremely slow!

Some documentation on how to add audio backend or select gpu. (it defaults to cpu)

+1 to this. I'm getting the same warning. I wasted a lot of money on a GPU I didn't need, and now I need to justify its use.

dellis23 avatar Apr 21 '23 02:04 dellis23

Uninstalling torch and reinstalling with cuda support using the command from https://pytorch.org/get-started/locally/ seems to have worked.

dellis23 avatar Apr 21 '23 03:04 dellis23

I used the autodoc library to generate documentation using gpt:

https://github.com/dahifi/autodoc-ker/blob/a6bc02a4ca90ef8805cf487f437f81e2035f9119/indexes/suno-ai/bark/.autodoc/docs/markdown

dahifi avatar Apr 24 '23 20:04 dahifi

After reinstalling torch with cuda support, a reinstall of chardet was required.

padmalcom avatar Apr 25 '23 09:04 padmalcom

I used the autodoc library to generate documentation using gpt:

https://github.com/dahifi/autodoc-ker/blob/a6bc02a4ca90ef8805cf487f437f81e2035f9119/indexes/suno-ai/bark/.autodoc/docs/markdown

I could PR this into the main branch, but it would need maintaining.

dahifi avatar Apr 25 '23 13:04 dahifi

Any suggestions for getting the GPU running? It isn't kicking in. It's an NVIDIA GeForce GTX 1050 Ti with 4GB GDDR5 on my Dell XPS15 9570 laptop.

Just installed the latest NVIDIA drivers

No GPU being used. Careful, inference might be extremely slow!
No GPU being used. Careful, inference might be extremely slow!
No GPU being used. Careful, inference might be extremely slow!

It is working, albeit on the CPU, and I'm getting good sound files.

import os
os.environ['SUNO_USE_SMALL_MODELS'] = 'True'
import torchaudio
torchaudio.set_audio_backend("soundfile")

from bark import SAMPLE_RATE, generate_audio, preload_models
from IPython.display import Audio
from scipy.io.wavfile import write as write_wav

# download and load all models
preload_models(
    text_use_small=True,
    coarse_use_small=True,
    fine_use_gpu=True,
    fine_use_small=True,
)

# generate audio from text
text_prompt = """
    Hello, my name is Suno. And, uh — and I like pizza. [laughs]
    But I also have other interests such as playing tic tac toe.
"""
audio_array = generate_audio(text_prompt)

write_wav("Bark_audio4.wav", SAMPLE_RATE, audio_array)

# play text in notebook
Audio(audio_array, rate=SAMPLE_RATE)

Tanzengeist avatar Apr 25 '23 15:04 Tanzengeist

@Tanzengeist It doesn't have anything to do with the NVIDIA drivers. You need to install the CUDA runtime and a compatible version of PyTorch. You can generate a command for that here: https://pytorch.org/get-started/locally/

arturh85 avatar Apr 25 '23 16:04 arturh85

Thanks. I ran this command:

pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu117

Then I checked to see if it worked:

import torch
x = torch.rand(5, 3)
print(x)
x = torch.cuda.is_available()
print(x)

The console:

tensor([[0.1728, 0.3632, 0.0381],
        [0.3811, 0.2539, 0.6598],
        [0.1317, 0.9184, 0.4265],
        [0.0814, 0.6951, 0.2648],
        [0.7667, 0.4997, 0.8019]])
False

False: it isn't picking up the GPU, but it is still producing a random tensor (how is that possible?).
I thought maybe I needed something older for the 1050, so I went to https://discuss.pytorch.org/t/help-installing-pytorch-with-gtx-1050-ti/168328 and executed:

pip install torch==1.7.0+cu92 torchvision==0.8.0+cu92 torchaudio==0.7.0 -f https://download.pytorch.org/whl/torch_stable.html

Then I added 'import torch' to my code. But there is no explicit call to torch in my code, so unless bark calls torch... I don't know.

Any other ideas would be appreciated. As you can tell I'm a newbie but very motivated and excited about Bark.

Tanzengeist avatar Apr 25 '23 17:04 Tanzengeist

@Tanzengeist Did you also install the CUDA 11.7 runtime from https://developer.nvidia.com/cuda-11-7-0-download-archive ?

The random tensor is part of your test code:

x = torch.rand(5, 3)
print(x)

arturh85 avatar Apr 25 '23 17:04 arturh85

Thanks. So it's just using the CPU to generate the vectors.

I installed cuda-11.7 runtime. Prior to this I was using the Dell provided latest Nvidia driver. Apparently 11.7 is older because I got a warning.

After installing CUDA 11.7, running the program that checks torch.cuda.is_available() still produces False. Anything you can imagine I might have done prior to this that could be blocking the GPU connection?

Do you know if this 1050 ti is supported by cuda 11.8?

Tanzengeist avatar Apr 25 '23 18:04 Tanzengeist
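When torch.cuda.is_available() keeps returning False, it can help to check whether the installed PyTorch build was compiled with CUDA at all, since torch.version.cuda is None on CPU-only builds. The following is a minimal diagnostic sketch, not an official Bark or PyTorch tool; the helper name `cuda_report` is hypothetical, and the function returns defaults if PyTorch is not installed.

```python
import importlib.util


def cuda_report():
    """Return a small dict describing whether PyTorch can see a CUDA GPU.

    Hypothetical helper for illustration, not part of Bark or PyTorch.
    """
    report = {
        "torch_installed": False,
        "built_with_cuda": None,   # CUDA version string, or None if CPU-only
        "cuda_available": False,
        "device_name": None,
    }
    if importlib.util.find_spec("torch") is None:
        return report  # PyTorch is not installed in this environment

    import torch

    report["torch_installed"] = True
    # None here means a CPU-only build: no driver or runtime change will help
    # until a CUDA-enabled build is installed.
    report["built_with_cuda"] = torch.version.cuda
    report["cuda_available"] = bool(torch.cuda.is_available())
    if report["cuda_available"]:
        report["device_name"] = torch.cuda.get_device_name(0)
    return report


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

If built_with_cuda prints None, the CPU-only build is still the one being imported. If it prints a version but cuda_available is False, the problem is more likely on the driver or runtime side.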

We need the API urgent! LOL

felipelalli avatar Apr 26 '23 06:04 felipelalli

working on putting something more comprehensive together. anything in particular i can help with in the meantime?

@gkucsko A more comprehensive list of the sound effects like [sigh] and [laughter] for one thing, would be great. Seems to be a common question.

ksylvan avatar Apr 28 '23 21:04 ksylvan

@gkucsko is it possible to simulate simultaneous conversation or interruption? Like in an interview when someone interrupts another person? Or try to speak at the same time?

felipelalli avatar Apr 29 '23 07:04 felipelalli

definitely one of our pie in the sky goals at Suno :) there is nothing fundamental preventing it

gkucsko avatar Apr 30 '23 13:04 gkucsko

working on putting something more comprehensive together. anything in particular i can help with in the meantime?

@gkucsko A more comprehensive list of the sound effects like [sigh] and [laughter] for one thing, would be great. Seems to be a common question.

probably best to have this in discord since there is no exhaustive list. technically anything could work since there is a smooth embedding

gkucsko avatar Apr 30 '23 13:04 gkucsko

closing for inactivity, feel free to reopen if needed

gkucsko avatar May 11 '23 13:05 gkucsko

closing for inactivity, feel free to reopen if needed

Not re-opening this, but I am wondering if there's any better documentation since this issue was opened.

ksylvan avatar May 13 '23 03:05 ksylvan

haha, yeah there is now a tutorials folder with a bunch of notebooks that should give much better insight into how to use bark. there is also now a discord channel where the community has helped each other out greatly!

gkucsko avatar May 13 '23 13:05 gkucsko