Allow deterministic generations

Open jn-jairo opened this issue 1 year ago • 4 comments

Edit: some changes got implemented in another commit, so I updated the description to represent only the current changes.


This PR adds set_seed(seed) to allow deterministic generations:

  • Use seed = set_seed() or seed = set_seed(0) to generate and set a random seed; the seed is returned.
  • Use set_seed(seed) to set a specific seed number.
  • Use set_seed(-1) to disable the deterministic process and go back to fully non-deterministic behavior.

BE AWARE: the seed affects torch, numpy and python, so if you are running other software that requires non-deterministic random values, remember to call set_seed(-1) after you generate the audio.
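The semantics listed above can be sketched roughly as follows. This is an illustrative sketch only, not the code from this PR: treating numpy and torch as optional imports, the range used for the random seed, and returning -1 when disabling are all assumptions made for the example.

```python
import random

try:
    import numpy as np
except ImportError:  # assumption: numpy is optional in this sketch
    np = None

try:
    import torch
except ImportError:  # assumption: torch is optional in this sketch
    torch = None


def set_seed(seed: int = 0) -> int:
    """Hypothetical sketch of the set_seed semantics described above."""
    if seed == -1:
        # Disable determinism: reseed every generator from fresh entropy.
        random.seed()
        if np is not None:
            np.random.seed()
        if torch is not None:
            torch.seed()
        return -1  # assumption: the source does not say what is returned here

    if not seed:
        # No seed (or 0) given: pick a random one so the caller can store it.
        seed = random.SystemRandom().randint(1, 2**32 - 1)

    # Apply the same seed to Python, numpy and torch.
    random.seed(seed)
    if np is not None:
        np.random.seed(seed)
    if torch is not None:
        torch.manual_seed(seed)
    return seed
```

Returning the seed is what makes the set_seed() / set_seed(0) form useful: a generation made with a random seed can be repeated later by passing the stored value back in.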

Example:

from bark import SAMPLE_RATE, generate_audio, preload_models, set_seed
from scipy.io.wavfile import write as write_wav
import numpy as np

preload_models()

prompt = "I have a silky smooth voice, and today I will tell you about the exercise regimen of the common sloth."

set_seed(123)
audio_array_1 = generate_audio(prompt)
write_wav("/path/to/audio_1.wav", SAMPLE_RATE, audio_array_1)

set_seed(123)
audio_array_2 = generate_audio(prompt)
write_wav("/path/to/audio_2.wav", SAMPLE_RATE, audio_array_2)

# BE AWARE: the seed affects torch, numpy and python,
# so if you are running other software that requires non-deterministic random values,
# remember to call `set_seed(-1)` after you generate the audio.
set_seed(-1)

# Both runs used the same seed and prompt, so the audio should be identical.
assert np.array_equal(audio_array_1, audio_array_2)

jn-jairo avatar Apr 27 '23 03:04 jn-jairo

thanks for this- keeping open so it's on our radar

mcamac avatar May 02 '23 16:05 mcamac

I think this PR isn't necessary anymore. I found a package that does the same thing, and I am using it now.

UM-ARM-Lab/pytorch_seed

from bark import SAMPLE_RATE, generate_audio, preload_models
from scipy.io.wavfile import write as write_wav
import numpy as np
import pytorch_seed

preload_models()

prompt = "I have a silky smooth voice, and today I will tell you about the exercise regimen of the common sloth."

with pytorch_seed.SavedRNG(123):
    audio_array_1 = generate_audio(prompt)
write_wav("/path/to/audio_1.wav", SAMPLE_RATE, audio_array_1)

with pytorch_seed.SavedRNG(123):
    audio_array_2 = generate_audio(prompt)
write_wav("/path/to/audio_2.wav", SAMPLE_RATE, audio_array_2)

assert np.array_equal(audio_array_1, audio_array_2)

jn-jairo avatar May 03 '23 23:05 jn-jairo

neat thanks! out of curiosity, what do you need the seed for? it shouldn't really help with consistency right? like, if you change the text prompt the results will be completely different regardless of seed, no?

gkucsko avatar May 04 '23 01:05 gkucsko

It helps to get the same voice and intonation by combining the history_prompt with the seed used to create that history_prompt. I use the seed + history_prompt together because, even with the history_prompt, the voice is not exactly the same if the seed is different. Sometimes it sounds too different, but the seed + history_prompt pair fixes that.

The seed also helps to get better consistency across a list of prompts (long text). If the list has too many prompts, the voice starts to sound too different after a few of them.

The pair seed + prompt always gets the same voice; if we change the seed or the prompt, the voice will be different.

So, to find a voice, I choose a prompt that fits the voice I want, then I generate multiple audios with different seeds, saving the seed + history_prompt of each one, and I choose the best one.
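A rough sketch of that voice search is below. The function name collect_voice_candidates and its parameters are hypothetical. The generate callable is assumed to behave like bark's generate_audio called with output_full=True, returning a (full_generation, audio_array) pair, and seed_fn is any function that sets the seed.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple


def collect_voice_candidates(
    prompt: str,
    seeds: Sequence[int],
    generate: Callable[..., Tuple[Any, Any]],
    seed_fn: Callable[[int], Any],
) -> List[Dict[str, Any]]:
    """Generate the same prompt once per seed and keep what is needed to reuse a voice."""
    candidates = []
    for seed in seeds:
        # Fix the random state before each generation.
        seed_fn(seed)
        # Assumed to return the full generation (usable as a history_prompt) and the audio.
        full_generation, audio = generate(prompt, output_full=True)
        candidates.append(
            {"seed": seed, "history_prompt": full_generation, "audio": audio}
        )
    return candidates
```

After listening to each candidate's audio, the seed and history_prompt of the best one are the pair to save.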

To generate other prompts in sequence (long text), I set the saved seed + history_prompt for the first prompt on the list. For the other prompts I set the saved seed + the history_prompt returned by output_full=True for the first prompt, because it helps to keep consistency. With that process the voice sounds the same and keeps the same intonation for the whole audio.
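The long-text process described above might look roughly like this. It is a sketch under assumptions, not code from bark: generate_long_text and its parameter names are made up for illustration, generate is assumed to accept history_prompt and output_full keyword arguments and return (full_generation, audio_array) the way generate_audio does with output_full=True, and seed_fn is any seeding function.

```python
from typing import Any, Callable, List, Sequence, Tuple


def generate_long_text(
    prompts: Sequence[str],
    seed: int,
    voice_prompt: Any,
    generate: Callable[..., Tuple[Any, Any]],
    seed_fn: Callable[[int], Any],
) -> List[Any]:
    """Generate every prompt with the same seed, chaining from the first result."""
    audio_chunks = []
    history = voice_prompt  # saved history_prompt, used for the first prompt only
    for index, text in enumerate(prompts):
        # Reset to the saved seed before every prompt.
        seed_fn(seed)
        full_generation, audio = generate(
            text, history_prompt=history, output_full=True
        )
        audio_chunks.append(audio)
        if index == 0:
            # All later prompts reuse the full output of the first prompt.
            history = full_generation
    return audio_chunks
```

The returned chunks can then be joined, for example with np.concatenate, before writing the final file.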

jn-jairo avatar May 04 '23 06:05 jn-jairo

Sorry I'm a bit confused, at the end how do you use deterministic generation here? do you need pytorch_seed? is there a way just with pytorch?

apollner avatar Jul 06 '23 05:07 apollner

Yes, you can do it with just pytorch; pytorch_seed is just a helper to set and manage the seed.

The reason for using the same seed is simple: the random numbers dictate the generations, so if you use the same random numbers (seed) for all generations in a long text, the results will be more similar.
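For reference, a minimal sketch of doing this without pytorch_seed is below. The seeded helper is hypothetical and only approximates what a save-and-restore helper does. It seeds Python's random and torch, then restores the previous states afterwards. The guarded import is an assumption so the sketch still runs without PyTorch installed, and CUDA generator state is not saved or restored here.

```python
import contextlib
import random

try:
    import torch
except ImportError:  # assumption: without PyTorch only Python's RNG is handled
    torch = None


@contextlib.contextmanager
def seeded(seed: int):
    """Seed the RNGs inside the block, then restore the previous states."""
    # Remember the current states so code outside the block is unaffected.
    py_state = random.getstate()
    torch_state = torch.get_rng_state() if torch is not None else None

    random.seed(seed)
    if torch is not None:
        torch.manual_seed(seed)
    try:
        yield
    finally:
        random.setstate(py_state)
        if torch is not None:
            # Restores the CPU generator only; CUDA state is left as it is.
            torch.set_rng_state(torch_state)
```

It would be used like the SavedRNG example earlier in the thread: wrap each generate_audio(prompt) call in `with seeded(123):`.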

Just to make that clear: I think this PR isn't necessary anymore. It is still open as a reference while the suno team researches that topic.

While this is an option to achieve better consistency, I now think it is better if bark stays as simple as possible and away from specialized changes. There are a lot of projects that use bark, and they can use this approach if they want to, without needing it as a built-in feature.

jn-jairo avatar Jul 06 '23 23:07 jn-jairo