There is a noticeable sound of electricity in the speech I generated

junpolaris opened this issue 2 years ago • 9 comments

There is a noticeable electrical buzzing sound in the speech I generated. Has anyone else had the same problem?

junpolaris, Apr 24 '23

Yeah, sometimes this is the model attempting to generate background audio such as a cheering crowd or street noise, and sometimes it does sound a bit like static. In general, low-noise history prompts help produce cleaner output, if that's what you are after. Beyond that, some people have had success running denoising algorithms on the output; I would love to hear which ones work well. I personally like denoiser: https://github.com/facebookresearch/denoiser

gkucsko, Apr 24 '23

The best voice enhancement I know of is Adobe's, but it's still in beta and not open source. If you just want a quick test of what voice enhancement and cleanup can achieve, try it on their web page (https://podcast.adobe.com/enhance). I tested denoiser, which is not bad but far from Adobe's. If anyone has more ideas, or can help tune denoiser, please post your code.

mxzgithub, Apr 25 '23

I'm having the same problem. It doesn't sound like a crowd or background noise; it sounds like distortion, as if the AI were talking on a phone from the '80s. Here's a sample:

audio1.webm

I get all my results like this.

jnpatrick99, Apr 27 '23

> would love to hear which ones work well.

I use modelscope's damo/speech_frcrn_ans_cirm_16k:

```python
from bark import SAMPLE_RATE, generate_audio, preload_models
from IPython.display import Audio, display
from scipy.io import wavfile
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks

# Acoustic noise suppression (ANS) pipeline backed by the FRCRN model
ans = pipeline(
    Tasks.acoustic_noise_suppression,
    model='damo/speech_frcrn_ans_cirm_16k')

# Load the full-size Bark models onto the GPU
preload_models(
    text_use_gpu=True,
    text_use_small=False,
    coarse_use_gpu=True,
    coarse_use_small=False,
    fine_use_gpu=True,
    fine_use_small=False,
    codec_use_gpu=True,
)


def generate_tts(text_prompt, speaker):
    # Generate raw speech with Bark and save it as a WAV file
    audio_array = generate_audio(text_prompt, history_prompt=speaker)
    wavfile.write('output1.wav', SAMPLE_RATE, audio_array)
    # Denoise the saved file; the cleaned audio is written to output.wav
    ans('output1.wav', output_path='output.wav')


def tts(text_prompt, speaker):
    generate_tts(text_prompt, speaker)
    display(Audio('output.wav'))


text_prompt = """[happy] Hey, how are you doing? Uh — I was remembering the other day about our time together in the park. [laughs]
[sad] But what I actually wanted to tell you is that I'm going to Boston, so [sighs] this is the last time we are going to see each other."""

tts(text_prompt, 'en_speaker_4')
```

farrael004, Apr 27 '23
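One thing worth checking with this setup: Bark's SAMPLE_RATE is 24 kHz, while the _16k suffix of damo/speech_frcrn_ans_cirm_16k suggests the model expects 16 kHz input. Whether the modelscope pipeline resamples internally may depend on the version, so the following is a hedged sketch (not part of the original code) that resamples the array with scipy.signal.resample_poly before it is written. The helper name resample_audio and the two constants are illustrative assumptions.

```python
from math import gcd

import numpy as np
from scipy.signal import resample_poly

BARK_SAMPLE_RATE = 24_000    # value of bark.SAMPLE_RATE at the time of this thread
TARGET_SAMPLE_RATE = 16_000  # assumed from the "_16k" suffix of the model name


def resample_audio(audio: np.ndarray, orig_sr: int, target_sr: int) -> np.ndarray:
    """Resample a 1-D audio array with a polyphase filter (includes anti-aliasing)."""
    if orig_sr == target_sr:
        return np.asarray(audio, dtype=np.float32)
    g = gcd(orig_sr, target_sr)
    up, down = target_sr // g, orig_sr // g  # 24 kHz -> 16 kHz gives up=2, down=3
    return resample_poly(audio, up, down).astype(np.float32)
```

In generate_tts, you would then write resample_audio(audio_array, SAMPLE_RATE, 16_000) to output1.wav with a rate of 16_000, instead of writing audio_array at SAMPLE_RATE. A polyphase filter fits here because it low-pass filters while changing the rate by a rational factor (2/3 in this case).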

Strange that you get it every time. Did you try the web UI on Hugging Face to see if you get a better result? I'm wondering whether there is an issue or you just got unlucky.

gkucsko, Apr 27 '23

I have this issue too, but it differs from input to input. I wonder whether something is wrong with some of the speaker profiles.

mxzgithub, Apr 27 '23

> Strange that you say you get it every time. Did you try the web ui on huggingface and see if you get a better result? Wondering if there is an issue or if you just got unlucky

Yeah, there's a different noise effect on Hugging Face:

tmp4emh4v3a.webm

It sounds like the audio is getting "clipped" (?)

jnpatrick99, Apr 27 '23

Oh, thanks. The clipping is probably a bug in the Hugging Face code during the integer conversion. The 'phone' sound, on the other hand, is a general feature of Bark. The goal is not to create the highest-quality speech possible, as in a standard TTS system; the idea is to generate any audio from scratch, meaning a low-quality phone conversation is just as likely as a studio-quality recording or the lyrics of a music piece.

gkucsko, Apr 28 '23
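To illustrate the kind of integer-conversion problem mentioned above: Bark returns float audio nominally in [-1.0, 1.0], and if it is scaled by 32767 and cast to int16 without clipping, any sample beyond full scale becomes an out-of-range cast, which NumPy does not saturate (the result is platform-dependent and often wraps around, producing harsh clicks). The following is a hypothetical helper, not the Hugging Face demo's actual code: it clips before casting or, with normalize=True, scales the whole signal down so the peak fits. clipped_fraction is an equally hypothetical quick check for how much of a clip is at full scale.

```python
import numpy as np


def float_to_int16(audio: np.ndarray, normalize: bool = False) -> np.ndarray:
    """Convert float audio in [-1.0, 1.0] to 16-bit PCM without overflow."""
    audio = np.asarray(audio, dtype=np.float32)
    if normalize and audio.size:
        peak = float(np.max(np.abs(audio)))
        if peak > 1.0:
            audio = audio / peak  # scale down so the loudest sample hits full scale
    clipped = np.clip(audio, -1.0, 1.0)  # saturate instead of letting the cast overflow
    return np.round(clipped * 32767).astype(np.int16)


def clipped_fraction(audio: np.ndarray, threshold: float = 0.999) -> float:
    """Fraction of samples at or beyond full scale (a rough clipping indicator)."""
    audio = np.asarray(audio)
    if audio.size == 0:
        return 0.0
    return float(np.mean(np.abs(audio) >= threshold))
```

Clipping keeps the loudness but flattens the peaks; normalizing avoids that distortion at the cost of lowering the overall level.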

I was able to achieve significantly better results by running noise reduction with noisereduce on the generated audio. Here is some sample code:

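```python
from scipy.io import wavfile
import noisereduce as nr


def noise_reduction(input_path, output_path, prop_decrease=1.0):
    # Read the generated audio (e.g. Bark output saved as WAV)
    rate, data = wavfile.read(input_path)
    # prop_decrease is the proportion by which to reduce the noise (1.0 = 100%)
    reduced_noise = nr.reduce_noise(y=data, sr=rate, prop_decrease=prop_decrease)
    wavfile.write(output_path, rate, reduced_noise)
    return True
```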

yutohub, May 02 '23
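noisereduce describes its method as spectral gating. As a rough illustration of that idea, here is a stripped-down NumPy/SciPy sketch that works directly on an in-memory array such as Bark's audio_array. It is a simplification under stated assumptions (a hard binary mask, and a noise profile taken from the quietest 10% of frames when no noise clip is given); spectral_gate and its parameters are hypothetical and are not noisereduce's API. Since nr.reduce_noise itself takes a NumPy array (y=data in the function above), nr.reduce_noise(y=audio_array, sr=SAMPLE_RATE) should also work without writing a file.

```python
import numpy as np
from scipy.signal import istft, stft


def spectral_gate(audio, sr, noise_clip=None, n_std=1.5, nperseg=1024):
    """Minimal stationary spectral gate: zero out time-frequency bins below a noise threshold."""
    audio = np.asarray(audio, dtype=np.float64)
    _, _, spec = stft(audio, fs=sr, nperseg=nperseg)
    mag = np.abs(spec)
    if noise_clip is None:
        # Assumption: the quietest 10% of frames contain only noise.
        frame_energy = mag.sum(axis=0)
        quiet = frame_energy <= np.percentile(frame_energy, 10)
        noise_mag = mag[:, quiet]
    else:
        _, _, noise_spec = stft(np.asarray(noise_clip, dtype=np.float64), fs=sr, nperseg=nperseg)
        noise_mag = np.abs(noise_spec)
    # Per-frequency threshold: mean + n_std * standard deviation of the noise magnitude
    threshold = noise_mag.mean(axis=1) + n_std * noise_mag.std(axis=1)
    mask = mag > threshold[:, None]
    _, cleaned = istft(spec * mask, fs=sr, nperseg=nperseg)
    # istft output can be slightly longer because of padding; trim to the input length
    return cleaned[: audio.size].astype(np.float32)
```

A hard mask like this tends to leave "musical noise" artifacts; noisereduce smooths its mask over time and frequency, which is one reason to prefer the library for real use.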

Closing for inactivity; this is probably better as a discussion anyway.

gkucsko, May 11 '23

> i personally like denoiser: https://github.com/facebookresearch/denoiser

I'm not able to use it with the audio array generated by Bark; it only accepts a file. Is there a way to denoise the audio generated by audio_array = generate_audio(text_prompt)?

d4rkc0de, Oct 07 '23
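On the question of denoising an in-memory array: the denoiser README also shows a Python API (pretrained.dns64() together with denoiser.dsp.convert_audio) that operates on torch tensors, so converting the array with torch.from_numpy may let you skip files entirely; check the README for the exact steps. If a tool truly only accepts file paths, a temporary file bridges the gap. The following is a generic, hypothetical wrapper (denoise_array_via_files is not part of any library) that writes the array to a temporary WAV, calls any file-based denoiser with the signature (input_path, output_path), and reads the result back.

```python
import os
import tempfile
from typing import Callable, Tuple

import numpy as np
from scipy.io import wavfile


def denoise_array_via_files(
    audio: np.ndarray,
    sample_rate: int,
    file_denoiser: Callable[[str, str], object],
) -> Tuple[int, np.ndarray]:
    """Run a file-based denoiser on an in-memory array using temporary WAV files."""
    with tempfile.TemporaryDirectory() as tmp:
        in_path = os.path.join(tmp, "input.wav")
        out_path = os.path.join(tmp, "output.wav")
        # Bark's output is float audio; write it as a 32-bit float WAV
        wavfile.write(in_path, sample_rate, np.asarray(audio, dtype=np.float32))
        file_denoiser(in_path, out_path)
        # Read back before the temporary directory is deleted
        rate, cleaned = wavfile.read(out_path)
    return rate, cleaned
```

For example, with Bark and the noisereduce helper posted earlier in this thread, denoise_array_via_files(audio_array, SAMPLE_RATE, noise_reduction) would return the sample rate and the denoised array.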