GPT-SoVITS
About api_v2
After successfully deploying api_v2, I found that the audio I receive from the server is frequently a blank WAV file. I don't know what's wrong. Has anyone else encountered the same problem?
My code is as follows:
`
import os
from pathlib import Path

import requests


def generate_tts_post(text, text_lang, ref_audio_path, prompt_text, prompt_lang,
                      output_path="output/out.wav",
                      server_url="http://127.0.0.1:9880/tts"):
    payload = {
        "text": text,
        "text_lang": text_lang,
        "ref_audio_path": str(ref_audio_path),
        "prompt_text": prompt_text,
        "prompt_lang": prompt_lang,
        "text_split_method": "cut0",
        "batch_size": 1,
        "media_type": "wav",
        "fragment_interval": 0,
        "streaming_mode": False,
    }
    try:
        print(f"[→] Sending POST request to {server_url}")
        response = requests.post(server_url, json=payload, stream=False)
        if response.status_code != 200:
            print(f"[✗] Request failed, status code: {response.status_code}")
            print(response.text)
            return None
        # The server reports errors as non-audio responses, so check the type.
        content_type = response.headers.get("Content-Type", "")
        if not content_type.startswith("audio"):
            print(f"[✗] Response is not audio, got: {content_type}")
            print(response.content.decode("utf-8", errors="ignore"))
            return None
        os.makedirs(Path(output_path).parent, exist_ok=True)
        with open(output_path, "wb") as f:
            f.write(response.content)
        # is_audio_effective() is my own helper (not shown here).
        if is_audio_effective(output_path):
            print(f"[✓] Audio saved: {output_path}")
            return output_path
        else:
            print(f"[✗] Audio is invalid, deleting: {output_path}")
            os.remove(output_path)
            return None
    except Exception as e:
        print(f"[✗] Request error: {e}")
        return None
`
I have tried every combination of models in tts_infer.yaml, but the result with api_v2 is still the same: roughly 30-50% of requests return a blank WAV stream, and about 10% return audio stitched together from pieces of different voices. (inference_cli gives much better results, but it is far too slow.)
I really have no idea why this happens. Is it because the configuration or parameters are wrong? Any help would be appreciated.
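The is_audio_effective helper used above is not shown, so for anyone trying to reproduce this, here is a minimal sketch of what such a check could look like. It uses only the standard library and assumes the response is 16-bit PCM WAV. The names wav_stats and looks_blank and both thresholds are made up for illustration and would need tuning.

```python
import array
import io
import sys
import wave


def wav_stats(wav_bytes):
    """Return (duration_seconds, peak) for 16-bit PCM WAV bytes; peak is in [0, 1]."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        n_frames = wf.getnframes()
        rate = wf.getframerate()
        frames = wf.readframes(n_frames)
    samples = array.array("h", frames)  # signed 16-bit samples
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian
    peak = max((abs(s) for s in samples), default=0) / 32768.0
    return n_frames / rate, peak


def looks_blank(wav_bytes, min_seconds=0.5, min_peak=0.01):
    """Heuristic: treat very short or near-silent audio as blank (thresholds are assumptions)."""
    duration, peak = wav_stats(wav_bytes)
    return duration < min_seconds or peak < min_peak
```

With the client above, this could be called as looks_blank(response.content) before writing the file, which also makes it easy to log the duration and peak of each bad response.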
Can you share a detailed log, or a sample that reproduces the problem? In addition, text_split_method should be set to another value, such as "cut5".
Please see the following log. After this request I got a WAV file smaller than 50 KB.
Set seed to 769865282
Parallel inference mode enabled
When parallel inference mode is enabled, SoVits V3/4 models do not support bucketing; bucketing has been disabled automatically
Fragment interval too small; automatically set to 0.01
Actual input reference text: 海 过 海 关 不 要 紧 张 海 过 海 关 不 要 紧 张。
############ Split text ############
Actual input target text: 买体检套餐,基础项目+肿瘤筛查,价格1280元,用医保支付
Actual input target text (after splitting): ['买体检套餐,', '基础项目+肿瘤筛查,', '价格1280元,', '用医保支付。']
############ Extract text BERT features ############
100%|████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 81.63it/s]
############ Inference ############
Text after front-end processing (per sentence): ['买体检套餐,']
############ Predict semantic tokens ############
0%| | 0/1500 [00:00<?, ?it/s]T2S Decoding EOS [136 -> 138]
0%| | 1/1500 [00:00<00:28, 52.63it/s]
############ Synthesize audio ############
Parallel synthesis in progress...
Text after front-end processing (per sentence): ['基础项目加肿瘤筛查,']
############ Predict semantic tokens ############
0%| | 0/1500 [00:00<?, ?it/s]T2S Decoding EOS [136 -> 138]
0%| | 1/1500 [00:00<00:27, 53.91it/s]
############ Synthesize audio ############
Parallel synthesis in progress...
Text after front-end processing (per sentence): ['价格一千二百八十元,']
############ Predict semantic tokens ############
0%| | 0/1500 [00:00<?, ?it/s]T2S Decoding EOS [136 -> 138]
0%| | 1/1500 [00:00<00:20, 71.43it/s]
############ Synthesize audio ############
Parallel synthesis in progress...
Text after front-end processing (per sentence): ['用医保支付.']
############ Predict semantic tokens ############
0%| | 0/1500 [00:00<?, ?it/s]T2S Decoding EOS [136 -> 139]
0%| | 2/1500 [00:00<00:15, 95.25it/s]
############ Synthesize audio ############
Parallel synthesis in progress...
0.016 0.052 0.423 5.226
INFO: 127.0.0.1:63761 - "POST /tts HTTP/1.1" 200 OK
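One thing that stands out in the log above is that every segment reports "T2S Decoding EOS" after only one or two decoding steps out of 1500. Assuming the two bracketed numbers are the sequence length before and after decoding (I have not checked this against the source), the difference is the number of tokens generated per segment, and a value this small would be consistent with near-empty audio. A rough sketch for pulling those counts out of a captured log follows. The function names and the min_tokens threshold are hypothetical.

```python
import re

# Matches lines such as: T2S Decoding EOS [136 -> 138]
EOS_RE = re.compile(r"T2S Decoding EOS \[(\d+) -> (\d+)\]")


def generated_token_counts(log_text):
    """Return, per segment, the difference between the two bracketed numbers."""
    return [int(end) - int(start) for start, end in EOS_RE.findall(log_text)]


def suspicious_segments(log_text, min_tokens=10):
    """Return indices of segments whose count is below min_tokens (an arbitrary threshold)."""
    counts = generated_token_counts(log_text)
    return [i for i, n in enumerate(counts) if n < min_tokens]
```

For the four segments in this log the counts come out as 2, 2, 2 and 3, so all four would be flagged.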
So you are saying that the audio files you get are corrupted. I used to have the same issue, where a corrupted file was generated, and I suspected streaming mode. Since "streaming_mode" is already turned off in your payload, there is not much more to change there. Maybe you should try toggling it on and off? I solved my issue, and here is my payload:
payload = {
    "text": text,
    "text_lang": lang,
    "ref_audio_path": audiopath,
    "aux_ref_audio_paths": [],
    "prompt_text": prompt,
    "prompt_lang": "ja",
    "top_k": 15,
    "top_p": 0.95,
    "temperature": 0.7,
    "text_split_method": "cut0",
    "batch_size": 1,
    "batch_threshold": 0.75,
    "split_bucket": True,
    "speed_factor": 1.0,
    "streaming_mode": False,
    "seed": -1,
    "parallel_infer": True,
    "repetition_penalty": 1.5,
    "sample_steps": 32,
    "super_sampling": False,
}
Another option is to convert the .wav to .mp3 automatically with the AudioSegment class from pydub before your function returns the audio file:
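from pydub import AudioSegment  # MP3 export requires ffmpeg to be installed

# This fragment sits inside the request function, after requests.post(...).
if response.status_code == 200:
    # The with block closes the file, so no explicit f.close() is needed.
    with open("output.wav", "wb") as f:
        f.write(response.content)
    print("Audio generated and saved to output.wav")
    print("Converting to .mp3.")
    audio = AudioSegment.from_wav("output.wav")
    audio.export("output.mp3", format="mp3")
    return "output.mp3", "Action completed."
else:
    print(f"Error: {response.status_code} - {response.text}")
    return "Unexpected Error"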
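Since the blank output reported in this thread is intermittent, a client-side retry may also work as a stopgap until the root cause is found. The sketch below is only an illustration: tts_with_retry, request_fn and is_valid are hypothetical names, and it assumes that passing an explicit "seed" in the payload (the field shown in the payload above) changes the sampling on each attempt.

```python
import random


def tts_with_retry(request_fn, payload, is_valid, max_attempts=3):
    """Call request_fn(payload) until is_valid(audio_bytes) returns True.

    request_fn and is_valid are hypothetical callables supplied by the caller.
    Returns the audio bytes, or None if every attempt failed.
    """
    for attempt in range(1, max_attempts + 1):
        # Use a fresh explicit seed so a retry does not repeat the same sampling.
        attempt_payload = dict(payload, seed=random.randint(0, 2**31 - 1))
        audio = request_fn(attempt_payload)
        if audio is not None and is_valid(audio):
            return audio
        print(f"Attempt {attempt}/{max_attempts} produced invalid audio")
    return None
```

Here request_fn would wrap the requests.post call and return response.content on a 200 status (or None otherwise), and is_valid would be whatever blank-audio check you already use. This hides the symptom rather than fixing it, so it is worth logging how often the retries fire.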