fish-speech issues

Results 162 fish-speech issues


[BUG] Error as soon as reference audio is added

![QQ图片20240516222344](https://github.com/fishaudio/fish-speech/assets/31399799/783021b0-f527-4628-a6a3-517c770175e2) ![QQ图片20240516222350](https://github.com/fishaudio/fish-speech/assets/31399799/de00cead-6ec0-4d1f-98d5-e3d3f97294c7) It keeps throwing errors and cannot be used.

188140040

bug

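[BUG] Specific text cannot be read correctly & trailing sound is cut off

**Describe the bug**
1. In web UI testing, the text "123456789ABCDEFG" is not read correctly, whether a reference audio or a random speaker is used.
2. In web UI testing, the end of the generated audio is cut off: the last character is not fully spoken and usually stops halfway.

**To Reproduce**

```shell
python tools/webui.py \
    --llama-checkpoint-path "checkpoints/text2semantic-sft-medium-v1.1-4k.pth" \
    --llama-config-name dual_ar_2_codebook_medium \
    --decoder-config-name vits_decoder_finetune \
    --decoder-checkpoint-path "checkpoints/vits_decoder_v1.1.ckpt"
```

1. In the web UI, enter the text "123456789ABCDEFG", then play or listen to the generated speech output.
2. In the web UI, enter the text "由 Fish Audio...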

laishujie

bug

The speaker name is sometimes spoken in the generated audio [BUG]

![image](https://github.com/fishaudio/fish-speech/assets/42288790/a6a6d92f-15e0-4097-acec-eaab791e26b8) https://huggingface.co/spaces/fishaudio/fish-speech-1 (webui) This is the audio produced by inference. It reproduces very often (this speaker was chosen to raise the reproduction rate). [wav](https://fishaudio-fish-speech-1.hf.space/file=/tmp/gradio/aab5417c350356cafcc24fedf58edcba7be7383a/audio.wav)

only-ns

bug

RuntimeError: The expanded size of the tensor (2048) must match the existing size (3882) at non-singleton dimension 1. Target sizes: [3, 2048]. Tensor sizes: [3, 3882][BUG]

Running the following command raises the error: `python tools/llama/generate.py --text "床前明月光，疑似地上霜。举头望明月，低头思故乡" --prompt-text "1234567" --prompt-tokens "fake.npy" --config-name dual_ar_2_codebook_medium --checkpoint-path "checkpoints/text2semantic-sft-medium-v1.1-4k.pth" --num-samples 2 --compile`
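
The shapes in the error message (`[3, 3882]` written into `[3, 2048]`) suggest that a 3-row token sequence of length 3882 is being copied into a buffer that only holds 2048 positions. Whether that is the actual cause here is an assumption; the issue does not confirm it. The sketch below is not fish-speech code. It only illustrates the kind of length guard that avoids such a mismatch. The function name `fit_prompt` and the parameter `max_seq_len` are hypothetical.

```python
def fit_prompt(prompt_tokens, max_seq_len):
    """Truncate every row so the prompt fits a fixed-size buffer.

    prompt_tokens: list of rows (one per codebook), all of the same length.
    max_seq_len:   hypothetical buffer length (2048 in the error message).
    """
    lengths = {len(row) for row in prompt_tokens}
    if len(lengths) != 1:
        raise ValueError(f"rows must share one length, got: {sorted(lengths)}")
    (length,) = lengths
    if length <= max_seq_len:
        return prompt_tokens
    # Keep the most recent tokens; which end to keep is a design choice.
    return [row[-max_seq_len:] for row in prompt_tokens]


# Shapes taken from the error message: 3 rows of 3882 tokens, buffer of 2048.
prompt = [[0] * 3882 for _ in range(3)]
fitted = fit_prompt(prompt, max_seq_len=2048)
print(len(fitted), len(fitted[0]))  # prints: 3 2048
```

Checking the length of the array stored in `fake.npy` before passing it as `--prompt-tokens` would show whether the prompt really is 3882 tokens long.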

wuye901126

bug

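[BUG] Inference through the API: the reference audio is a female voice, but the output is a male voice

When running inference through the API, the reference audio is a female voice, but the generated audio is a male voice.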

wuye901126

bug

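[Feature] Use a larger language model

Since switching to the LLAMA 1b model, pronunciation and intonation have improved noticeably compared with the earlier small model (the AR in GPT SOVITS). If hardware were not a constraint, would a larger model (e.g. 7b/13b) noticeably improve synthesis quality?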

CloudTronUSA

enhancement

Got poor quality for Japanese data.

Hi, thank you for the great work. However, I got poor quality when synthesizing Japanese data. My data, about 12 hours of audio from 16 speakers, was extracted from 3 visual...

yw0nam

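[Feature] Could you provide API usage documentation and examples, e.g. for integrating with the TTS of the open-source reader app (开源阅读)?

As a complete beginner, I find the API hard to make sense of. Could anyone provide some documentation and usage examples?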

beavermarine

enhancement

[BUG] Help needed: the following is shown when training starts

![屏幕截图 2024-05-17 163400](https://github.com/fishaudio/fish-speech/assets/170086190/3b5eb81a-5325-4c09-b65d-7abc821a46a2)

A7890A

bug

about sft model

What is the sft model? text2semantic_sft.yaml contains `ckpt_path: checkpoints/text2semantic-medium-v1-2k.pth` and `resume_weights_only: true`. What is the difference between the model with sft and the one without?

zshy1205
