fish-speech

Brand new TTS solution

Results 115 fish-speech issues

![image](https://github.com/fishaudio/fish-speech/assets/86189525/bda1ef52-7514-4c27-a6a8-7fdf4399de76) The error log is:
```
Traceback (most recent call last):
  File "/root/anaconda3/envs/fish-speech/lib/python3.10/site-packages/gradio/queueing.py", line 527, in process_events
    response = await route_utils.call_process_api(
  File "/root/anaconda3/envs/fish-speech/lib/python3.10/site-packages/gradio/route_utils.py", line 270, in call_process_api
    output = await app.get_blocks().process_api(
  File "/root/anaconda3/envs/fish-speech/lib/python3.10/site-packages/gradio/blocks.py",...
```

bug

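For example, take the following passage:
```
想象一下,阳光斑斓地倾洒在脉脉流淌的小溪上,流水轻轻地拍打着河岸,就像是自然界最和谐的乐章。小鸟在枝头轻轻啁啾,似乎在讲述着自己的故事,声音悦耳到连大自然都静静聆听。在这样一个明媚的下午,我坐在一棵古老的橡树下,享受着微风带来的温柔的抚慰。
```
I tried both the online demo and my own fine-tuned model, and both misread “脉脉” and “啁啾” (version 1.1). I am curious whether this is caused by the training data or by an encoding problem.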

bug

Feel free to ask any kind of questions in the issues page, but please use English since other users may find your questions valuable. **Describe the bug** Hi, I follow...

bug

When fine-tuning VITS, the `vq_train_filelist.txt` created by `python tools/vqgan/create_train_split.py data` contains paths such as `data\Yanfei\vo_YFLQ001_11_yanfei_02.wav`, but the path actually used is `Yanfei\vo_YFLQ001_11_yanfei_02.wav`. Using the paths produced by the split script as-is causes an error during the Sanity Check: `RuntimeError: max(): Expected reduction dim to be specified for input.numel() == 0. Specify the reduction dim with the 'dim' argument.`
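One possible workaround, sketched here as an illustration and not as the project's own fix, is to strip the leading `data` component from each filelist entry before training. The helper names `strip_root` and `rewrite_filelist` are hypothetical, and the sketch assumes the filelist holds one Windows-style path per line, as in the example above.

```python
from pathlib import PureWindowsPath


def strip_root(entry: str, root: str = "data") -> str:
    """Drop a leading `root` component from one filelist entry, if present.

    Hypothetical helper: PureWindowsPath is used because the reported paths
    use backslashes; forward slashes in the input are normalized to
    backslashes in the output.
    """
    parts = PureWindowsPath(entry.strip()).parts
    if parts and parts[0] == root:
        parts = parts[1:]
    return str(PureWindowsPath(*parts)) if parts else ""


def rewrite_filelist(lines, root="data"):
    """Apply strip_root to every non-empty line of a filelist."""
    return [strip_root(line, root) for line in lines if line.strip()]


if __name__ == "__main__":
    sample = [r"data\Yanfei\vo_YFLQ001_11_yanfei_02.wav"]
    for path in rewrite_filelist(sample):
        print(path)
```

Entries that do not start with the given root are returned unchanged, so running the helper twice on the same list is harmless.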

bug

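This is great work! I have tried VALL-E-X and found that fish-speech performs better on Chinese and English. How should I fine-tune it to support speech synthesis for some low-resource languages? Would that involve modifying the phonemizer and related components? Thanks again for your work!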

enhancement

![image](https://github.com/fishaudio/fish-speech/assets/20057251/acc66f21-5b93-4236-a0f2-a08e0944465d) **To Reproduce** `python tools/vqgan/inference.py -i "test.wav"`

bug

**Describe the bug** The code does not check the configured device and calls `torch.cuda.synchronize()` unconditionally, which raises an error on machines without a CUDA GPU ```python #...
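A guard along the following lines would avoid the unconditional call. This is a minimal sketch under stated assumptions, not the repository's actual code: the helper name `sync_if_cuda` and the idea of passing the device as a string are hypothetical, and the import fallback exists only so the snippet can be loaded where PyTorch is not installed.

```python
try:
    import torch
except ImportError:  # keeps the sketch importable where PyTorch is missing
    torch = None


def sync_if_cuda(device: str) -> bool:
    """Call torch.cuda.synchronize() only when CUDA is requested and usable.

    Hypothetical helper. Returns True if a synchronize call was made,
    False if it was skipped (CPU device, no PyTorch, or no CUDA GPU).
    """
    if torch is None or not device.startswith("cuda"):
        return False
    if not torch.cuda.is_available():
        return False
    torch.cuda.synchronize()
    return True


if __name__ == "__main__":
    # On a CPU-only machine this is skipped instead of raising an error.
    print(sync_if_cuda("cpu"))
```

Returning a boolean makes it easy for timing code to tell whether a measurement actually included a device synchronization.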

bug

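Input text: 作者使用了LPIPS等指标来评估渲染图像的质量。其中,PSNR用于量化像素颜色误差,SSIM用于衡量渲染图像与真实图像的感知相似性,而LPIPS则用于衡量更高层次的感知相似性。 (In English: "The authors used metrics such as LPIPS to evaluate the quality of the rendered images. PSNR quantifies pixel color error, SSIM measures the perceptual similarity between rendered and real images, and LPIPS measures higher-level perceptual similarity.") ![image](https://github.com/fishaudio/fish-speech/assets/47085237/50318a72-6858-457d-86b0-8417833320c2)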

bug

**Describe the bug** A clear and...

bug