fish-speech Partially garbled audio on Huggingface online demo, for a short English input

Open • rotemdan opened this issue 2 months ago • 0 comments

Self Checks

[X] This template is only for bug reports. For questions, please visit Discussions.
[X] I have thoroughly reviewed the project documentation (installation, training, inference) but couldn't find information to solve my problem.
[X] I have searched for existing issues, including closed ones.
[X] I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
[X] [FOR CHINESE USERS] Please be sure to submit issues in English, otherwise they will be closed. Thank you! :)
[X] Please do not modify this template and fill in all required fields.

Cloud or Self Hosted

Cloud

Environment Details

Huggingface online demo (V1.5 medium)

Steps to Reproduce

Tried to synthesize the text "We are not responsible for any misuse of the model, please consider your local laws and regulations before using it."

All parameters were left at their default values (see Screenshot_1).

✔️ Expected Behavior

Normal speech.

❌ Actual Behavior

The audio starts normally with "We are not", and is then followed by garbled audio that sounds sped up. The total audio duration is only 3 seconds.

https://github.com/user-attachments/assets/45b1d12c-575c-457d-bfc4-45a5f9c6f634

audio.zip

I hit this after trying the model with only 2 test inputs, which suggests it is not that rare. When I synthesize the same text again several times, I get other voices, and they don't seem to have this issue (as far as I've tested).
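As a rough sanity check on the reported duration: the input has 20 words, so a 3-second clip implies about 400 words per minute. The sketch below does that arithmetic. The `words_per_minute` helper and the 150 words-per-minute reference rate are assumptions made for illustration (a commonly cited ballpark for conversational English), not anything measured from the model or taken from the fish-speech code.

```python
def words_per_minute(text: str, duration_seconds: float) -> float:
    """Return the speaking rate implied by a transcript and a clip duration."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    # Whitespace-separated tokens are used as a simple word count.
    word_count = len(text.split())
    return word_count / duration_seconds * 60.0


# Input text and duration as reported in this issue.
TEXT = (
    "We are not responsible for any misuse of the model, please consider "
    "your local laws and regulations before using it."
)
REPORTED_DURATION_S = 3.0

# Assumption: rough reference rate for normal conversational English.
TYPICAL_WPM = 150.0

rate = words_per_minute(TEXT, REPORTED_DURATION_S)
print(f"Implied rate: {rate:.0f} words per minute")
print(f"Ratio to the assumed typical rate: {rate / TYPICAL_WPM:.1f}x")
```

A check like this only flags clips that are too short for their text. It cannot tell whether the audio is garbled, so listening to the attached file is still the real evidence.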

Related issues

This seems closely related to issue #632, but I decided to open a new issue because:

My input was in English.
Issue #632 is described as "Swallowing words, reading normally at first, then speeding up, and then not reading the last word of the sentence completely", which is not exactly what happens here. Here the audio starts normally but continues as completely garbled audio.
A comment on that issue says the issue is resolved.
This was produced by the online demo with the latest model (1.5 medium).

Dec 05 '24 05:12 rotemdan
