The Minion randomly intruded into my audio
Self Checks
- [X] This template is only for bug reports. For questions, please visit Discussions.
- [X] I have thoroughly reviewed the project documentation (installation, training, inference) but couldn't find information to solve my problem.
- [X] I have searched for existing issues, including closed ones.
- [X] I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
- [X] [FOR CHINESE USERS] Please be sure to submit the issue in English, otherwise it will be closed. Thank you! :)
- [X] Please do not modify this template and fill in all required fields.
Cloud or Self Hosted
Self Hosted (Source)
Environment Details
NVIDIA 3090, Python 3.10, torch==2.4.1, torchvision==0.19.1, torchaudio==2.4.1
Steps to Reproduce
/root/miniconda3/bin/python -m tools.api_server --listen 0.0.0.0:6006 --llama-checkpoint-path "/usr/github/fish-speech/checkpoints/fish-speech-1.5" --decoder-checkpoint-path "/usr/github/fish-speech/checkpoints/fish-speech-1.5/firefly-gan-vq-fsq-8x1024-21hz-generator.pth" --decoder-config-name firefly_gan_vq --compile
✔️ Expected Behavior
No response
❌ Actual Behavior
Please listen to the last few seconds of this audio, where a Minion's voice appears.
https://saysay-bucket1.s3.us-west-1.amazonaws.com/uploads/default/20241224/4f75658f38eb0c163acced94328a73b6e78275bb.mp3
Text: By the end of this century, we will have reached a technological singularity, where quantum computing leads to a paradigm shift in epistemology.
This is a probabilistic issue. Out of my 100 audio files, 13 have similar occurrences.
Please help me, how should I solve this problem?
Listen to this: https://saysay-bucket1.s3.us-west-1.amazonaws.com/uploads/default/20250102/6f025e260633d038db74c5fd218098975e505a77.mp3
This happens all the time for me. Generate a few and choose the median length one.
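The "generate a few and choose the median length one" workaround could be sketched roughly as below. This is only an illustration: `synthesize` is a hypothetical callable standing in for whatever produces audio samples from text (it is not a fish-speech API), and the sketch assumes that a bad generation shows up as an unusually long or short clip.

```python
from typing import Callable, Sequence


def pick_median_length(candidates: Sequence[Sequence[float]]) -> Sequence[float]:
    """Return the candidate whose length is the median of all candidate lengths."""
    if not candidates:
        raise ValueError("need at least one candidate")
    ordered = sorted(candidates, key=len)
    # For an even count this picks the shorter of the two middle candidates.
    return ordered[(len(ordered) - 1) // 2]


def generate_and_pick(
    synthesize: Callable[[str], Sequence[float]],  # hypothetical TTS call
    text: str,
    attempts: int = 5,
) -> Sequence[float]:
    """Synthesize the same text several times and keep the median-length clip."""
    candidates = [synthesize(text) for _ in range(attempts)]
    return pick_median_length(candidates)
```

An odd number of attempts keeps the choice unambiguous. The approach only helps when most attempts come out clean, so it reduces the problem rather than removing it.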
In my case, I need to preprocess the text to remove "*", for example "*args" -> "args". This prevents some of the erroneous sounds, but there may be other cases requiring preprocessing that I haven't found yet.
@20km-shimakaze Which characters have you found that you need to remove? Or instead, which character sets do you keep?
I got the same problem when generating Chinese, and it seems to occur randomly: the same sentence may be generated correctly when you test it again.
I haven't found a solution yet, but increasing the audio duration of the material seems to reduce the probability of occurrence.
> @20km-shimakaze Which characters have you found that you need to remove? Or instead, which character sets do you keep?
I deleted the character '*'. This character is not pronounced during TTS anyway, and I found that even after generating the audio several times, the pronunciation goes wrong once this character has been read. So I deleted it.
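A minimal sketch of that kind of preprocessing, assuming plain-text input. Only '*' is confirmed as problematic in this thread; the function name, the `UNSPOKEN_CHARS` constant and the whitespace cleanup are illustrative choices, not part of fish-speech.

```python
import re

# Characters stripped before synthesis. Only '*' is reported in this thread;
# anything added here beyond that is your own assumption.
UNSPOKEN_CHARS = "*"


def strip_unspoken(text: str, chars: str = UNSPOKEN_CHARS) -> str:
    """Remove characters that are not meant to be pronounced."""
    cleaned = text.translate({ord(c): None for c in chars})
    # Collapse whitespace runs left behind by the removed characters.
    return re.sub(r"\s+", " ", cleaned).strip()


print(strip_unspoken("*args"))
```

Passing a different `chars` string lets you test other suspect characters one at a time without changing the function.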
> I haven't found a solution yet, but increasing the audio duration of the material seems to reduce the probability of occurrence.
So how long does the audio need to be to reduce these Minion voices?
We improved stability in this version. You can give it a try. I'll close the issue in 7 days if there are no more questions.
Can you explain how stability was improved? And when you say "this version" are you talking about openaudio-s1-mini?
The instruction is not that stable on s1, so you may want to use the 1.6 version on our website. We are working hard to make instruction generation more stable. For now, you can work around the problem by generating a few more sentences. Thanks for your support!