CosyVoice audio service bug with xinference
Self Checks
- [X] This is only for bug report, if you would like to ask a question, please head to Discussions.
- [X] I have searched for existing issues, including closed ones.
- [X] I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
- [X] [FOR CHINESE USERS] Please be sure to submit issues in English, otherwise they will be closed. Thank you! :)
- [X] Please do not modify this template :) and fill in all the required fields.
Dify version
0.9.1
Cloud or Self Hosted
Self Hosted (Source)
Steps to reproduce
I was deploying CosyVoice for TTS with xinference in Dify. However, I encountered two unexpected errors, as follows:
- raised from api/services/audio_service.py:
AssertionError: The setup method 'after_request' can no longer be called on the blueprint 'service_api'.
It has already been registered at least once, any changes will not be applied consistently.
Make sure all imports, decorators, functions, etc. needed to set up the blueprint are done before registering it.
I simply commented out `from app import app` and `with app.app_context():`, and the error went away.
I'm not familiar with Flask, but I don't think it's really necessary to import `app` here? (A sketch of an alternative that avoids re-importing `app` appears at the end of this section.)
- raised from api/core/model_runtime/model_providers/xinference/tts/tts.py
Parallel generation is not supported by llama-cpp-python
CosyVoice deployed by xinference does not support parallel generation currently, so it's not appropriate to use `ThreadPoolExecutor` here; a sequential sketch follows this list. Related issue: https://github.com/xorbitsai/inference/issues/2386
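A minimal sketch of the sequential alternative (the function below is illustrative, not the actual tts.py code; `model.speech` stands in for whatever single-request TTS call the xinference client exposes):

```python
def synthesize_sequentially(model, sentences):
    """Yield audio for each sentence, issuing one request at a time
    instead of fanning requests out through a ThreadPoolExecutor."""
    for sentence in sentences:
        # One blocking call per chunk; the CosyVoice backend rejects
        # concurrent generation, so requests must not overlap.
        yield model.speech(input=sentence)
```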
And by the way, according to line 21 of cosyvoice.yaml, the max text token limit is 51866. Hence I think the value marked below is not accurate:
api/core/model_runtime/model_providers/xinference/tts/tts.py
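As for the blueprint error above: if an application context is genuinely needed (for example, inside a background worker), one way to get it without re-importing the `app` module is to capture the running app via Flask's `current_app`. A minimal sketch, assuming a thread-based worker; the helper name is hypothetical, while `current_app`, `_get_current_object()`, and `app_context()` are standard Flask API:

```python
import threading

from flask import current_app


def run_in_app_context(job):
    """Run `job` in a background thread with an application context."""
    # Unwrap the proxy so the real app object can cross thread boundaries;
    # re-importing `app` here would re-execute module-level blueprint setup
    # and trigger the AssertionError above.
    app = current_app._get_current_object()

    def worker():
        with app.app_context():  # push a fresh context inside the worker
            job()

    threading.Thread(target=worker, daemon=True).start()
```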
✔️ Expected Behavior
No response
❌ Actual Behavior
No response
Hello, @00drdelius! I'm here to help you with bug reports, questions, and contributions. Let's work together to resolve the issues you're encountering.
- **Flask Blueprint Error**: The `AssertionError` occurs because the `after_request` method is being set up after the blueprint `service_api` has already been registered. This setup should be completed before registration. If commenting out `from app import app` and `with app.app_context():` resolves the issue, it suggests that these imports might not be necessary for your current use case. Ensure that all necessary configurations are done before the blueprint is registered.
- **Parallel Generation Limitation**: The error "Parallel generation is not supported by llama-cpp-python" indicates that using `ThreadPoolExecutor` for parallel processing is inappropriate. Since CosyVoice with xinference doesn't support parallel generation, consider processing requests sequentially instead.
- **Text Token Limit Discrepancy**: The `cosyvoice.yaml` file indicates a max text token limit of 51866, while the code uses a limit of 3500. This discrepancy should be addressed by aligning the code with the configuration file to ensure consistency.

Adjust the code to handle these issues, and consider updating the `_get_model_word_limit` method to reflect the correct token limit from `cosyvoice.yaml`.
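If the limit is to come from the model's spec file instead of a hard-coded 3500, a minimal sketch could look like the following. The path argument and the `max_text_tokens` key are assumptions for illustration, not the actual Dify/xinference layout:

```python
import yaml  # PyYAML


def get_model_word_limit(spec_path: str, default: int = 3500) -> int:
    """Read the max text token limit from a model spec file, if present."""
    with open(spec_path, "r", encoding="utf-8") as f:
        spec = yaml.safe_load(f)
    # "max_text_tokens" is a placeholder key; cosyvoice.yaml reportedly
    # declares a limit of 51866 on line 21.
    return int(spec.get("max_text_tokens", default))
```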
To continue talking to Dosu, mention @dosu.
Same question.
Same problem.
Hi, @00drdelius. I'm Dosu, and I'm helping the Dify team manage their backlog. I'm marking this issue as stale.
Issue Summary:
- Two bugs were reported in Dify version 0.9.1 while deploying CosyVoice for text-to-speech with xinference.
- The first bug, an `AssertionError`, was temporarily resolved by commenting out certain imports related to a Flask blueprint.
- The second bug involved CosyVoice's lack of support for parallel generation, with a suggested workaround to process requests sequentially.
- Other users, @kkjinping and @songleipu123, have also reported experiencing the same issues.
Next Steps:
- Please let us know if this issue is still relevant to the latest version of the Dify repository by commenting on this issue.
- If there is no further activity, this issue will be automatically closed in 15 days.
Thank you for your understanding and contribution!