CosyVoice audio service bug with xinference
Self Checks
- [X] This is only for bug report, if you would like to ask a question, please head to Discussions.
- [X] I have searched for existing issues, including closed ones.
- [X] I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
- [X] [FOR CHINESE USERS] Please be sure to submit issues in English, otherwise they will be closed. Thank you! :)
- [X] Please do not modify this template :) and fill in all the required fields.
Dify version
0.9.1
Cloud or Self Hosted
Self Hosted (Source)
Steps to reproduce
I was deploying CosyVoice for TTS with xinference in Dify. However, I encountered two unexpected errors, as follows:
- raised from api/services/audio_service.py:
AssertionError: The setup method 'after_request' can no longer be called on the blueprint 'service_api'.
It has already been registered at least once, any changes will not be applied consistently.
Make sure all imports, decorators, functions, etc. needed to set up the blueprint are done before registering it.
I simply commented out `from app import app` and `with app.app_context():`, and the error went away.
I'm not familiar with Flask, but I don't think it's really necessary to import `app` here? (A sketch of an alternative that avoids re-importing `app` appears at the end of this section.)
- raised from api/core/model_runtime/model_providers/xinference/tts/tts.py
Parallel generation is not supported by llama-cpp-python
CosyVoice deployed by xinference does not support parallel generation currently, so it's not appropriate to use `ThreadPoolExecutor` here; a sequential sketch follows this list. Related issue: https://github.com/xorbitsai/inference/issues/2386
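A minimal sketch of the sequential alternative (the function below is illustrative, not the actual tts.py code; `model.speech` stands in for whatever single-request TTS call the xinference client exposes):

```python
def synthesize_sequentially(model, sentences):
    """Yield audio for each sentence, issuing one request at a time
    instead of fanning requests out through a ThreadPoolExecutor."""
    for sentence in sentences:
        # One blocking call per chunk; the CosyVoice backend rejects
        # concurrent generation, so requests must not overlap.
        yield model.speech(input=sentence)
```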
And by the way, according to line 21 of cosyvoice.yaml, the max text token limit is 51866. Hence I think the value marked below is not accurate:
api/core/model_runtime/model_providers/xinference/tts/tts.py
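As for the blueprint error above: if an application context is genuinely needed (for example, inside a background worker), one way to get it without re-importing the `app` module is to capture the running app via Flask's `current_app`. A minimal sketch, assuming a thread-based worker; the helper name is hypothetical, while `current_app`, `_get_current_object()`, and `app_context()` are standard Flask API:

```python
import threading

from flask import current_app


def run_in_app_context(job):
    """Run `job` in a background thread with an application context."""
    # Unwrap the proxy so the real app object can cross thread boundaries;
    # re-importing `app` here would re-execute module-level blueprint setup
    # and trigger the AssertionError above.
    app = current_app._get_current_object()

    def worker():
        with app.app_context():  # push a fresh context inside the worker
            job()

    threading.Thread(target=worker, daemon=True).start()
```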
✔️ Expected Behavior
No response
❌ Actual Behavior
No response
Hello, @00drdelius! I'm here to help you with bug reports, questions, and contributions. Let's work together to resolve the issues you're encountering.
- **Flask Blueprint Error**: The `AssertionError` occurs because the `after_request` method is being set up after the blueprint `service_api` has already been registered. This setup should be completed before registration. If commenting out `from app import app` and `with app.app_context():` resolves the issue, it suggests that these imports might not be necessary for your current use case. Ensure that all necessary configurations are done before the blueprint is registered.
- **Parallel Generation Limitation**: The error "Parallel generation is not supported by llama-cpp-python" indicates that using `ThreadPoolExecutor` for parallel processing is inappropriate. Since CosyVoice with xinference doesn't support parallel generation, consider processing requests sequentially instead.
- **Text Token Limit Discrepancy**: The `cosyvoice.yaml` file indicates a max text token limit of 51866, while the code uses a limit of 3500. This discrepancy should be addressed by aligning the code with the configuration file to ensure consistency.

Adjust the code to handle these issues, and consider updating the `_get_model_word_limit` method to reflect the correct token limit from `cosyvoice.yaml`.
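If the limit is to come from the model's spec file instead of a hard-coded 3500, a minimal sketch could look like the following. The path argument and the `max_text_tokens` key are assumptions for illustration, not the actual Dify/xinference layout:

```python
import yaml  # PyYAML


def get_model_word_limit(spec_path: str, default: int = 3500) -> int:
    """Read the max text token limit from a model spec file, if present."""
    with open(spec_path, "r", encoding="utf-8") as f:
        spec = yaml.safe_load(f)
    # "max_text_tokens" is a placeholder key; cosyvoice.yaml reportedly
    # declares a limit of 51866 on line 21.
    return int(spec.get("max_text_tokens", default))
```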
To continue talking to Dosu, mention @dosu.
Same question.
Same problem.
Hi, @00drdelius. I'm Dosu, and I'm helping the Dify team manage their backlog. I'm marking this issue as stale.
Issue Summary:
- Two bugs were reported in Dify version 0.9.1 while deploying CosyVoice for text-to-speech with xinference.
- The first bug, an `AssertionError`, was temporarily resolved by commenting out certain imports related to a Flask blueprint.
- The second bug involved CosyVoice's lack of support for parallel generation, with a suggested workaround to process requests sequentially.
- Other users, @kkjinping and @songleipu123, have also reported experiencing the same issues.
Next Steps:
- Please let us know if this issue is still relevant to the latest version of the Dify repository by commenting on this issue.
- If there is no further activity, this issue will be automatically closed in 15 days.
Thank you for your understanding and contribution!