The max tokens of a model deployed using Xinference can only be set to 2048
Self Checks
- [X] I have searched for existing issues, including closed ones.
- [X] I confirm that I am using English to file this report (I have read and agree to the Language Policy).
Dify version
0.5.3
Cloud or Self Hosted
Self Hosted (Docker)
Steps to reproduce
- Deploy the chatglm-6b-32k model using Xinference
- In the Dify application, use the chatglm-6b-32k model deployed by Xinference
✔️ Expected Behavior
In Orchestrate, the max tokens parameter can be set up to 32k.
❌ Actual Behavior
In Orchestrate, the max tokens parameter can only be set up to 2048.
🤖
Hey @lileiseven, great to see you back! Hope you're doing awesome today 😊🚀
Thank you for bringing this to our attention. The max_tokens parameter in the chatglm-6b-32k model configuration specifies the maximum number of tokens that the model can generate for a single output. According to the YAML configuration file for the chatglm-6b-32k model, the max_tokens parameter can be set up to 32,000. Here is the relevant part of the YAML configuration:
```yaml
parameter_rules:
  - name: max_tokens
    use_template: max_tokens
    required: true
    default: 2000
    min: 1
    max: 32000
```
If you're experiencing a limitation of 2048 tokens in Orchestrate when using the chatglm-6b-32k model deployed by Xinference, it might be due to a restriction in the Orchestrate or Xinference system, not in the model itself.
I recommend checking the configuration of your Orchestrate and Xinference systems to ensure they are not imposing this limit. If the issue persists, it might be beneficial to check the logs for any error messages or warnings that could provide more insight into the issue.
Please let me know if you need further assistance.
Sources
- api/core/model_runtime/model_providers/chatglm/llm/chatglm2-6b-32k.yaml
- api/core/model_runtime/model_providers/chatglm/llm/chatglm3-6b-32k.yaml
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.
Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.
If you want to continue the conversation, start your reply with @dosu-bot.
This issue has already been fixed in #2336. BTW, max_tokens is different from context_size: max_tokens caps the length of a single generated output, not the context.
I have upgraded to Dify 0.5.3, but max tokens can still only be set to 2048.
BTW, max_tokens is different from context_size: max_tokens caps the length of a single generated output, not the context. It means the output limit is 2k tokens, while the whole context window is 32k.
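To make the relationship concrete, here is a minimal sketch using the numbers above; the variable names and the 10k prompt length are purely illustrative assumptions:

```python
# Minimal sketch of how max_tokens and context_size interact.
# The 32k / 2k figures come from the discussion above; the prompt length is a made-up example.
context_size = 32_000   # total window the model can attend to (prompt + output)
max_tokens = 2_048      # cap on a single generated output (what Dify currently lets you set)

prompt_tokens = 10_000  # hypothetical prompt length
# The output can never exceed max_tokens, and prompt + output must still fit in the window:
largest_output = min(max_tokens, context_size - prompt_tokens)
print(largest_output)   # 2048 -> max_tokens is the binding limit here, not the 32k window
```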
How could I raise the 2k output token limit to something longer, like 4k or 8k?
@lileiseven Something like this.
Thanks. Maybe that works on Ollama, but Xinference doesn't have such an option; I can only set less than 2048 in Dify's interface. Is that a problem with Xinference?
Yes, there is no context size option in Xinference; however, Dify will try to fetch it from Xinference directly. You can visit http://your_xinference_endpoint/v1/models/model_uid to see the context size of your model.
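For example, here is a minimal sketch of checking that endpoint from Python, assuming the server is reachable and you know the model UID; the "context_length" field name is an assumption and may differ between Xinference versions, so print the full payload to see what your deployment actually reports:

```python
# Query Xinference for the description of a running model and print what it reports.
import json
import requests

XINFERENCE_ENDPOINT = "http://localhost:9997"  # replace with your Xinference endpoint
MODEL_UID = "chatglm3-6b-32k"                  # replace with your model UID (placeholder)

resp = requests.get(f"{XINFERENCE_ENDPOINT}/v1/models/{MODEL_UID}", timeout=10)
resp.raise_for_status()
info = resp.json()

print(json.dumps(info, indent=2))  # full model description as returned by Xinference
# "context_length" is an assumed field name; inspect the payload above if it is missing.
print("reported context size:", info.get("context_length", "not reported"))
```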