The max tokens of a model deployed using Xinference can only be set to 2048
Self Checks
- [X] I have searched for existing issues, including closed ones.
- [X] I confirm that I am using English to file this report (I have read and agree to the Language Policy).
Dify version
0.5.3
Cloud or Self Hosted
Self Hosted (Docker)
Steps to reproduce
- Deploy the chatglm-6b-32k model using Xinference
- In the Dify application, use the chatglm-6b-32k model deployed by Xinference
✔️ Expected Behavior
In Orchestrate, the max tokens parameter can be set up to 32k.
❌ Actual Behavior
In Orchestrate, the max tokens parameter can only be set up to 2048.
🤖
Hey @lileiseven, great to see you back! Hope you're doing awesome today 😊🚀
Thank you for bringing this to our attention. The max_tokens parameter in the chatglm-6b-32k model configuration specifies the maximum number of tokens that the model can generate for a single output. According to the YAML configuration file for the chatglm-6b-32k model, the max_tokens parameter can be set up to 32,000. Here is the relevant part of the YAML configuration:
```yaml
parameter_rules:
  - name: max_tokens
    use_template: max_tokens
    required: true
    default: 2000
    min: 1
    max: 32000
```
If you're experiencing a limitation of 2048 tokens in Orchestrate when using the chatglm-6b-32k model deployed by Xinference, it might be due to a restriction in the Orchestrate or Xinference system, not in the model itself.
I recommend checking the configuration of your Orchestrate and Xinference systems to ensure they are not imposing this limit. If the issue persists, it might be beneficial to check the logs for any error messages or warnings that could provide more insight into the issue.
Please let me know if you need further assistance.
Sources
- api/core/model_runtime/model_providers/chatglm/llm/chatglm2-6b-32k.yaml
- api/core/model_runtime/model_providers/chatglm/llm/chatglm3-6b-32k.yaml
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.
Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.
If you want to continue the conversation, start your reply with @dosu-bot.
This issue has already been fixed in #2336. BTW, max_tokens is different from context_size: max_tokens caps the length of a single generated output, not the context.
I have upgraded to Dify 0.5.3, but max tokens can still only be set to 2048.
BTW, max_tokens is different from context_size: max_tokens caps the length of a single generated output, not the context. It means the output limit is 2k tokens, while the whole context window is 32k.
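To make the relationship concrete, here is a minimal sketch using the numbers above; the variable names and the 10k prompt length are purely illustrative assumptions:

```python
# Minimal sketch of how max_tokens and context_size interact.
# The 32k / 2k figures come from the discussion above; the prompt length is a made-up example.
context_size = 32_000   # total window the model can attend to (prompt + output)
max_tokens = 2_048      # cap on a single generated output (what Dify currently lets you set)

prompt_tokens = 10_000  # hypothetical prompt length
# The output can never exceed max_tokens, and prompt + output must still fit in the window:
largest_output = min(max_tokens, context_size - prompt_tokens)
print(largest_output)   # 2048 -> max_tokens is the binding limit here, not the 32k window
```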
How could I raise the 2k output token limit to something longer, like 4k or 8k?
@lileiseven Something like this.
Thanks. Maybe that works on Ollama, but Xinference doesn't have such an option; I can only set less than 2048 in Dify's interface. Is that a problem with Xinference?
Yes, there is no context size option in Xinference; however, Dify will try to fetch it from Xinference directly. You can visit http://your_xinference_endpoint/v1/models/model_uid to see the context size of your model.
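For example, here is a minimal sketch of checking that endpoint from Python, assuming the server is reachable and you know the model UID; the "context_length" field name is an assumption and may differ between Xinference versions, so print the full payload to see what your deployment actually reports:

```python
# Query Xinference for the description of a running model and print what it reports.
import json
import requests

XINFERENCE_ENDPOINT = "http://localhost:9997"  # replace with your Xinference endpoint
MODEL_UID = "chatglm3-6b-32k"                  # replace with your model UID (placeholder)

resp = requests.get(f"{XINFERENCE_ENDPOINT}/v1/models/{MODEL_UID}", timeout=10)
resp.raise_for_status()
info = resp.json()

print(json.dumps(info, indent=2))  # full model description as returned by Xinference
# "context_length" is an assumed field name; inspect the payload above if it is missing.
print("reported context size:", info.get("context_length", "not reported"))
```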