
The max tokens of a model deployed using Xinference can only be set to 2048

Open · lileiseven opened this issue 1 year ago · 3 comments

Dify version

0.5.3

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

  1. Deploy the chatglm-6b-32k model using Xinference
  2. In the Dify application, use the chatglm-6b-32k model deployed by Xinference

✔️ Expected Behavior

In Orchestrate, the max tokens parameter can be set up to 32k

❌ Actual Behavior

In Orchestrate, the max tokens parameter can only be set up to 2048

lileiseven, Feb 03 '24

🤖

Hey @lileiseven, great to see you back! Hope you're doing awesome today 😊🚀

Thank you for bringing this to our attention. The max_tokens parameter in the chatglm-6b-32k model configuration specifies the maximum number of tokens that the model can generate for a single output. According to the YAML configuration file for the chatglm-6b-32k model, the max_tokens parameter can be set up to 32,000. Here is the relevant part of the YAML configuration:

parameter_rules:
  - name: max_tokens
    use_template: max_tokens
    required: true
    default: 2000
    min: 1
    max: 32000

If you're experiencing a limitation of 2048 tokens in Orchestrate when using the chatglm-6b-32k model deployed by Xinference, it might be due to a restriction in the Orchestrate or Xinference system, not in the model itself.

I recommend checking the configuration of your Orchestrate and Xinference systems to ensure they are not imposing this limit. If the issue persists, it might be beneficial to check the logs for any error messages or warnings that could provide more insight into the issue.

Please let me know if you need further assistance.

dosubot[bot], Feb 03 '24

This issue has already been fixed in #2336. BTW, max_tokens is different from context_size: max_tokens sets an upper limit on the length of the current output, not a context limit.

Yeuoly, Feb 03 '24
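To illustrate the distinction above (an editorial sketch, not part of the original thread): max_tokens only caps how many tokens the model may generate in a single reply, while the context window bounds prompt and output together. Assuming Xinference's OpenAI-compatible endpoint (by default http://localhost:9997/v1) and a placeholder model UID, a request could look like this:

from openai import OpenAI

# Placeholder endpoint and model UID; adjust to your Xinference deployment.
client = OpenAI(base_url="http://localhost:9997/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="chatglm-6b-32k",   # UID of your deployed model (placeholder)
    messages=[{"role": "user", "content": "Summarize this document ..."}],
    max_tokens=2048,          # caps only the generated output, not the 32k context
)
print(response.choices[0].message.content)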

I have upgraded to Dify 0.5.3, but max tokens can still only be set to 2048.

lileiseven, Feb 03 '24

> I have upgraded to Dify 0.5.3, but max tokens can still only be set to 2048.

> BTW, max_tokens is different from context_size: max_tokens sets an upper limit on the length of the current output, not a context limit.

It means the output is limited to 2k tokens, while the whole context window is 32k.

crazywoola, Feb 04 '24
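In other words (an illustrative note, not from the thread), the 32k context window is shared between the prompt, including conversation history, and the generated output; the max tokens setting only reserves the output portion:

# Illustrative numbers only: how a 32k window is split between prompt and output.
context_window = 32_768  # total tokens the model can attend to
max_tokens = 2_048       # Dify's "max tokens": cap on generated output
prompt_budget = context_window - max_tokens
print(prompt_budget)     # 30720 tokens remain for prompt and history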

> I have upgraded to Dify 0.5.3, but max tokens can still only be set to 2048.

> BTW, max_tokens is different from context_size: max_tokens sets an upper limit on the length of the current output, not a context limit.

> It means the output is limited to 2k tokens, while the whole context window is 32k.

How can I raise the 2k output token limit to something longer, e.g. 4k or 8k?

littlewwwhite, Feb 23 '24

@lileiseven Something like this: [screenshot]

crazywoola, Feb 24 '24

> @lileiseven Something like this: [screenshot]

Thanks. Maybe that works for Ollama, but Xinference doesn't have such options; I can only set values below 2048 in Dify's interface. Is that a problem with Xinference?

littlewwwhite, Feb 25 '24

> @lileiseven Something like this: [screenshot]

> Thanks. Maybe that works for Ollama, but Xinference doesn't have such options; I can only set values below 2048 in Dify's interface. Is that a problem with Xinference?

Yes, there is no context size option in Xinference; however, Dify will try to fetch it from Xinference directly. You can visit http://your_xinference_endpoint/v1/models/model_uid to see the context size of your model.

Yeuoly, Feb 25 '24
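For completeness, a minimal sketch of that lookup (assuming a local Xinference endpoint on the default port 9997 and a placeholder model UID; the exact JSON field name, e.g. context_length, may vary by Xinference version):

import requests

# Placeholder endpoint and model UID; adjust to your deployment.
endpoint = "http://localhost:9997"
model_uid = "chatglm-6b-32k"

resp = requests.get(f"{endpoint}/v1/models/{model_uid}", timeout=10)
resp.raise_for_status()
info = resp.json()

# Dump the model description, then the context size if the field is present.
print(info)
print(info.get("context_length", "context_length not reported"))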