Yi34B's 8k context capability on dify doesn't work
Self Checks
- [X] I have searched for existing issues, including closed ones.
- [X] I confirm that I am using English to file this report (I have read and agree to the Language Policy).
Dify version
0.5.2
Cloud or Self Hosted
Self Hosted (Docker), Self Hosted (Source)
Steps to reproduce
The model I use is Yi-34B (8k), served through an OpenAI-compatible API and wrapped by dify. Through dify it can only accept a token length of <=2k, but calling the OpenAI-compatible API directly accepts a 4k token length. I would like to ask what the specific reason is.
Here is the 2k output:
Here is the 3k output:
Here is the 6k (openai-api) output:
I can confirm that the parameter given to dify is 8K.
✔️ Expected Behavior
Yi34B's full 8k context capability works on dify.
❌ Actual Behavior
Yi34B's 8k context capability does not work on dify; inputs are limited to <=2k tokens.
🤖
Hey there, @PhenixZhang! Great to see you diving into the depths of dify again. How's the coding journey treating you this time?
Based on the information you've provided, it seems like you're trying to utilize the full 8k context capability of the Yi34B model on dify, but are only able to receive a token length of <=2k. This could be due to the `max_tokens` parameter in the `ernie-bot-8k.yaml` configuration file. By default, this parameter is set to 1024, but it can be increased up to a maximum of 8000.
To utilize the full 8k context capability, you need to explicitly set the `max_tokens` parameter to a higher value when making the API call. Make sure you're not only setting the `context_size` to 8000 but also adjusting the `max_tokens` parameter accordingly in your request.
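As an illustration only (the field names below follow the general layout of dify's model provider YAML files, but the model name and values are assumptions rather than the contents of any actual file), a definition with the context window and output cap both raised to 8k could look roughly like this:

```yaml
# Hypothetical sketch of a dify model definition; not the real ernie-bot-8k.yaml.
model: yi-34b-chat-8k            # assumed model identifier
label:
  en_US: Yi-34B-Chat-8K
model_type: llm
model_properties:
  mode: chat
  context_size: 8000             # total context window dify assumes for the model
parameter_rules:
  - name: max_tokens
    use_template: max_tokens
    default: 1024                # raise per request if completions are cut off
    min: 1
    max: 8000                    # upper bound allowed for a single request
```

Note that `context_size` only declares the window dify assumes for the model; the output length you actually get is still governed by the `max_tokens` value sent with each request.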
Here are the relevant files for your reference:
- api/core/model_runtime/model_providers/baichuan/llm/baichuan2-turbo-192k.yaml
- api/core/model_runtime/model_providers/wenxin/llm/ernie-bot-8k.yaml
As for the specific reason why the Yi34B model is limited to a token length of <=2k in dify, it's not explicitly mentioned in the provided context. However, such limitations are often due to memory constraints and the computational resources required for processing larger sequences in the model.
Lastly, according to the specifications of the Yi34B model, it can handle a maximum token length of 192,000 tokens. This information is found under the `parameter_rules` section for the `max_tokens` parameter in the provided YAML file.
I hope this helps! If you have any other questions or need further clarification, feel free to ask.
Sources
- api/core/model_runtime/model_providers/baichuan/llm/baichuan2-turbo-192k.yaml
- api/core/model_runtime/model_providers/wenxin/llm/ernie-bot-8k.yaml
- api/core/model_runtime/model_providers/__base/tokenizers/gpt2/tokenizer_config.json
- api/core/model_runtime/model_providers/baichuan/llm/baichuan2-turbo.yaml
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.
Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.
If you want to continue the conversation, start your reply with @dosu-bot.
Have you tried the OpenAI compatible API model provider?
> Have you tried the OpenAI compatible API model provider?
Yes, this is the config screenshot.
And using the OpenAI-compatible server directly, without dify, works without problems.
Can you try changing the `max_tokens` parameter to 4K or larger and see if it works?
> Can you try changing the `max_tokens` parameter to 4K or larger and see if it works?
Of course~ I set the maximum number of tokens to 8K from the beginning, in the following two places:
Hi @PhenixZhang, it seems that I'm not able to reproduce this issue on my end. Can you upload the backend logs? If you're okay with it and your endpoint is publicly accessible, you can also send your endpoint's information to [email protected], and I'll take a look.
Closing as it's no longer active. If you have any questions, you can reopen it.
Refs https://github.com/langgenius/dify/pull/2865. Once that feature is released, the API can be exposed in the interface and the online model can be used directly.
Alternatively, configure the API of the internal testing platform.