[Bug]: During local embedding, RAGFlow sends too much text at once, exceeding the model's maximum input token limit, so the model cannot process the full input.
Is there an existing issue for the same bug?
- [x] I have checked the existing issues.
RAGFlow workspace code commit ID
main
RAGFlow image version
v0.15.1, nightly
Other environment information
Actual behavior
When using embedding models with a lower maximum input token capacity, such as bge-large and conan-embedding-v1, which are limited to a maximum input of 512 tokens, RAGFlow sends more than 512 tokens at once and Ollama encounters an error. I've found the cause of the error here: https://github.com/ollama/ollama/issues/7288#issuecomment-2591709109. Although I can adjust the model's maximum input limit in Ollama, that causes RAGFlow's text to be truncated, resulting in incomplete embeddings. Additionally, I'm unable to locate a setting within RAGFlow to control the maximum input for the embedding model.
When adding a model, the max token setting controls the maximum output rather than the input, and a maximum output does not apply to embedding models.
The same issue of an ineffective max token option also exists when adding reranker models.
Expected behavior
Please add a setting to RAGFlow to control the maximum number of tokens sent to the embedding model per request, and also fix the bug where the max token limit is ineffective when adding reranker models.
Steps to reproduce
Using the bge-large:latest model in Ollama, if embedding is performed with a chunking method other than 'General' (I am using 'Book') and the token count exceeds 512, an error occurs and the embedding is terminated.
Additional information
No response
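For reference, a client-side safeguard along the lines of the requested setting could look like the sketch below: count the tokens of each chunk before it is sent and split anything that exceeds the model's advertised input limit. Token counting here uses tiktoken's cl100k_base purely as an approximation; the embedding model's own tokenizer (and RAGFlow's internal counter) may count differently, and the 512 limit is just the value from this report.

```python
import tiktoken

MAX_INPUT_TOKENS = 512  # would come from a per-model setting in RAGFlow (assumed)

enc = tiktoken.get_encoding("cl100k_base")

def fit_to_limit(text: str, max_tokens: int = MAX_INPUT_TOKENS) -> list[str]:
    """Split `text` into pieces that each stay within `max_tokens`."""
    ids = enc.encode(text)
    if len(ids) <= max_tokens:
        return [text]
    # Hard token-window split, used only when a chunk is already oversized.
    return [enc.decode(ids[i:i + max_tokens]) for i in range(0, len(ids), max_tokens)]

# Every piece now respects the embedding model's input limit.
pieces = fit_to_limit("some chunk produced by the parser " * 200)
```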
What about reducing the chunk token size in the chunking method settings?
But it will not slice the text apart in the middle, since that would ruin the semantics and make the embedding meaningless.
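For illustration, the mismatch can be confirmed by counting each produced chunk with the embedding model's own tokenizer. A minimal sketch, assuming the transformers package and BAAI/bge-large-zh-v1.5 as a stand-in for whichever bge-large variant is configured:

```python
from transformers import AutoTokenizer

# Load the tokenizer of the embedding model actually in use (assumed name here).
tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-large-zh-v1.5")

def oversized_chunks(chunks: list[str], limit: int = 512) -> list[tuple[int, int]]:
    """Return (chunk_index, token_count) for every chunk over the model's limit."""
    report = []
    for i, chunk in enumerate(chunks):
        n = len(tokenizer.encode(chunk))
        if n > limit:
            report.append((i, n))
    return report

# Usage: feed in the chunks shown on the knowledge base's chunk page.
print(oversized_chunks(["short chunk", "a much longer chunk " * 200]))
```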
You know, for some parsing methods, like QA, Resume, Manual, Table, Paper, Laws, Book, Presentation, and One, the chunk token number cannot be set manually.
We do not depend on embedding very much, given its limitations in representing the semantics of long text.
Could you please tell me what the significance of the 'max output tokens' setting is for embedding and rerank models?
Sometimes, if the input is too long, the embedding serving reports an error directly without automatically truncating.
Is it possible to add OpenAI-API-Compatible configuration options, such as input length? I often encounter an error about exceeding 8196 when using the Paper mode for embedding.
I also get errors when integrating intfloat/multilingual-e5-large-instruct with General chunking and a max token of 512.
This embedding model seems to be well rated in MMTEB https://arxiv.org/abs/2502.13595
HF TEI (text-embeddings-inference) logs:
2025-02-28T11:28:23.700376Z ERROR openai_embed:embed_pooled{truncate=false truncation_direction=Right prompt_name=None normalize=true}: text_embeddings_core::infer: core/src/infer.rs:332: Input validation error: `inputs` must have less than 512 tokens. Given: 562
2025-02-28T11:28:23.703694Z ERROR openai_embed:embed_pooled{truncate=false truncation_direction=Right prompt_name=None normalize=true}: text_embeddings_core::infer: core/src/infer.rs:332: Input validation error: `inputs` must have less than 512 tokens. Given: 538
2025-02-28T11:28:23.703902Z ERROR openai_embed:embed_pooled{truncate=false truncation_direction=Right prompt_name=None normalize=true}: text_embeddings_core::infer: core/src/infer.rs:332: Input validation error: `inputs` must have less than 512 tokens. Given: 518
2025-02-28T11:29:01.629322Z ERROR openai_embed:embed_pooled{truncate=false truncation_direction=Right prompt_name=None normalize=true}: text_embeddings_core::infer: core/src/infer.rs:332: Input validation error: `inputs` must have less than 512 tokens. Given: 996
2025-02-28T11:29:01.631500Z ERROR openai_embed:embed_pooled{truncate=false truncation_direction=Right prompt_name=None normalize=true}: text_embeddings_core::infer: core/src/infer.rs:332: Input validation error: `inputs` must have less than 512 tokens. Given: 1192
Kind regards & thanks for your great work. David.
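The truncate=false in these logs is why TEI rejects the requests outright. As a stopgap, and with the caveat that truncation silently drops text, TEI's native /embed endpoint accepts a truncate field (and the server can be started with --auto-truncate). A minimal sketch, assuming TEI is reachable at http://localhost:8080:

```python
import requests

resp = requests.post(
    "http://localhost:8080/embed",
    json={
        "inputs": "a chunk that may be longer than 512 tokens ...",
        "truncate": True,                 # cut the input instead of returning an error
        "truncation_direction": "Right",  # same direction the logs above show
    },
    timeout=30,
)
resp.raise_for_status()
embedding = resp.json()[0]  # /embed returns one vector per input
```

This only hides the problem on the serving side; the text beyond 512 tokens is still never embedded.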
I changed the chunk token number and can see the total chunk count change from 20 (chunk token number 800) to 51 (chunk token number 128), but the error still occurs: the requested input length is larger than what the provider can support. And as per the error messages, it seems the input length hasn't changed at all.
Input length of input_ids is `4783` and exceed max_sequence_length: `4096`
There is no error if I use an Ollama local embedding model; maybe the input tokens are truncated in that case.
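That matches Ollama's documented behavior, if I read it correctly: the /api/embed endpoint has a truncate option that defaults to true, so oversized inputs are silently cut to the context length rather than rejected. A sketch assuming a local Ollama on the default port with bge-large pulled:

```python
import requests

resp = requests.post(
    "http://localhost:11434/api/embed",
    json={
        "model": "bge-large:latest",
        "input": "a chunk that may exceed the model's context length ...",
        "truncate": True,  # set to False to get an error instead of silent truncation
    },
    timeout=60,
)
resp.raise_for_status()
vectors = resp.json()["embeddings"]
```

So "no error" with Ollama does not mean the whole chunk was embedded.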
What about reducing the chunk token size in the chunking method settings?
But it will not slice the text apart in the middle, since that would ruin the semantics and make the embedding meaningless.
It is 0.17.0 now and the problems are still there. You cannot set the chunk token number for document types like Manual, QA, and Resume, and the over-long embedding input problem still occurs often.
The issue still persists: "Chunk token number" is ignored and RAGFlow sends oversized chunks to the embedding model.
Hi, I've noticed the same thing with:
- version 0.17.2
- vLLM serving the embedding model intfloat/multilingual-e5-large-instruct, limited to 512 max tokens in the model providers configuration.
- KB General chunking set to 250 tokens.
RAGFlow logs:
openai.BadRequestError: Error code: 400 - {'object': 'error', 'message': "This model's maximum context length is 512 tokens. However, you requested 1084 tokens in the input for embedding generation. Please reduce the length of the input.", 'type': 'BadRequestError', 'param': None, 'code': 400}
vLLM logs:
ERROR 04-14 10:05:44 [serving_embedding.py:143] raise ValueError(
ERROR 04-14 10:05:44 [serving_embedding.py:143] ValueError: This model's maximum context length is 512 tokens. However, you requested 1084 tokens in the input for embedding generation. Please reduce the length of the input.
INFO: 172.18.0.6:45496 - "POST /v1/embeddings HTTP/1.1" 400 Bad Request
INFO 04-14 10:05:54 [metrics.py:488] Avg prompt throughput: 0.9 tokens/s, Avg generation throughput: 0.1 tokens/s, Running: 0 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%.
Kind regards,
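If the vLLM build in use exposes the truncate_prompt_tokens extra parameter for pooling/embedding requests (an assumption; check the docs for your version), the 400 above can be avoided by letting vLLM cut the input server-side, with the same caveat that the tail of the chunk is dropped. A sketch with the OpenAI client pointed at the vLLM endpoint:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.embeddings.create(
    model="intfloat/multilingual-e5-large-instruct",
    input="a chunk that may be longer than 512 tokens ...",
    extra_body={"truncate_prompt_tokens": 512},  # assumed vLLM-specific extension
)
vector = resp.data[0].embedding
```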
Sometimes, if the input is too long, the embedding serving reports an error directly without automatically truncating.
How else do you retrieve texts?
We do not depend on embedding very much, given its limitations in representing the semantics of long text.
How else do you retrieve texts?
Met this too. How can I address this problem?
The error still exists in version v0.19.1 slim.
ERROR: status_code: 400, body: {'object': 'error', 'message': "This model's maximum context length is 512 tokens. However, you requested 532 tokens in the input for embedding generation. Please reduce the length of the input.", 'type': 'BadRequestError', 'param': None, 'code': 400}
Sometimes, if the input is too long, the embedding serving reports an error directly without automatically truncating.
I think this is a serious bug. Since a chunking strategy is used, the size of each chunk should be aligned with the configured size, rather than relying only on separators to decide whether to split into a new chunk, and rather than truncating outright. Instead, when the text exceeds the chunk size, a new chunk should be started. Otherwise there will always be a maximum-length-exceeded error, leading to parsing failure. This way, the length limit is not exceeded during embedding, nor when using rerank models or large language models, and the damage to the semantics is not very serious.
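A minimal sketch of the splitting policy described above: pack whole sentences greedily and start a new chunk as soon as the token budget would be exceeded, so no single chunk overruns the embedding model's limit. Token counting again uses tiktoken's cl100k_base as an approximation of the real model tokenizer, and the sentence splitter is deliberately naive.

```python
import re
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def split_by_token_budget(text: str, budget: int = 512) -> list[str]:
    """Greedily pack whole sentences into chunks of at most `budget` tokens."""
    sentences = re.split(r"(?<=[.!?。!?])\s+", text)
    chunks, current, used = [], [], 0
    for sent in sentences:
        n = len(enc.encode(sent))
        if current and used + n > budget:
            chunks.append(" ".join(current))
            current, used = [], 0
        current.append(sent)
        used += n
    if current:
        chunks.append(" ".join(current))
    return chunks
```

One gap worth noting: a single sentence longer than the budget still comes out oversized and would need a hard split as a fallback.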