[Question]: With the vLLM backend, the output is still truncated even with max_tokens disabled
Describe your problem
Scenario description:
- RAGFlow version: v0.16.0
- Xinference with the vLLM backend
- max_tokens disabled
- The output is still truncated
Xinference DEBUG LOG
generate config: {'frequency_penalty': 0.7, 'presence_penalty': 0.4, 'temperature': 0.1, 'top_p': 0.3, 'stream': True, 'stop': ['<|end▁of▁sentence|>'], 'stop_token_ids': [151643]}
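For context, when a request reaches vLLM without an explicit max_tokens, vLLM's SamplingParams fills in its own small default rather than treating the length as unlimited, which may be relevant to the question below. A minimal sketch, assuming the generate config above maps directly onto SamplingParams (Xinference may translate the request differently):

```python
# Minimal sketch: what vLLM does when max_tokens is omitted.
# Assumes vLLM is installed; mirrors the generate config from the log above.
from vllm import SamplingParams

params = SamplingParams(
    frequency_penalty=0.7,
    presence_penalty=0.4,
    temperature=0.1,
    top_p=0.3,
    # note: no max_tokens passed, matching the logged generate config
)
print(params.max_tokens)  # vLLM fills in a small default (16), not None
```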
There is no max_tokens entry in this generate config, yet the output is still truncated. Is this because max_tokens has a default value, or because ignore_eos defaults to False?
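If the backend default cap is the culprit, one possible workaround is to pass max_tokens explicitly on each request through the OpenAI-compatible endpoint that Xinference exposes. A hedged sketch; the base_url, model UID, and cap value below are illustrative placeholders, not values taken from this setup:

```python
# Workaround sketch: set max_tokens explicitly per request so the
# backend default never applies. Endpoint and model UID are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:9997/v1", api_key="not-needed")
resp = client.chat.completions.create(
    model="my-deepseek-model",  # hypothetical Xinference model UID
    messages=[{"role": "user", "content": "Explain RAG in one paragraph."}],
    temperature=0.1,
    top_p=0.3,
    max_tokens=2048,  # explicit cap instead of the backend default
)
print(resp.choices[0].message.content)
```

Whether RAGFlow itself can be made to send an explicit max_tokens when the toggle is disabled is the open part of the question.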