[Question]: With the vLLM backend, the output is still truncated even with max_tokens disabled
Describe your problem
Scenario description:
- RAGFlow version: v0.16.0
- Xinference with the vLLM backend
- max_tokens disabled
- The output is still truncated
Xinference DEBUG LOG
generate config: {'frequency_penalty': 0.7, 'presence_penalty': 0.4, 'temperature': 0.1, 'top_p': 0.3, 'stream': True, 'stop': ['<|end▁of▁sentence|>'], 'stop_token_ids': [151643]}
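For context, when a request reaches vLLM without an explicit max_tokens, vLLM's SamplingParams fills in its own small default rather than treating the length as unlimited, which may be relevant to the question below. A minimal sketch, assuming the generate config above maps directly onto SamplingParams (Xinference may translate the request differently):

```python
# Minimal sketch: what vLLM does when max_tokens is omitted.
# Assumes vLLM is installed; mirrors the generate config from the log above.
from vllm import SamplingParams

params = SamplingParams(
    frequency_penalty=0.7,
    presence_penalty=0.4,
    temperature=0.1,
    top_p=0.3,
    # note: no max_tokens passed, matching the logged generate config
)
print(params.max_tokens)  # vLLM fills in a small default (16), not None
```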
There is no max_tokens entry in this generate config, yet the output is still truncated. Is this because max_tokens has a default value, or because ignore_eos defaults to False?
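If the backend default cap is the culprit, one possible workaround is to pass max_tokens explicitly on each request through the OpenAI-compatible endpoint that Xinference exposes. A hedged sketch; the base_url, model UID, and cap value below are illustrative placeholders, not values taken from this setup:

```python
# Workaround sketch: set max_tokens explicitly per request so the
# backend default never applies. Endpoint and model UID are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:9997/v1", api_key="not-needed")
resp = client.chat.completions.create(
    model="my-deepseek-model",  # hypothetical Xinference model UID
    messages=[{"role": "user", "content": "Explain RAG in one paragraph."}],
    temperature=0.1,
    top_p=0.3,
    max_tokens=2048,  # explicit cap instead of the backend default
)
print(resp.choices[0].message.content)
```

Whether RAGFlow itself can be made to send an explicit max_tokens when the toggle is disabled is the open part of the question.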