
Model still returns output when the input exceeds the configured maximum token count

luhairong11 opened this issue 8 months ago · 0 comments

Reminder

  • [X] I have read the README and searched the existing issues.

Reproduction

python src/api.py --model_name_or_path /data/models/LLM_models/qwen/Qwen-72B-Chat-Int4 --template qwen --infer_backend vllm --vllm_gpu_util 0.9 --vllm_maxlen 8000

The configuration above sets the maximum token count to 8000. When the input exceeds 8000 tokens, a streaming call to the API still returns two JSON chunks with empty content, and vLLM only emits a low-level warning saying the maximum token count was exceeded. Could the code raise an explicit exception instead, so that the returned content is easier to interpret?
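The behavior being requested could be sketched as a length check that fails fast before the request reaches the inference backend. This is only an illustrative sketch, not LLaMA-Factory's actual code: the names `PromptTooLongError`, `check_prompt_length`, and `max_model_len` are hypothetical.

```python
# Hypothetical sketch of the requested behavior: reject over-length prompts
# with an explicit error instead of silently streaming empty chunks.
# None of these names are LLaMA-Factory's actual API.

class PromptTooLongError(ValueError):
    """Raised when the tokenized prompt exceeds the model's context limit."""


def check_prompt_length(token_ids: list[int], max_model_len: int) -> None:
    # Fail fast with a clear message rather than letting the backend
    # log a warning and return an empty completion.
    if len(token_ids) > max_model_len:
        raise PromptTooLongError(
            f"Prompt has {len(token_ids)} tokens, which exceeds the "
            f"configured maximum of {max_model_len}."
        )
```

In an API server, a check like this would run before dispatching to vLLM, so a streaming client would receive an explicit error (e.g. an HTTP 400 with the message above) instead of empty response chunks.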

Expected behavior

No response

System Info

No response

Others

No response

luhairong11 · May 29 '24 15:05