
support "stop" in api chat/completions

davidyao opened this issue 2 months ago · 2 comments

Reminder

  • [X] I have read the README and searched the existing issues.

Reproduction

CUDA_VISIBLE_DEVICES=0 USE_MODELSCOPE_HUB=1 API_PORT=7860 python src/api_demo.py \
    --model_name_or_path qwen/Qwen-72B-Chat-Int4 \
    --template qwen

Expected behavior

OpenAI's chat completion API supports a stop parameter that can be used for early stopping, but the current API does not seem to support it. Please add support for it to avoid unnecessary inference.

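For reference, a minimal sketch of the requested usage with the official openai Python client pointed at the local API server; the port comes from the reproduction above, while the model name, message, and stop strings are only illustrative.

from openai import OpenAI

# Assumes the api_demo server started above is listening on localhost:7860.
client = OpenAI(base_url="http://localhost:7860/v1", api_key="none")

response = client.chat.completions.create(
    model="Qwen-72B-Chat-Int4",  # illustrative model name
    messages=[{"role": "user", "content": "List three colors."}],
    max_tokens=128,
    # Desired behavior: generation halts as soon as any of these strings
    # is produced, instead of running until max_tokens is exhausted.
    stop=["<|endoftext|>"],
)
print(response.choices[0].message.content)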

System Info

No response

Others

No response

— davidyao, Apr 03 '24 05:04

"do_sample": false,
  "temperature": 0.0,
  "top_p": 0,
  "n": 1,
  "max_tokens": 128,
  "stream": false,
  "stop": "<|endoftext|>"

I set stop in my API request, but it did not take effect; generation only stopped after reaching the model's maximum generation length.
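For context, one generic way a backend can honor stop strings with Hugging Face transformers is a custom StoppingCriteria that decodes the newly generated tokens and halts once any stop string appears. This is only an illustrative sketch, not LLaMA-Factory's actual implementation; the names below are hypothetical.

import torch
from transformers import StoppingCriteria, StoppingCriteriaList

class StopStringCriteria(StoppingCriteria):
    def __init__(self, tokenizer, stop_strings, prompt_length):
        self.tokenizer = tokenizer
        self.stop_strings = stop_strings
        self.prompt_length = prompt_length  # number of prompt tokens to skip

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
        # Only inspect tokens generated after the prompt.
        generated = self.tokenizer.decode(input_ids[0, self.prompt_length:])
        return any(s in generated for s in self.stop_strings)

# Hypothetical usage inside the serving code, where `model`, `tokenizer`,
# and `inputs` already exist:
# outputs = model.generate(
#     **inputs,
#     max_new_tokens=128,
#     stopping_criteria=StoppingCriteriaList(
#         [StopStringCriteria(tokenizer, ["<|endoftext|>"], inputs["input_ids"].shape[1])]
#     ),
# )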

— JieShenAI, Apr 12 '24 09:04

@JieShenAI Not supported yet.

— hiyouga, Apr 12 '24 09:04