LLaMA-Factory openaiapi compatible api

openaiapi compatible api_demo support

Open luohao123 opened this issue 1 year ago • 6 comments

Could you add an API demo that is fully compatible with the OpenAI API? That way we could use most existing front ends, such as chatbotui and chatgpt-next.

Jun 17 '23 04:06 luohao123

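Could you add an API demo that is fully compatible with the OpenAI API? That way we could use most existing front ends, such as chatbotui and chatgpt-next.

The next update will support the OpenAI API, so you will be able to run model inference through the openai SDK. These are the request parameters used during the testing phase: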

{
  "model": "llama-7b-hf-transformers-4.29",
  "messages": [{"role": "user", "content": "你是谁？"}],
  "stream": false,
  "top_p": 1,
  "temperature": 0.7,
  "max_tokens": 64
}

The model field here is optional, because the model is already fixed when api_demo is launched. You can call it with the openai SDK, for example:

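import openai

# Written against the legacy openai SDK (openai<1.0).
openai.api_base = "http://localhost:8000/v1"  # local api_demo endpoint
openai.api_key = "none"  # placeholder (assumption): the legacy SDK requires some key to be set
model = "llama-7b-hf-transformers-4.29"


def test_chat_completion():
    # Send a single-turn, non-streaming chat request and print the reply.
    completion = openai.ChatCompletion.create(
        model=model,
        messages=[{"role": "user", "content": "Hello! What is your name?"}],
        temperature=0.7,
    )
    print(completion.choices[0].message.content)


if __name__ == "__main__":
    test_chat_completion()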
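
For clients that do not go through the openai SDK, the request parameters listed earlier can be built with the standard library. This is only a minimal sketch: `build_chat_payload` is a hypothetical helper name, not part of api_demo, and the point is simply that a Python `bool` serializes to a JSON boolean rather than a quoted string.

```python
import json


def build_chat_payload(content, stream=False, model=None):
    """Build an OpenAI-style chat completion request body (hypothetical helper)."""
    payload = {
        "messages": [{"role": "user", "content": content}],
        "stream": bool(stream),  # serialized as JSON true/false, not a string
        "top_p": 1,
        "temperature": 0.7,
        "max_tokens": 64,
    }
    if model is not None:
        # The thread says model is optional because api_demo already loaded one.
        payload["model"] = model
    return payload


if __name__ == "__main__":
    print(json.dumps(build_chat_payload("你是谁？"), ensure_ascii=False))
```

Leaving `model` out keeps the payload valid for a server that has already chosen its model at startup, as described above.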

Jun 21 '23 07:06 mMrBun

Does it support streaming calls?

Jun 21 '23 14:06 luohao123

Does it support streaming calls?

The request parameters include a stream (bool) parameter; set it to true.
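
With stream set to true, an OpenAI-compatible endpoint conventionally answers with server-sent events: lines of the form `data: {json}` followed by a final `data: [DONE]`. The sketch below assumes api_demo follows that convention, which this thread does not confirm. `iter_delta_content` is a hypothetical helper and the sample lines are made up for illustration.

```python
import json


def iter_delta_content(lines):
    """Yield the text pieces carried by OpenAI-style streaming chunks."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # conventional end-of-stream marker
        chunk = json.loads(data)
        delta = chunk["choices"][0].get("delta", {})
        content = delta.get("content")
        if content:  # some chunks carry only the role, with no text
            yield content


if __name__ == "__main__":
    sample = [
        'data: {"choices": [{"delta": {"role": "assistant"}}]}',
        "",
        'data: {"choices": [{"delta": {"content": "Hel"}}]}',
        'data: {"choices": [{"delta": {"content": "lo"}}]}',
        "data: [DONE]",
    ]
    print("".join(iter_delta_content(sample)))
```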

Jun 22 '23 06:06 mMrBun

Does it support streaming calls?

The repository has been updated; you can try calling it with the openai package.

Jun 23 '23 04:06 mMrBun

@mMrBun I found that although the code contains streaming logic, the actual response is printed all at once rather than streamed. My request code is as follows:

import openai

# Assumes openai.api_base (and a placeholder api_key) already point at api_demo.
model = "gpt-3.5-turbo"
stream_mode = True

if stream_mode:
    # Streaming: iterate over the chunks and print each delta as it arrives.
    for chunk in openai.ChatCompletion.create(
        model=model,
        messages=[{
            "role": "user",
            "content": "列举出中国所有的省市以及对应的省会"
        }],
        stream=True,
    ):
        content = chunk["choices"][0].get("delta", {}).get("content")
        if content is not None:
            print(content, end="", flush=True)
else:
    # Non-streaming: request the whole completion and print it once.
    completion = openai.ChatCompletion.create(
        model=model,
        messages=[{"role": "user", "content": "你好，请介绍一下你自己，以及北京的有哪些小吃。"}],
    )
    print(completion.choices[0].message.content)

Why is that?

Jun 25 '23 08:06 lucasjinreal

@lucasjinreal

In my tests, streaming output works.

import openai

if __name__ == "__main__":
    response = openai.ChatCompletion.create(
        model="main",
        messages=[
            {"role": "user", "content": "你是谁"}
        ],
        temperature=0.95,
        stream=True
    )
    for chunk in response:
        if hasattr(chunk.choices[0].delta, "content"):
            print(chunk.choices[0].delta.content, end="", flush=True)

Jun 25 '23 14:06 hiyouga

@hiyouga I tested it, and streaming does seem to work.

However, it looks like it streams line by line. Is there a way to stream character by character (or token by token)?

Jun 26 '23 03:06 lucasjinreal

@lucasjinreal On my side the output is character by character, printing one token at a time.

Jun 26 '23 03:06 hiyouga

Strange, why is mine output line by line?
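
One way to narrow this down is to log the size and arrival time of every chunk, which shows whether the chunks themselves are line-sized or whether small chunks arrive in bursts. A minimal sketch, assuming any iterable of text pieces (for example the delta contents from the streaming loop earlier in the thread). `profile_chunks` and `fake_stream` are hypothetical names, and the fake data stands in for a real response.

```python
import time


def profile_chunks(chunks):
    """Return (seconds_since_start, text) for each chunk as it arrives."""
    start = time.monotonic()
    records = []
    for text in chunks:
        records.append((time.monotonic() - start, text))
    return records


def fake_stream():
    """Stand-in for a real streaming response (hypothetical data)."""
    for piece in ["你", "是", "谁"]:
        time.sleep(0.01)  # simulate the delay between generated tokens
        yield piece


if __name__ == "__main__":
    for elapsed, text in profile_chunks(fake_stream()):
        print(f"{elapsed:.3f}s  {len(text)} chars  {text!r}")
```

If each record holds a whole line, the server is probably emitting line-sized chunks. If the records are single tokens that arrive in bursts with nearly identical timestamps, something between the model and the client may be buffering, though this sketch alone cannot say where.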

Jun 26 '23 04:06 lucasjinreal
