
Is the OpenAI-compatible API sending the `functions` argument to the model?

AmbroxMr opened this issue 2 years ago · 7 comments

I'm currently using the open-source model functionary, which supports function calling much like GPT does through the OpenAI API. I've successfully deployed the model worker and made chat completion requests using the functions feature:

curl {API}/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "functionary-7b",
    "messages": [{"role": "user", "content": "salute Anthony"}],
    "functions": [
      {
        "name": "salute",
        "description": "This function can be used to salute someone.",
        "parameters": {
          "properties": {
            "who": {
              "description": "Name of whom to salute. o7",
              "title": "Who",
              "type": "string"
            }
          },
          "required": ["who"],
          "title": "salute",
          "type": "object"
        }
      }
    ]
  }'

The expected outcome was to receive a message that included the function_call argument, indicating the use of the salute function with the argument Anthony. However, the actual result was as follows:

{"id":"chatcmpl-fUcuEKMeCkavtjsPCzeKft","object":"chat.completion","created":1694683275,"model":"functionary-7b","choices":[{"index":0,"message":{"role":"assistant","content":"\nSalute!\n"},"finish_reason":"stop"}],"usage":{"prompt_tokens":579,"total_tokens":585,"completion_tokens":6}}

It appears that the `functions` argument is being ignored altogether. My question is whether `fastchat/serve/openai_api_server.py` actually sends the `functions` argument to the model.

AmbroxMr avatar Sep 14 '23 09:09 AmbroxMr

`openai_api_server` only accepts these parameters:

@app.post("/v1/chat/completions", dependencies=[Depends(check_api_key)])
async def create_chat_completion(request: ChatCompletionRequest):
    ...

where `ChatCompletionRequest` (defined in `protocol/openai_api_protocol.py`) is:

class ChatCompletionRequest(BaseModel):
    model: str
    messages: Union[str, List[Dict[str, str]]]
    temperature: Optional[float] = 0.7
    top_p: Optional[float] = 1.0
    n: Optional[int] = 1
    max_tokens: Optional[int] = None
    stop: Optional[Union[str, List[str]]] = None
    stream: Optional[bool] = False
    presence_penalty: Optional[float] = 0.0
    frequency_penalty: Optional[float] = 0.0
    user: Optional[str] = None
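
Since `functions` is not declared on the model, Pydantic drops it silently when FastAPI parses the request body. A minimal sketch of that behavior, assuming the Pydantic v1 defaults FastChat used at the time:

from typing import Optional
from pydantic import BaseModel

class Req(BaseModel):
    model: str
    temperature: Optional[float] = 0.7

# Unknown fields are ignored by default (Extra.ignore in Pydantic v1),
# so a `functions` key in the request body never reaches the handler:
req = Req(model="functionary-7b", functions=[{"name": "salute"}])
print(req.dict())  # {'model': 'functionary-7b', 'temperature': 0.7}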

jiaolongxue avatar Sep 25 '23 03:09 jiaolongxue

Modify `protocol/openai_api_protocol.py`:

from typing import Any, Dict, List, Optional, Union

from pydantic import BaseModel

class ChatCompletionRequest(BaseModel):
    model: str
    messages: Union[str, List[Dict[str, str]]]
    temperature: Optional[float] = 0.7
    top_p: Optional[float] = 1.0
    n: Optional[int] = 1
    max_tokens: Optional[int] = None
    stop: Optional[Union[str, List[str]]] = None
    stream: Optional[bool] = False
    presence_penalty: Optional[float] = 0.0
    frequency_penalty: Optional[float] = 0.0
    user: Optional[str] = None
    functions: Optional[List[Any]] = None  # new: accept the OpenAI `functions` field
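
A quick, hypothetical sanity check that the new field survives parsing:

req = ChatCompletionRequest(
    model="functionary-7b",
    messages=[{"role": "user", "content": "salute Anthony"}],
    functions=[{"name": "salute"}],
)
assert req.functions == [{"name": "salute"}]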

Modify `serve/openai_api_server.py`:

@app.post("/v1/chat/completions", dependencies=[Depends(check_api_key)])
async def create_chat_completion(request: ChatCompletionRequest):
    """Creates a completion for the chat message"""
    print("\n fastchat request:", request)
    error_check_ret = await check_model(request)
    if error_check_ret is not None:
        return error_check_ret
    error_check_ret = check_requests(request)
    if error_check_ret is not None:
        return error_check_ret

    async with httpx.AsyncClient() as client:
        worker_addr = await get_worker_address(request.model, client)

        gen_params = await get_gen_params(
            request.model,
            worker_addr,
            request.messages,
            temperature=request.temperature,
            top_p=request.top_p,
            max_tokens=request.max_tokens,
            echo=False,
            stream=request.stream,
            stop=request.stop,
        )
        gen_params["functions"] = request.functions  # new: forward the functions list to the worker
        error_check_ret = await check_length(
            request,
            gen_params["prompt"],
            gen_params["max_new_tokens"],
            worker_addr,
            client,
        )
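
Note that this change only forwards `functions` as far as the worker; the worker must also consume the field, which stock FastChat workers do not appear to do (see the follow-up below). A hypothetical sketch of what the worker side would additionally need, with an invented helper (not FastChat API):

import json

def render_functions_into_prompt(functions, prompt):
    # Prepend the JSON schemas so a function-calling model can see them.
    # The exact format is model-specific (e.g. functionary expects its own
    # template), so treat this as a placeholder.
    header = "Available functions:\n" + json.dumps(functions, indent=2)
    return header + "\n\n" + prompt

# inside the worker's generation path, something like:
# if params.get("functions"):
#     params["prompt"] = render_functions_into_prompt(params["functions"], params["prompt"])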

jiaolongxue avatar Sep 25 '23 04:09 jiaolongxue

@jiaolongxue Since you seem to be familiar with it, could you make a pull request to add this functionality to the library?

PyroGenesis avatar Nov 08 '23 20:11 PyroGenesis

As long as FastChat does not support the `functions` parameter, you can still use function calling with open-source models (such as Qwen-1.8B, Qwen-14B, ChatGLM3) by following these steps:

  1. render the functions into the messages directly, then send them to the FastChat OpenAI API server (a sketch follows below).
  2. define a custom agent using LangChain with a customized prompt and output parser.

A reference can be found here: qwen-agent
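
A minimal sketch of step 1, assuming a local FastChat server at http://localhost:8000 and reusing the salute schema from the original report (model name and prompt wording are illustrative and should be tuned per model):

import json
import requests

functions = [{
    "name": "salute",
    "description": "This function can be used to salute someone.",
    "parameters": {
        "type": "object",
        "properties": {"who": {"type": "string", "description": "Name of whom to salute."}},
        "required": ["who"],
    },
}]

# Render the tool schemas into the system prompt instead of sending `functions`.
system_prompt = (
    "You have access to the following tools:\n"
    + json.dumps(functions, ensure_ascii=False, indent=2)
    + '\nTo call a tool, answer with JSON: {"name": ..., "arguments": {...}}'
)

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",  # assumed server address
    json={
        "model": "Qwen-14B-Chat",  # any chat model served by FastChat
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": "salute Anthony"},
        ],
    },
)
print(resp.json()["choices"][0]["message"]["content"])

The custom LangChain output parser in step 2 is then responsible for extracting the JSON tool call from the model's reply.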

liunux4odoo avatar Dec 14 '23 12:12 liunux4odoo

(quoting @jiaolongxue's suggested modifications to `protocol/openai_api_protocol.py` and `serve/openai_api_server.py` above)

This still doesn't work. Can you help me find out what's wrong? Thanks! (screenshots attached)

chenhaoqiang avatar Dec 20 '23 04:12 chenhaoqiang

Is there any appetite for this to be integrated through a PR? Open to taking a look if so.

ckgresla avatar Feb 07 '24 00:02 ckgresla

@ckgresla Can you? I know we're a bit late on this, but it shouldn't be that hard. Have a look also at https://github.com/lm-sys/FastChat/pull/2085

surak avatar Jan 31 '25 17:01 surak