FastChat

The stop parameter in the OpenAI API doesn't work since v0.2.5

Since version v0.2.5, it seems the stop parameter in the OpenAI API is set directly from conv.stop_str rather than taken from the request. https://github.com/lm-sys/FastChat/blob/v0.2.5/fastchat/serve/api.py#L134

In version v0.2.3, it worked when set in the request. https://github.com/lm-sys/FastChat/blob/v0.2.3/fastchat/serve/api.py#L125

The stop parameter is key to making ReAct work in LangChain, so it seems quite important to support it.
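For reference, a minimal sketch of the intended behavior (not FastChat's actual code; `resolve_stop` and its argument names are made up here): merge the client-supplied stop value with the conversation template's default instead of discarding it.

```python
# Hypothetical helper: combine the request's `stop` with the
# conversation template's default stop string.
def resolve_stop(request_stop, conv_stop_str):
    stops = []
    if conv_stop_str:
        stops.append(conv_stop_str)
    # The OpenAI API allows `stop` to be a string or a list of strings.
    if isinstance(request_stop, str):
        stops.append(request_stop)
    elif request_stop:
        stops.extend(request_stop)
    # De-duplicate while preserving order.
    return list(dict.fromkeys(stops))

print(resolve_stop("Observation:", "</s>"))  # ['</s>', 'Observation:']
```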

lslslslslslslslslslsls · May 08 '23 15:05

Thanks for reporting this. Could you send a pull request to fix it?

merrymercy · May 08 '23 16:05

Fixed in #818.

jstzwj · May 08 '23 19:05

@jstzwj Thanks for the fix; happy to see it in the next version.

lslslslslslslslslslsls · May 09 '23 00:05

In the openai_api_server, stop works for non-streaming completions, but not for streaming ones.

The problem is that the unwanted stop sequence gets streamed out before generation stops. https://github.com/lm-sys/FastChat/blob/main/fastchat/serve/openai_api_server.py#L518

As a result, this breaks LangChain ReAct agents.
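A tiny repro of the symptom (hypothetical code, not from the server; `naive_stream` and the chunk values are invented): the stop check runs on the accumulated text, so by the time the full stop string matches, chunks containing its beginning have already been sent.

```python
def naive_stream(token_chunks, stop="\nObservation:"):
    output = ""
    for chunk in token_chunks:
        output += chunk
        if stop in output:
            # The full stop string finally matched, so generation stops,
            # but earlier chunks holding its beginning were already sent.
            return
        yield chunk

leaked = "".join(naive_stream(["Thought: done", "\nObs", "ervation:", "..."]))
print(repr(leaked))  # 'Thought: done\nObs': a partial stop prefix leaked
```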

mingfang · May 19 '23 15:05

@andy-yang-1 does the new PR (#1246) fix this? @mingfang If not, could you contribute a PR to fix it?

merrymercy · May 20 '23 14:05

I tested the PR locally and it has the same problem. @merrymercy do you think this problem should be fixed in https://github.com/andy-yang-1/FastChat/blob/langchain-support/fastchat/serve/inference.py#L51 so that it doesn't emit the stop sequence?

mingfang · May 20 '23 14:05

@merrymercy My PR didn't fix the problem. How can we solve it?

andy-yang-1 · May 20 '23 14:05

We handle the stop string here: https://github.com/andy-yang-1/FastChat/blob/fae4087bbb6f7979b61f2e0c2912d77547a5c659/fastchat/serve/inference.py#L164-L175. I think it will correctly remove the stop sequence in the end. Does the issue occur in the middle of streaming?
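Roughly, a simplified paraphrase of that handling (variable names assumed here, not verbatim from the repo): once the decoded output contains the stop string, truncate at its last occurrence after the prompt and stop generating.

```python
output = "PROMPT Thought: done\nObservation: extra"
stop_str = "\nObservation:"
l_prompt = len("PROMPT ")  # search only the generated part

pos = output.rfind(stop_str, l_prompt)
if pos != -1:
    output = output[:pos]  # the final output is clean
    stopped = True         # but intermediate streamed chunks were not

print(output)  # PROMPT Thought: done
```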

merrymercy · May 20 '23 15:05

The problem happens when the previously generated tokens form a partial beginning of the stop sequence. The entire stop sequence won't match until the next few tokens arrive, so by then the partial stop sequence has already been streamed to the client, causing ReAct to fail.
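A minimal sketch of one way to fix it (hypothetical, not the actual code of any PR linked here; `safe_stream` is a made-up name): before yielding, withhold any trailing text that could still be the beginning of the stop string, and release it only once it can no longer match.

```python
def safe_stream(token_chunks, stop="\nObservation:"):
    pending = ""  # text generated but not yet sent to the client
    for chunk in token_chunks:
        pending += chunk
        pos = pending.find(stop)
        if pos != -1:
            yield pending[:pos]  # emit everything before the stop string
            return
        # Longest suffix of `pending` that is a prefix of `stop`; it must
        # be held back until the next chunk confirms or rules it out.
        hold = 0
        for i in range(1, min(len(stop), len(pending)) + 1):
            if pending.endswith(stop[:i]):
                hold = i
        if len(pending) > hold:
            yield pending[:len(pending) - hold]
            pending = pending[len(pending) - hold:]
    yield pending  # stream ended without the stop string completing

safe = "".join(safe_stream(["Thought: done", "\nObs", "ervation:", "..."]))
print(repr(safe))  # 'Thought: done': no partial stop sequence leaks
```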

mingfang · May 20 '23 15:05

@merrymercy This is my PR with the stop-detection fix: https://github.com/lm-sys/FastChat/pull/1392

mingfang · May 20 '23 19:05