FastChat
The stop parameter in openai API doesn't work since v0.2.5
Since version v0.2.5, it seems the stop parameter in the OpenAI API is set directly from conv.stop_str rather than from the request.
https://github.com/lm-sys/FastChat/blob/v0.2.5/fastchat/serve/api.py#L134
In version v0.2.3, it worked when set in the request. https://github.com/lm-sys/FastChat/blob/v0.2.3/fastchat/serve/api.py#L125
The stop parameter is key when working with ReAct in LangChain, so it seems quite important to support.
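To illustrate the fix being requested, here is a minimal sketch of how a server could merge request-level stop strings with the conversation template's stop_str instead of ignoring the request. The function name and shape are hypothetical, not FastChat's actual code:

```python
def resolve_stop(request_stop, conv_stop_str):
    """Combine the client's `stop` (a string, a list, or None) with the
    conversation template's stop_str. Illustrative sketch only."""
    stops = set()
    if isinstance(request_stop, str):
        stops.add(request_stop)
    elif request_stop:  # a list of stop strings
        stops.update(request_stop)
    if conv_stop_str:
        stops.add(conv_stop_str)
    return stops
```

With this, a ReAct client passing stop="Observation:" would be honored alongside the template's own stop string rather than overridden by it.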
Thanks for reporting this. Could you send a pull request to fix it?
Fixed in #818.
@jstzwj Thanks for the fix, happy to see it in next version.
In the openai_api_server, stop works for non-streaming completions, but not for streaming.
The problem is that the unwanted stop sequence gets streamed out before generation stops. https://github.com/lm-sys/FastChat/blob/main/fastchat/serve/openai_api_server.py#L518
As a result, this breaks LangChain ReAct agents.
@andy-yang-1 does the new PR (#1246) fix this? @mingfang If not, could you contribute a PR to fix it?
I tested the PR locally and it has the same problem. @merrymercy do you think this problem should be fixed in https://github.com/andy-yang-1/FastChat/blob/langchain-support/fastchat/serve/inference.py#L51 so that it doesn't emit the stop sequence?
@merrymercy My PR didn't fix the problem; how can we solve it?
We handle the stop string here: https://github.com/andy-yang-1/FastChat/blob/fae4087bbb6f7979b61f2e0c2912d77547a5c659/fastchat/serve/inference.py#L164-L175. I think it will eventually delete the stop sequence correctly. Does the problem occur in the middle of streaming?
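For context, the final cleanup step being described amounts to cutting the accumulated output at the stop string once it has fully appeared. A minimal sketch (the function name is illustrative, not the actual inference.py code):

```python
def trim_stop(output: str, stop_str: str) -> str:
    """Remove the stop string and everything after it from the final
    output, if the stop string occurs. Illustrative sketch only."""
    pos = output.rfind(stop_str)
    return output[:pos] if pos != -1 else output
```

This works for the final, non-streamed result, but by itself it does nothing about text that was already sent to the client mid-stream.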
The problem happens when a previously generated token is a partial beginning of the stop sequence. The full stop sequence will not match until the next few tokens arrive. As a result, the partial stop sequence is streamed to the client, causing ReAct to fail.
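The usual fix for this is to hold back any trailing text that could still grow into the stop sequence, and only emit it once it is known not to match. A self-contained sketch of that idea, assuming a generator of text chunks (names are hypothetical, not FastChat's actual streaming code):

```python
def stream_with_stop(chunks, stop):
    """Yield streamed text while suppressing both full and partial
    matches of the stop string. Illustrative sketch only."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        idx = buffer.find(stop)
        if idx != -1:
            # Full stop sequence seen: emit everything before it and end.
            if idx:
                yield buffer[:idx]
            return
        # Hold back the longest suffix of the buffer that is a prefix of
        # the stop string; it might complete into the stop sequence later.
        hold = 0
        for k in range(min(len(stop) - 1, len(buffer)), 0, -1):
            if stop.startswith(buffer[-k:]):
                hold = k
                break
        if len(buffer) - hold > 0:
            yield buffer[: len(buffer) - hold]
            buffer = buffer[len(buffer) - hold:]
    # Stream ended without a stop match: flush whatever was held back.
    if buffer:
        yield buffer
```

For example, with stop="Observation:" and chunks ["Thought: do X\nObs", "ervation: result"], only "Thought: do X\n" is emitted; the partial "Obs" is held back and then discarded once the full stop sequence matches.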
@merrymercy This is my PR with the stop detection fix https://github.com/lm-sys/FastChat/pull/1392