When will the stream feature be added to the API?
what do you mean by stream feature? aren't our current CLI and web interface both streaming?
I mean the API response should support a streaming type. Would there be a POST parameter to turn streaming on/off (true/false)?
For reference, from the OpenAI API docs:
stream (boolean): If set, partial message deltas will be sent, like in ChatGPT. Tokens will be sent as data-only server-sent events as they become available, with the stream terminated by a data: [DONE] message. See the OpenAI Cookbook for example code.
That would be great, I'm joining the request as well.
Joining the request. Need stream in API.
I need this as well!
I was looking into this, and to make this work the API server has to be adjusted to accept a "stream" parameter.
JSON (cURL):
{
"model": "vicuna-13b",
"messages": [{"role": "user", "content": "Say this is a test!"}],
"stream": true
}
Or
Python API:
completion = client.ChatCompletion.create(
    model="vicuna-13b",
    messages=[
        {"role": "user", "content": content}
    ],
    stream=True,
)
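For illustration, here's a minimal sketch of how a client could POST that payload and read the resulting data-only server-sent events, assuming the API server is listening locally on port 8000 and exposes an OpenAI-style /v1/chat/completions route (both are assumptions, not current FastChat behavior):
import json

import requests

# Assumed endpoint and model name; adjust to your deployment.
url = "http://localhost:8000/v1/chat/completions"
payload = {
    "model": "vicuna-13b",
    "messages": [{"role": "user", "content": "Say this is a test!"}],
    "stream": True,
}

with requests.post(url, json=payload, stream=True) as resp:
    for raw_line in resp.iter_lines():
        if not raw_line:
            continue
        line = raw_line.decode("utf-8")
        if not line.startswith("data: "):
            continue
        data = line[len("data: "):]
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        delta = chunk["choices"][0].get("delta", {})
        print(delta.get("content", ""), end="", flush=True)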
Adding this to the API server shouldn't be too difficult, because the web server that serves the chat UI already has text streaming built in. From FastChat/fastchat/serve/gradio_web_server.py:
try:
# Stream output
response = requests.post(
worker_addr + "/worker_generate_stream",
headers=headers,
json=gen_params,
stream=True,
timeout=20,
)
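Right after that call, the web server consumes the stream, roughly like this (paraphrased from memory, so treat the exact field names as approximate):
import json

# Continuing the snippet above: the worker streams JSON blobs separated by
# null bytes, each carrying the text generated so far.
for chunk in response.iter_lines(decode_unicode=False, delimiter=b"\0"):
    if chunk:
        data = json.loads(chunk.decode())
        if data["error_code"] == 0:
            output = data["text"].strip()
            # ... yield `output` to update the chat UI ...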
A modification in FastChat/fastchat/serve/api.py, inside the async function chat_completion, would be necessary to stream out chunks exactly the way OpenAI does it. We'd just have to emulate OpenAI's delta chunks to get native API support for other applications being built that require streaming (voice chat bots, etc.). The chunks look like this; a rough server-side sketch follows the example chunks below.
{
"choices": [
{
"delta": {
"content": "1" <<< This is the letters/words.
},
"finish_reason": null,
"index": 0
}
],
"created": 1680380941,
"id": "chatcmpl-70c8LVUSYoSbdQTyONgJfcVU542wO",
"model": "gpt-3.5-turbo-0301",
"object": "chat.completion.chunk"
}
# ... lots more here ...
{
"choices": [
{
"delta": {
"content": "ina" <<< This is the letters/words.
},
"finish_reason": null,
"index": 0
}
],
"created": 1680380941,
"id": "chatcmpl-70c8LVUSYoSbdQTyONgJfcVU542wO",
"model": "gpt-3.5-turbo-0301",
"object": "chat.completion.chunk"
}
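To make that concrete, here is a rough sketch of how the chat_completion handler could emit those chunks as server-sent events with FastAPI's StreamingResponse. This is only a sketch: token_iterator is a placeholder for however the worker's streamed output ends up being exposed inside the API server, not an existing FastChat object.
import json
import time
import uuid

from fastapi.responses import StreamingResponse


def chat_completion_stream(model: str, token_iterator):
    """Wrap an iterator of generated text pieces in OpenAI-style chunk events.

    token_iterator is a placeholder for the worker's streamed output; it is
    not an existing FastChat helper.
    """
    completion_id = f"chatcmpl-{uuid.uuid4().hex}"

    def event_stream():
        for token in token_iterator:
            chunk = {
                "id": completion_id,
                "object": "chat.completion.chunk",
                "created": int(time.time()),
                "model": model,
                "choices": [
                    {"delta": {"content": token}, "finish_reason": None, "index": 0}
                ],
            }
            yield f"data: {json.dumps(chunk)}\n\n"
        # Terminate the stream the same way OpenAI does.
        yield "data: [DONE]\n\n"

    return StreamingResponse(event_stream(), media_type="text/event-stream")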
A modification in FastChat/fastchat/client/api.py would also be needed, somewhere in the ChatCompletionClient class and the ChatCompletion class, to get native streaming just like the OpenAI SDK:
import asyncio
import openai

async def main():
    async for chunk in await openai.ChatCompletion.acreate(
        model="gpt-3.5-turbo",
        messages=[{
            "role": "user",
            "content": "Generate a list of 20 great names for sentient cheesecakes that teach SQL"
        }],
        stream=True,
    ):
        content = chunk["choices"][0].get("delta", {}).get("content")
        if content is not None:
            print(content, end="")

asyncio.run(main())
This page really walks through streaming delta chunks from the OpenAI API:
https://til.simonwillison.net/gpt3/python-chatgpt-streaming-api
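To mirror that on the FastChat side, the client could expose an async generator that parses the SSE stream into chunks. A rough sketch with httpx follows; the function name and endpoint are made up for illustration and are not the existing client API:
import json

import httpx


async def acreate_stream(base_url: str, model: str, messages: list):
    """Hypothetical async generator yielding parsed OpenAI-style delta chunks."""
    payload = {"model": model, "messages": messages, "stream": True}
    async with httpx.AsyncClient(timeout=None) as client:
        async with client.stream(
            "POST", f"{base_url}/chat/completions", json=payload
        ) as resp:
            async for line in resp.aiter_lines():
                if not line.startswith("data: "):
                    continue
                data = line[len("data: "):]
                if data == "[DONE]":
                    break
                yield json.loads(data)
Usage would then mirror the OpenAI snippet above: async for chunk in acreate_stream(...), pulling the text out of chunk["choices"][0]["delta"].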
I think this is a good high-level look at what changes are needed. If anyone else has other ideas, feel free to chime in as well.
This does not look too difficult to do.
I need this too.
Hi, I've started to work on this issue.
I need this too.
The PR is ready and tested: https://github.com/lm-sys/FastChat/pull/858
Feel free to review!
@baradm100 where's your tip jar?
welp, i stupidly implemented this myself also in #873 without having checked other PRs first... lol
actually, looking at the two candidate PRs, i shouldn't call myself stupid. my version appears to make far fewer edits to achieve the same purpose.
i guess maintainers have some options now so this feature should see upstream shortly!
This is a highly requested feature. Thank you all for your contributions!
I will try to merge #873 and #858
@merrymercy @baradm100
Hey, I was able to test this.
I was able to get delta token updates in my responses. However, there is some weirdness with the API working with LangChain when streaming is turned on.
I get an empty response when streaming is turned on. Should a bug be filed here or on LangChain?
import os

from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.chains import LLMChain
from langchain.chat_models import ChatOpenAI as OpenAIChat
from langchain.prompts.chat import (
    ChatPromptTemplate,
    SystemMessagePromptTemplate,
    AIMessagePromptTemplate,
    HumanMessagePromptTemplate,
)
os.environ['OPENAI_API_BASE'] = "http://localhost:8000/v1"
llm = OpenAIChat(openai_api_base="http://localhost:8000/v1",callbacks=[StreamingStdOutCallbackHandler()], model_name="vicuna-13B",streaming=False,verbose=True)
template="You are a helpful assistant that translates english to pirate."
system_message_prompt = SystemMessagePromptTemplate.from_template(template)
example_human = HumanMessagePromptTemplate.from_template("Hi")
example_ai = AIMessagePromptTemplate.from_template("Argh me mateys")
human_template="{text}"
human_message_prompt = HumanMessagePromptTemplate.from_template(human_template)
chat_prompt = ChatPromptTemplate.from_messages([system_message_prompt, example_human, example_ai, human_message_prompt])
chain = LLMChain(llm=llm, prompt=chat_prompt)
resp = chain.run("I love Red Hat!")
print(resp)
#### RESP
#### Aye, Red Hat be a fine operating system, arrr!
#### With Streaming
llm = OpenAIChat(openai_api_base="http://localhost:8000/v1",callbacks=[StreamingStdOutCallbackHandler()], model_name="vicuna-13B",streaming=True,verbose=True)
template="You are a helpful assistant that translates english to pirate."
system_message_prompt = SystemMessagePromptTemplate.from_template(template)
example_human = HumanMessagePromptTemplate.from_template("Hi")
example_ai = AIMessagePromptTemplate.from_template("Argh me mateys")
human_template="{text}"
human_message_prompt = HumanMessagePromptTemplate.from_template(human_template)
chat_prompt = ChatPromptTemplate.from_messages([system_message_prompt, example_human, example_ai, human_message_prompt])
chain = LLMChain(llm=llm, prompt=chat_prompt)
resp = chain.run("I love Red Hat!")
print(resp)
#### RESP
#### [Empty Line]
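If it helps narrow this down: one way to tell whether the empty response comes from the FastChat API or from LangChain would be to stream from the same endpoint directly with the OpenAI SDK and compare. A quick sketch (assuming the same base URL as above; adjust the API key handling to whatever your server expects):
import openai

# Point the (0.x) OpenAI SDK at the FastChat server; the key is a placeholder,
# adjust it if your server actually validates keys.
openai.api_base = "http://localhost:8000/v1"
openai.api_key = "EMPTY"

# If this prints tokens but the LangChain run above stays empty, the problem is
# more likely on the LangChain side (or in how the callbacks are wired up).
resp = openai.ChatCompletion.create(
    model="vicuna-13B",
    messages=[{"role": "user", "content": "I love Red Hat!"}],
    stream=True,
)
for chunk in resp:
    content = chunk["choices"][0].get("delta", {}).get("content")
    if content:
        print(content, end="", flush=True)
print()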