ChatGLM-6B Add Stream API deployment

This script streams the model's response as it is generated, so users no longer have to wait for the complete response. Calling the endpoint returns an event stream (`text/event-stream`).

Apr 13 '23 04:04 Vinlic

This PR uses Server-Sent Events (SSE), the same transport ChatGPT uses for streaming. SSE lets the server push data to the client over a plain HTTP connection, and for this use case it has performance advantages over both a synchronous response and a WebSocket-based solution.
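For readers unfamiliar with SSE, a minimal sketch of the wire format may help. Each event is a few `field: value` lines (such as `event:` and `data:`) terminated by a blank line, and a multi-line payload is sent as several `data:` lines. The helper names `format_sse_event` and `parse_sse_events` are illustrative only and are not taken from `stream_api.py`.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def format_sse_event(data: str, event: Optional[str] = None) -> str:
    """Serialize one Server-Sent Event in the text/event-stream format."""
    lines: List[str] = []
    if event is not None:
        lines.append(f"event: {event}")
    # A payload containing newlines becomes one "data:" line per line of text.
    for chunk in data.split("\n"):
        lines.append(f"data: {chunk}")
    # A blank line terminates the event.
    return "\n".join(lines) + "\n\n"


def parse_sse_events(lines: Iterable[str]) -> Iterator[Tuple[Optional[str], str]]:
    """Group raw stream lines back into (event, data) pairs."""
    event: Optional[str] = None
    data_lines: List[str] = []
    for line in lines:
        if line == "":
            # Blank line: dispatch the accumulated event, if it carried any data.
            if data_lines:
                yield event, "\n".join(data_lines)
            event, data_lines = None, []
        elif line.startswith(":"):
            continue  # comment line, e.g. a keep-alive ping
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # the spec strips a single leading space
            if field == "event":
                event = value
            elif field == "data":
                data_lines.append(value)
    # A trailing event without a terminating blank line is discarded.


wire = format_sse_event("Hello\nworld", event="delta") + format_sse_event("done", event="finish")
for name, payload in parse_sse_events(wire.split("\n")):
    print(name, repr(payload))
```

Running it prints `delta 'Hello\nworld'` and then `finish 'done'`: the two `data:` lines of the first event are joined back into one payload.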

Apr 14 '23 01:04 Vinlic

@Vinlic Thanks for the stream API solution. In my testing, the server reports the error below, although the program keeps running. On the client side, using requests, I get this error: requests.exceptions.ChunkedEncodingError: ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read)). Have you run into this?

Server-side error:

RuntimeError: CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

Apr 19 '23 03:04 ninghongbo123

@ALL From GPT-4

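This error indicates an attempt to use an unavailable CUDA device. RuntimeError: CUDA error: invalid device ordinal is raised when the code tries to use a GPU that cannot be accessed. Likely causes are a wrong device ID or fewer GPUs on the system than the code expects. You can fix it by checking the value of DEVICE_ID and making sure it points to an available GPU.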

First, check how many GPUs are available on your system by running the following command:

nvidia-smi

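This lists the GPUs on your system along with other related information. Make sure the device ID you choose (DEVICE_ID in the script) is within the range of available devices.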

If you have only one GPU, set DEVICE_ID to "0", as follows:

DEVICE_ID = "0"
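The same check can be done in code before the server starts. This is a sketch under assumptions, not code from the PR: `check_device_id` is a hypothetical helper, and the device count comes from `torch.cuda.device_count()` when PyTorch is installed. Keep in mind that ordinals are counted after `CUDA_VISIBLE_DEVICES` filtering, so with `CUDA_VISIBLE_DEVICES=1` the only valid ID is "0".

```python
def check_device_id(device_id: str, device_count: int) -> str:
    """Return a device string such as "cuda:0", or raise ValueError if the ordinal is invalid."""
    ordinal = int(device_id)  # also raises ValueError for non-numeric IDs
    if device_count == 0:
        raise ValueError("No CUDA devices are visible to this process")
    if not 0 <= ordinal < device_count:
        raise ValueError(
            f"DEVICE_ID={device_id!r} is out of range; valid IDs are 0..{device_count - 1}"
        )
    return f"cuda:{ordinal}"


if __name__ == "__main__":
    DEVICE_ID = "0"
    try:
        import torch

        count = torch.cuda.device_count()
    except ImportError:
        count = 0  # PyTorch not installed: treat as no GPUs
    try:
        print("Using", check_device_id(DEVICE_ID, count))
    except ValueError as err:
        print("Invalid DEVICE_ID:", err)
```

Failing early with a clear message is easier to debug than the asynchronous CUDA error above, whose stack trace may point at the wrong call.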

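As for the client-side requests.exceptions.ChunkedEncodingError: it means the connection was closed before the chunked response was complete, which can happen when the server fails mid-stream (for example with the CUDA error above). requests itself can read an SSE stream if the request is made with stream=True and the body is consumed with iter_lines(). Alternatively, you can use a library such as httpx or aiohttp, which support async streaming requests.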

For example, you can use httpx. First, install it:

pip install httpx

Then you can receive the server-sent events with the following code:

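import asyncio

import httpx

url = "http://127.0.0.1:8010"
data = {
    "input": "你好ChatGLM",
    "max_length": 2048,
    "top_p": 0.7,
    "temperature": 0.95,
    "history": [],
    "html_entities": True,
}


async def main():
    # timeout=None disables httpx's default 5-second timeout, which a slow generation can exceed
    async with httpx.AsyncClient(timeout=None) as client:
        async with client.stream("POST", url, json=data) as response:
            # Each item is one line of the SSE stream; blank lines separate events
            async for line in response.aiter_lines():
                print(line)


# "async with" is only valid inside a coroutine, so run main() through asyncio
asyncio.run(main())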

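This should let the client read the stream. If the server still fails mid-response, however, the connection will be cut off regardless of the client library.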

Apr 23 '23 10:04 yhyu13

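I just opened a PR, https://github.com/THUDM/ChatGLM-6B/pull/808, but yours is better than my implementation. However, your code doesn't return the history. Could you tidy it up and round it out? From what I've seen, using your code for SSE works well.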
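On returning the history: ChatGLM-6B's `model.stream_chat(tokenizer, query, history, ...)` yields `(response, history)` pairs, where `response` is the full text generated so far, so a streaming endpoint could forward both. The sketch below is hypothetical and is not the code from either PR: `chat_events`, `fake_stream_chat`, and the `"update"`/`"finish"` event names are made up, and the fake generator stands in for the model so the code runs without a GPU. With sse_starlette, a generator of dicts with `event` and `data` keys like this one can be passed to `EventSourceResponse`.

```python
import json
from typing import Callable, Iterator, List, Tuple

History = List[Tuple[str, str]]
# Hypothetical shape: model.stream_chat with the model and tokenizer already bound,
# e.g. functools.partial(model.stream_chat, tokenizer).
StreamChat = Callable[[str, History], Iterator[Tuple[str, History]]]


def chat_events(stream_chat: StreamChat, query: str, history: History) -> Iterator[dict]:
    """Yield SSE-style dicts: one "update" per step, then a "finish" event carrying the history."""
    response = ""
    for response, history in stream_chat(query, history):
        # json.dumps keeps each payload on one line, so it fits in a single "data:" field.
        yield {"event": "update", "data": json.dumps({"response": response}, ensure_ascii=False)}
    yield {
        "event": "finish",
        "data": json.dumps({"response": response, "history": history}, ensure_ascii=False),
    }


def fake_stream_chat(query: str, history: History) -> Iterator[Tuple[str, History]]:
    """Stand-in for the model: emits a cumulative reply one character at a time."""
    reply = "Hi!"
    for i in range(1, len(reply) + 1):
        yield reply[:i], history + [(query, reply[:i])]


for event in chat_events(fake_stream_chat, "Hello", []):
    print(event)
```

Sending the history only in the final event keeps the intermediate updates small; the client stores it and passes it back as `history` on the next request.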

Apr 25 '23 06:04 liseri

Hi, do you have an example request? Could you share the page you used to send requests? Thanks!!!

Apr 26 '23 08:04 sportzhang

@Vinlic's work is great. Drawing on my experience using it, I've tried to fill in examples and usage instructions for this part.

Deploying the API first requires installing an extra dependency with pip install sse_starlette. Then run stream_api.py from the repository:

python stream_api.py

By default the server listens on local port 8010 and is called with a POST request:

curl -X POST "http://127.0.0.1:8010" \
     -H 'Content-Type: application/json' \
     -d '{"input": "你好"}'

The response is returned as an event stream:

[Screenshot: stream context]

May 21 '23 15:05 llxxxll

How would I make this request with requests.post()?
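A sketch answering this, assuming the server from this PR runs on 127.0.0.1:8010 and accepts the JSON body from the curl example. `requests.post(..., stream=True)` with `Response.iter_lines()` reads the body as it arrives instead of waiting for it to finish. The `data:` parsing is a simplification that ignores `event:` fields and multi-line payloads, and the helper names `iter_sse_data` and `stream_chat` are hypothetical.

```python
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the payload of each "data:" line; other SSE fields and comments are ignored."""
    for line in lines:
        if line.startswith("data:"):
            value = line[len("data:"):]
            yield value[1:] if value.startswith(" ") else value


def stream_chat(url: str, payload: dict) -> Iterator[str]:
    import requests  # imported lazily so the parser above works without requests installed

    # stream=True stops requests from downloading the whole body before returning;
    # timeout=(5, None) limits connecting to 5 s but waits indefinitely between chunks.
    with requests.post(url, json=payload, stream=True, timeout=(5, None)) as resp:
        resp.raise_for_status()
        # text/event-stream may not declare a charset, in which case requests falls back
        # to ISO-8859-1 for text/* types and would garble Chinese output.
        resp.encoding = "utf-8"
        yield from iter_sse_data(resp.iter_lines(decode_unicode=True))


if __name__ == "__main__":
    for chunk in stream_chat("http://127.0.0.1:8010", {"input": "你好"}):
        print(chunk)
```

Depending on how `stream_api.py` formats its events, each payload may contain the full text generated so far rather than only the newest tokens.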

Aug 08 '23 22:08 Shawn4742
