Langchain-Chatchat
Please take this issue seriously! In real-world use there will definitely be concurrent Q&A from multiple users. Please add this capability!!!
That depends on how many GPUs you have.
+1
That depends on how many GPUs you have.
Does that mean a single GPU, regardless of how much VRAM it has, can only serve one model call at a time? Is there no way to run model inference in parallel via multiple processes?
I have 4 GPUs and still can't get parallelism; requests are processed serially.
fschat already supports concurrent model serving, and this framework is built on top of fschat, so model concurrency is supported. Please try the latest code.
I'm now running into the same problem. Multi-user load testing shows the bottleneck is at the application layer. Setup: 20 virtual users; GPU: A100 80G; CPU: 64 cores, 160G RAM.
The test sends random knowledge-base questions to the FastAPI endpoint http://127.0.0.1:7861/chat/knowledge_base_chat with 20 virtual users.
scenarios: (100.00%) 1 scenario, 20 max VUs, 2m30s max duration (incl. graceful stop):
* breaking: 20 looping VUs for 2m0s (gracefulStop: 30s)
WARN[0060] Request Failed error="request timeout"
WARN[0060] Request Failed error="request timeout"
WARN[0060] Request Failed error="request timeout"
WARN[0060] Request Failed error="request timeout"
WARN[0060] Request Failed error="request timeout"
WARN[0060] Request Failed error="request timeout"
WARN[0060] Request Failed error="request timeout"
WARN[0060] Request Failed error="request timeout"
WARN[0060] Request Failed error="request timeout"
WARN[0073] Request Failed error="request timeout"
WARN[0075] Request Failed error="request timeout"
WARN[0075] Request Failed error="request timeout"
WARN[0084] Request Failed error="request timeout"
WARN[0086] Request Failed error="request timeout"
WARN[0092] Request Failed error="request timeout"
WARN[0093] Request Failed error="request timeout"
WARN[0113] Request Failed error="request timeout"
WARN[0114] Request Failed error="request timeout"
WARN[0115] Request Failed error="request timeout"
WARN[0120] Request Failed error="request timeout"
WARN[0120] Request Failed error="request timeout"
WARN[0120] Request Failed error="request timeout"
WARN[0120] Request Failed error="request timeout"
WARN[0120] Request Failed error="request timeout"
WARN[0120] Request Failed error="request timeout"
WARN[0120] Request Failed error="request timeout"
WARN[0120] Request Failed error="request timeout"
WARN[0120] Request Failed error="request timeout"
WARN[0133] Request Failed error="request timeout"
WARN[0135] Request Failed error="request timeout"
WARN[0135] Request Failed error="request timeout"
WARN[0144] Request Failed error="request timeout"
WARN[0146] Request Failed error="request timeout"
✗ response code was 200
↳ 26% — ✓ 12 / ✗ 33
checks.........................: 26.66% ✓ 12 ✗ 33
data_received..................: 65 kB 434 B/s
data_sent......................: 30 kB 200 B/s
http_req_blocked...............: avg=572.98µs min=2.14µs med=631.43µs max=1.13ms p(90)=1.01ms p(95)=1.06ms
http_req_connecting............: avg=519.36µs min=0s med=558.45µs max=1.07ms p(90)=952.76µs p(95)=986.85µs
✗ http_req_duration..............: avg=53.5s min=13.24s med=59.99s max=1m0s p(90)=1m0s p(95)=1m0s
{ expected_response:true }...: avg=35.65s min=13.24s med=32.92s max=54.81s p(90)=54.63s p(95)=54.74s
✗ http_req_failed................: 73.33% ✓ 33 ✗ 12
http_req_receiving.............: avg=53s min=12.91s med=59.41s max=59.98s p(90)=59.98s p(95)=59.98s
http_req_sending...............: avg=62.64µs min=19.34µs med=44.73µs max=196.91µs p(90)=130.25µs p(95)=162.7µs
http_req_tls_handshaking.......: avg=0s min=0s med=0s max=0s p(90)=0s p(95)=0s
http_req_waiting...............: avg=500.18ms min=16.52ms med=327.3ms max=1.66s p(90)=1.34s p(95)=1.51s
http_reqs......................: 45 0.299999/s
iteration_duration.............: avg=53.5s min=13.24s med=1m0s max=1m0s p(90)=1m0s p(95)=1m0s
iterations.....................: 45 0.299999/s
vus............................: 6 min=6 max=20
vus_max........................: 20 min=20 max=20
running (2m30.0s), 00/20 VUs, 45 complete and 6 interrupted iterations
breaking ✓ [======================================] 20 VUs 2m0s
ERRO[0151] thresholds on metrics 'http_req_duration, http_req_failed' have been crossed
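The failure pattern above (a few early requests succeed, then timeouts arrive in waves as the queue builds up) is characteristic of a backend that serves one request at a time. A minimal, self-contained sketch of that effect, using a deliberately single-threaded dummy HTTP server as a hypothetical stand-in for the model worker (the 0.2 s delay, the handler, and the port are all made up for illustration):

```python
import threading
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from http.server import BaseHTTPRequestHandler, HTTPServer


class SlowHandler(BaseHTTPRequestHandler):
    """Simulates a model worker that needs 0.2 s per answer."""

    def do_GET(self):
        time.sleep(0.2)  # stand-in for model inference time
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"ok")

    def log_message(self, *args):  # silence per-request logging
        pass


# HTTPServer (not ThreadingHTTPServer) handles one request at a time,
# mirroring a single-worker inference backend.
server = HTTPServer(("127.0.0.1", 0), SlowHandler)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()


def ask(_):
    with urllib.request.urlopen(f"http://127.0.0.1:{port}/") as resp:
        return resp.status


start = time.monotonic()
with ThreadPoolExecutor(max_workers=5) as pool:
    codes = list(pool.map(ask, range(5)))
elapsed = time.monotonic() - start

print(codes)          # [200, 200, 200, 200, 200]
print(elapsed > 0.9)  # True: ~1 s total, the 5 clients were served serially
server.shutdown()
```

With five concurrent clients the wall time is roughly 5 × 0.2 s rather than 0.2 s, which is the serialized behavior reported in this thread; scaling the worker side (more FastChat model workers, or an inference server that batches) changes this, client-side tuning alone does not.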
Server-side error output with 20 users:
2023-11-09 19:40:44 | ERROR | asyncio | Task was destroyed but it is pending!
task: <Task pending name='Task-55435' coro=<Queue.get() running at /home/cloud/miniconda3/envs/llm-poc/lib/python3.10/asyncio/queues.py:159> wait_for=<Future pending cb=[Task.task_wakeup()]>>
2023-11-09 19:42:30 | ERROR | stderr | ERROR: Exception in ASGI application
2023-11-09 19:42:30 | ERROR | stderr | Traceback (most recent call last):
2023-11-09 19:42:30 | ERROR | stderr | File "/home/cloud/miniconda3/envs/llm-poc/lib/python3.10/site-packages/httpcore/_backends/anyio.py", line 34, in read
2023-11-09 19:42:30 | ERROR | stderr | return await self._stream.receive(max_bytes=max_bytes)
2023-11-09 19:42:30 | ERROR | stderr | File "/home/cloud/miniconda3/envs/llm-poc/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 1203, in receive
2023-11-09 19:42:30 | ERROR | stderr | await self._protocol.read_event.wait()
2023-11-09 19:42:30 | ERROR | stderr | File "/home/cloud/miniconda3/envs/llm-poc/lib/python3.10/asyncio/locks.py", line 213, in wait
2023-11-09 19:42:30 | ERROR | stderr | await fut
2023-11-09 19:42:30 | ERROR | stderr | asyncio.exceptions.CancelledError
2023-11-09 19:42:30 | ERROR | stderr |
2023-11-09 19:42:30 | ERROR | stderr | During handling of the above exception, another exception occurred:
2023-11-09 19:42:30 | ERROR | stderr |
2023-11-09 19:42:30 | ERROR | stderr | Traceback (most recent call last):
2023-11-09 19:42:30 | ERROR | stderr | File "/home/cloud/miniconda3/envs/llm-poc/lib/python3.10/site-packages/httpcore/_exceptions.py", line 10, in map_exceptions
2023-11-09 19:42:30 | ERROR | stderr | yield
2023-11-09 19:42:30 | ERROR | stderr | File "/home/cloud/miniconda3/envs/llm-poc/lib/python3.10/site-packages/httpcore/_backends/anyio.py", line 32, in read
2023-11-09 19:42:30 | ERROR | stderr | with anyio.fail_after(timeout):
2023-11-09 19:42:30 | ERROR | stderr | File "/home/cloud/miniconda3/envs/llm-poc/lib/python3.10/site-packages/anyio/_core/_tasks.py", line 119, in __exit__
2023-11-09 19:42:30 | ERROR | stderr | raise TimeoutError
2023-11-09 19:42:30 | ERROR | stderr | TimeoutError
2023-11-09 19:42:30 | ERROR | stderr |
2023-11-09 19:42:30 | ERROR | stderr | The above exception was the direct cause of the following exception:
2023-11-09 19:42:30 | ERROR | stderr |
2023-11-09 19:42:30 | ERROR | stderr | Traceback (most recent call last):
2023-11-09 19:42:30 | ERROR | stderr | File "/home/cloud/miniconda3/envs/llm-poc/lib/python3.10/site-packages/httpx/_transports/default.py", line 60, in map_httpcore_exceptions
2023-11-09 19:42:30 | ERROR | stderr | yield
2023-11-09 19:42:30 | ERROR | stderr | File "/home/cloud/miniconda3/envs/llm-poc/lib/python3.10/site-packages/httpx/_transports/default.py", line 353, in handle_async_request
2023-11-09 19:42:30 | ERROR | stderr | resp = await self._pool.handle_async_request(req)
2023-11-09 19:42:30 | ERROR | stderr | File "/home/cloud/miniconda3/envs/llm-poc/lib/python3.10/site-packages/httpcore/_async/connection_pool.py", line 262, in handle_async_request
2023-11-09 19:42:30 | ERROR | stderr | raise exc
2023-11-09 19:42:30 | ERROR | stderr | File "/home/cloud/miniconda3/envs/llm-poc/lib/python3.10/site-packages/httpcore/_async/connection_pool.py", line 245, in handle_async_request
2023-11-09 19:42:30 | ERROR | stderr | response = await connection.handle_async_request(request)
2023-11-09 19:42:30 | ERROR | stderr | File "/home/cloud/miniconda3/envs/llm-poc/lib/python3.10/site-packages/httpcore/_async/connection.py", line 96, in handle_async_request
2023-11-09 19:42:30 | ERROR | stderr | return await self._connection.handle_async_request(request)
2023-11-09 19:42:30 | ERROR | stderr | File "/home/cloud/miniconda3/envs/llm-poc/lib/python3.10/site-packages/httpcore/_async/http11.py", line 121, in handle_async_request
2023-11-09 19:42:30 | ERROR | stderr | raise exc
2023-11-09 19:42:30 | ERROR | stderr | File "/home/cloud/miniconda3/envs/llm-poc/lib/python3.10/site-packages/httpcore/_async/http11.py", line 99, in handle_async_request
2023-11-09 19:42:30 | ERROR | stderr | ) = await self._receive_response_headers(**kwargs)
2023-11-09 19:42:30 | ERROR | stderr | File "/home/cloud/miniconda3/envs/llm-poc/lib/python3.10/site-packages/httpcore/_async/http11.py", line 164, in _receive_response_headers
2023-11-09 19:42:30 | ERROR | stderr | event = await self._receive_event(timeout=timeout)
2023-11-09 19:42:30 | ERROR | stderr | File "/home/cloud/miniconda3/envs/llm-poc/lib/python3.10/site-packages/httpcore/_async/http11.py", line 200, in _receive_event
2023-11-09 19:42:30 | ERROR | stderr | data = await self._network_stream.read(
2023-11-09 19:42:30 | ERROR | stderr | File "/home/cloud/miniconda3/envs/llm-poc/lib/python3.10/site-packages/httpcore/_backends/anyio.py", line 31, in read
2023-11-09 19:42:30 | ERROR | stderr | with map_exceptions(exc_map):
2023-11-09 19:42:30 | ERROR | stderr | File "/home/cloud/miniconda3/envs/llm-poc/lib/python3.10/contextlib.py", line 153, in __exit__
2023-11-09 19:42:30 | ERROR | stderr | self.gen.throw(typ, value, traceback)
2023-11-09 19:42:30 | ERROR | stderr | File "/home/cloud/miniconda3/envs/llm-poc/lib/python3.10/site-packages/httpcore/_exceptions.py", line 14, in map_exceptions
2023-11-09 19:42:30 | ERROR | stderr | raise to_exc(exc) from exc
2023-11-09 19:42:30 | ERROR | stderr | httpcore.ReadTimeout
File location: ~\miniconda3\Lib\site-packages\langchain\callbacks\streaming_aiter.py
import asyncio
from typing import Any, AsyncIterator, Dict, List, Literal, Union, cast

from langchain.callbacks.base import AsyncCallbackHandler
from langchain.schema.output import LLMResult


# TODO If used by two LLM runs in parallel this won't work as expected
class AsyncIteratorCallbackHandler(AsyncCallbackHandler):
    """Callback handler that returns an async iterator."""

    queue: asyncio.Queue[str]
    done: asyncio.Event

    @property
    def always_verbose(self) -> bool:
        return True

    def __init__(self) -> None:
        self.queue = asyncio.Queue()
        self.done = asyncio.Event()
So far I've only traced the problem down to this layer. I'm not sure whether the issue lies with Langchain or with FastChat, or how to go about tuning it; I have no clear direction at the moment. Any advice from the experts here would be much appreciated.
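The TODO comment in that file is the key clue: a single AsyncIteratorCallbackHandler owns one queue and one done event, so sharing an instance across concurrent runs mixes the streams together. A small self-contained sketch of the failure mode and the per-request-instance workaround (fake_llm_run is a made-up stand-in for a streaming LLM run, not langchain code):

```python
import asyncio


async def fake_llm_run(queue: asyncio.Queue, done: asyncio.Event, name: str):
    """Hypothetical streaming run: each generated token is pushed into
    the handler's queue; `done` is set when the run finishes."""
    for i in range(3):
        await queue.put(f"{name}-{i}")
        await asyncio.sleep(0.01)
    done.set()


async def shared_handler():
    # One queue/event shared by two parallel runs (like reusing a single
    # AsyncIteratorCallbackHandler): both token streams land in the same
    # queue, and whichever run finishes first sets `done` for both.
    queue, done = asyncio.Queue(), asyncio.Event()
    await asyncio.gather(fake_llm_run(queue, done, "A"),
                         fake_llm_run(queue, done, "B"))
    return [queue.get_nowait() for _ in range(queue.qsize())]


async def per_run_handler(name: str):
    # The workaround: a fresh queue/event pair per request (i.e. construct
    # a new handler inside each request), so every stream stays intact.
    queue, done = asyncio.Queue(), asyncio.Event()
    await fake_llm_run(queue, done, name)
    return [queue.get_nowait() for _ in range(queue.qsize())]


async def main():
    mixed = await shared_handler()
    clean = await asyncio.gather(per_run_handler("A"), per_run_handler("B"))
    return mixed, clean


mixed, clean = asyncio.run(main())
print(mixed)  # six tokens from A and B interleaved in a single stream
print(clean)  # [['A-0', 'A-1', 'A-2'], ['B-0', 'B-1', 'B-2']]
```

If the service ever shares one handler instance across requests, this interleaving (plus the shared `done` event ending a stream early) would produce exactly the truncated or hung responses seen under load; the fix direction is to ensure a fresh handler per request.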
I'm hitting the same problem: first, it shows up when many users are active; second, after the model has been running for a while, the same error suddenly appears as well.
Another issue (https://github.com/chatchat-space/Langchain-Chatchat/issues/2684) says around 10 users is the upper limit. Did you ever find a solution? If my concurrency requirements are low, could I instead delay API requests and execute them sequentially?
How did you all end up handling this problem? Is it resolved now?
May I ask how this was resolved in the end?