
Exception in ASGI application - HTTP 500 with concurrent requests

Open · JavierCCC opened this issue on Jan 20, 2024 · 4 comments

I'm having an issue when LangChain receives more than one request at once (send one request, then send another before getting a response to the first). My setup:

I'm using vLLM as the inference engine with Ray (head node only):

import ray

context = ray.init()          # start a local, head-only Ray cluster
print(context.dashboard_url)  # URL of the Ray dashboard
ray.nodes()                   # sanity check: list the cluster's nodes

Loading Mistral 7B with vLLM:

from langchain_community.llms import VLLM

llm = VLLM(
    model="TheBloke/Mistral-7B-Instruct-v0.2-AWQ",
    trust_remote_code=True,  # mandatory for HF models
    max_new_tokens=1000,
    top_k=1,
    top_p=0.95,
    # temperature=0.8,
    temperature=0.5,
    download_dir="./mistral-vllm",
    vllm_kwargs={
        "quantization": "awq",
        "max_model_len": 8000,
        "enforce_eager": True,  # ??? try this out
    },
    dtype="auto",
    tensor_parallel_size=1,  # ray
)
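
For what it's worth, one request at a time against this object behaves normally (a quick smoke test; the prompt string is just an example):

# Hypothetical smoke test: sequential calls complete without errors.
print(llm.invoke("What is the capital of France?"))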

The LangChain chain is served with LangServe:

from langchain.prompts import PromptTemplate
from langchain.callbacks.tracers import ConsoleCallbackHandler
from fastapi import FastAPI
from langserve import add_routes
import uvicorn

template = """
[INST]<s>
Question: {question}

Given that question, write a short and accurate answer.

Answer:
[/INST]</s>
"""


prompt = PromptTemplate.from_template(template)
chain = prompt | llm

app = FastAPI(
    title="LangChain Server",
    version="1.0",
    description="A simple API server using LangChain's Runnable interfaces",
)

add_routes(
    app,
    chain,
    path="/chain",
)

if __name__ == "__main__":
    uvicorn.run(app, host="1.1.1.1", port=9999)

LangServe starts fine, and there's no problem when requests are sent in sequence.
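
Roughly how I trigger the failure (a minimal client sketch; the exact questions don't matter, and the payload shape follows LangServe's /invoke convention):

# Hypothetical reproduction: start a second request before the first finishes.
import threading
import requests

def ask(question: str) -> None:
    r = requests.post(
        "http://172.23.0.2:9999/chain/invoke",
        json={"input": {"question": question}},
    )
    print(r.status_code)

threads = [threading.Thread(target=ask, args=(q,))
           for q in ("What is vLLM?", "What is Ray?")]
for t in threads:
    t.start()
for t in threads:
    t.join()

This is the error I get when the second request arrives before the first one has finished: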


INFO:     Application startup complete.
INFO:     Uvicorn running on http://172.23.0.2:9999 (Press CTRL+C to quit)
Processed prompts:   0%|          | 0/1 [00:00<?, ?it/s]
INFO:     172.23.0.2:32810 - "POST /chain/invoke HTTP/1.1" 500 Internal Server Error
Processed prompts:  50%|          | 1/2 [00:11<00:11, 11.81s/it]

ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/home/jovyan/venvs/javier/lib/python3.11/site-packages/uvicorn/protocols/http/httptools_impl.py", line 419, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/home/jovyan/venvs/javier/lib/python3.11/site-packages/uvicorn/middleware/proxy_headers.py", line 84, in __call__
    return await self.app(scope, receive, send)
  File "/home/jovyan/venvs/javier/lib/python3.11/site-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/home/jovyan/venvs/javier/lib/python3.11/site-packages/starlette/applications.py", line 123, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/home/jovyan/venvs/javier/lib/python3.11/site-packages/starlette/middleware/errors.py", line 186, in __call__
    raise exc
  File "/home/jovyan/venvs/javier/lib/python3.11/site-packages/starlette/middleware/errors.py", line 164, in __call__
    await self.app(scope, receive, _send)
  File "/home/jovyan/venvs/javier/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/home/jovyan/venvs/javier/lib/python3.11/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/home/jovyan/venvs/javier/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/home/jovyan/venvs/javier/lib/python3.11/site-packages/starlette/routing.py", line 762, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/home/jovyan/venvs/javier/lib/python3.11/site-packages/starlette/routing.py", line 782, in app
    await route.handle(scope, receive, send)
  File "/home/jovyan/venvs/javier/lib/python3.11/site-packages/starlette/routing.py", line 297, in handle
    await self.app(scope, receive, send)
  File "/home/jovyan/venvs/javier/lib/python3.11/site-packages/starlette/routing.py", line 77, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/home/jovyan/venvs/javier/lib/python3.11/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/home/jovyan/venvs/javier/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/home/jovyan/venvs/javier/lib/python3.11/site-packages/starlette/routing.py", line 72, in app
    response = await func(request)
  File "/home/jovyan/venvs/javier/lib/python3.11/site-packages/fastapi/routing.py", line 299, in app
    raise e
  File "/home/jovyan/venvs/javier/lib/python3.11/site-packages/fastapi/routing.py", line 294, in app
    raw_response = await run_endpoint_function(
  File "/home/jovyan/venvs/javier/lib/python3.11/site-packages/fastapi/routing.py", line 191, in run_endpoint_function
    return await dependant.call(**values)
  File "/home/jovyan/venvs/javier/lib/python3.11/site-packages/langserve/server.py", line 446, in invoke
    return await api_handler.invoke(request)
  File "/home/jovyan/venvs/javier/lib/python3.11/site-packages/langserve/api_handler.py", line 684, in invoke
    output = await self.runnable.ainvoke(input, config=config)
  File "/home/jovyan/venvs/javier/lib/python3.11/site-packages/langchain_core/runnables/base.py", line 2068, in ainvoke
    input = await step.ainvoke(
  File "/home/jovyan/venvs/javier/lib/python3.11/site-packages/langchain_core/language_models/llms.py", line 252, in ainvoke
    llm_result = await self.agenerate_prompt(
  File "/home/jovyan/venvs/javier/lib/python3.11/site-packages/langchain_core/language_models/llms.py", line 535, in agenerate_prompt
    return await self.agenerate(
  File "/home/jovyan/venvs/javier/lib/python3.11/site-packages/langchain_core/language_models/llms.py", line 893, in agenerate
    output = await self._agenerate_helper(
  File "/home/jovyan/venvs/javier/lib/python3.11/site-packages/langchain_core/language_models/llms.py", line 757, in _agenerate_helper
    raise e
  File "/home/jovyan/venvs/javier/lib/python3.11/site-packages/langchain_core/language_models/llms.py", line 741, in _agenerate_helper
    await self._agenerate(
  File "/home/jovyan/venvs/javier/lib/python3.11/site-packages/langchain_core/language_models/llms.py", line 490, in _agenerate
    return await run_in_executor(
  File "/home/jovyan/venvs/javier/lib/python3.11/site-packages/langchain_core/runnables/config.py", line 493, in run_in_executor
    return await asyncio.get_running_loop().run_in_executor(
  File "/opt/conda/lib/python3.11/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/jovyan/venvs/javier/lib/python3.11/site-packages/langchain_community/llms/vllm.py", line 132, in _generate
    outputs = self.client.generate(prompts, sampling_params)
  File "/home/jovyan/venvs/javier/lib/python3.11/site-packages/vllm/entrypoints/llm.py", line 165, in generate
    return self._run_engine(use_tqdm)
  File "/home/jovyan/venvs/javier/lib/python3.11/site-packages/vllm/entrypoints/llm.py", line 185, in _run_engine
    step_outputs = self.llm_engine.step()
  File "/home/jovyan/venvs/javier/lib/python3.11/site-packages/vllm/engine/llm_engine.py", line 642, in step
    return self._process_model_outputs(output, scheduler_outputs)
  File "/home/jovyan/venvs/javier/lib/python3.11/site-packages/vllm/engine/llm_engine.py", line 597, in _process_model_outputs
    self._process_sequence_group_outputs(seq_group, outputs)
  File "/home/jovyan/venvs/javier/lib/python3.11/site-packages/vllm/engine/llm_engine.py", line 436, in _process_sequence_group_outputs
    parent_child_dict[sample.parent_seq_id].append(sample)
KeyError: 0

Processed prompts: 100%|██████████| 2/2 [00:13<00:00, 6.92s/it]
INFO:     172.23.0.2:32812 - "POST /chain/invoke HTTP/1.1" 200 OK
Processed prompts:   0%|          | 0/1 [00:46<?, ?it/s]
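
If I read the traceback right, the async path never reaches an async vLLM engine: _agenerate hands the blocking _generate off to a thread pool via run_in_executor, so two overlapping requests both call generate() on the same vllm.LLM instance and step the same engine concurrently. A minimal sketch of that pattern (my paraphrase of the frames above, not the actual library code):

import asyncio

# Simplified paraphrase (assumption), not the real langchain source.
# Each request's ainvoke() effectively ends up here:
async def _agenerate(llm, prompts):
    loop = asyncio.get_running_loop()
    # The blocking generate() runs in a worker thread; with two in-flight
    # requests, both threads drive the same llm_engine.step() loop on
    # shared state, which is where the KeyError: 0 seems to come from.
    return await loop.run_in_executor(None, llm.generate, prompts)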


What am I doing wrong?
