AsyncEngineDeadError with KoboldAI API server
Everything seems to work fine via the embedded klite interface, but when I pointed horde at it, it started throwing these:
It seems to kinda sorta maybe still serve horde requests?
INFO 01-16 12:30:08 async_aphrodite.py:133] Aborted request kai-ca722b2c86f04e9b88eed91ac6f5a65e.
INFO: 127.0.0.1:60750 - "POST /api/latest/generate HTTP/1.1" 500 Internal Server Error
ERROR: Exception in ASGI application
Traceback (most recent call last):
File "/root/aphrodite-engine/aphrodite/engine/async_aphrodite.py", line 27, in _raise_exception_on_finish
task.result()
File "/root/aphrodite-engine/aphrodite/engine/async_aphrodite.py", line 358, in run_engine_loop
has_requests_in_progress = await self.engine_step()
^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/aphrodite-engine/aphrodite/engine/async_aphrodite.py", line 337, in engine_step
request_outputs = await self.engine.step_async()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/aphrodite-engine/aphrodite/engine/async_aphrodite.py", line 188, in step_async
output = (await self._run_workers_async(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/aphrodite-engine/aphrodite/engine/async_aphrodite.py", line 225, in _run_workers_async
assert output == other_output
^^^^^^^^^^^^^^^^^^^^^^
AssertionError
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/workspace/micromamba/envs/aphrodite-runtime/lib/python3.11/site-packages/uvicorn/protocols/http/h11_impl.py", line 408, in run_asgi
result = await app( # type: ignore[func-returns-value]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/micromamba/envs/aphrodite-runtime/lib/python3.11/site-packages/uvicorn/middleware/proxy_headers.py", line 84, in __call__
return await self.app(scope, receive, send)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/micromamba/envs/aphrodite-runtime/lib/python3.11/site-packages/fastapi/applications.py", line 1054, in __call__
await super().__call__(scope, receive, send)
File "/workspace/micromamba/envs/aphrodite-runtime/lib/python3.11/site-packages/starlette/applications.py", line 123, in __call__
await self.middleware_stack(scope, receive, send)
File "/workspace/micromamba/envs/aphrodite-runtime/lib/python3.11/site-packages/starlette/middleware/errors.py", line 186, in __call__
raise exc
File "/workspace/micromamba/envs/aphrodite-runtime/lib/python3.11/site-packages/starlette/middleware/errors.py", line 164, in __call__
await self.app(scope, receive, _send)
File "/workspace/micromamba/envs/aphrodite-runtime/lib/python3.11/site-packages/starlette/middleware/cors.py", line 83, in __call__
await self.app(scope, receive, send)
File "/workspace/micromamba/envs/aphrodite-runtime/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
File "/workspace/micromamba/envs/aphrodite-runtime/lib/python3.11/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
raise exc
File "/workspace/micromamba/envs/aphrodite-runtime/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "/workspace/micromamba/envs/aphrodite-runtime/lib/python3.11/site-packages/starlette/routing.py", line 762, in __call__
await self.middleware_stack(scope, receive, send)
File "/workspace/micromamba/envs/aphrodite-runtime/lib/python3.11/site-packages/starlette/routing.py", line 782, in app
await route.handle(scope, receive, send)
File "/workspace/micromamba/envs/aphrodite-runtime/lib/python3.11/site-packages/starlette/routing.py", line 297, in handle
await self.app(scope, receive, send)
File "/workspace/micromamba/envs/aphrodite-runtime/lib/python3.11/site-packages/starlette/routing.py", line 77, in app
await wrap_app_handling_exceptions(app, request)(scope, receive, send)
File "/workspace/micromamba/envs/aphrodite-runtime/lib/python3.11/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
raise exc
File "/workspace/micromamba/envs/aphrodite-runtime/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "/workspace/micromamba/envs/aphrodite-runtime/lib/python3.11/site-packages/starlette/routing.py", line 72, in app
response = await func(request)
^^^^^^^^^^^^^^^^^^^
File "/workspace/micromamba/envs/aphrodite-runtime/lib/python3.11/site-packages/fastapi/routing.py", line 299, in app
raise e
File "/workspace/micromamba/envs/aphrodite-runtime/lib/python3.11/site-packages/fastapi/routing.py", line 294, in app
raw_response = await run_endpoint_function(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/micromamba/envs/aphrodite-runtime/lib/python3.11/site-packages/fastapi/routing.py", line 191, in run_endpoint_function
return await dependant.call(**values)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/aphrodite-engine/aphrodite/endpoints/kobold/api_server.py", line 142, in generate
async for res in result_generator:
File "/root/aphrodite-engine/aphrodite/engine/async_aphrodite.py", line 442, in generate
raise e
File "/root/aphrodite-engine/aphrodite/engine/async_aphrodite.py", line 436, in generate
async for request_output in stream:
File "/root/aphrodite-engine/aphrodite/engine/async_aphrodite.py", line 69, in __anext__
raise result
File "/workspace/micromamba/envs/aphrodite-runtime/lib/python3.11/asyncio/events.py", line 80, in _run
self._context.run(self._callback, *self._args)
File "/root/aphrodite-engine/aphrodite/engine/async_aphrodite.py", line 36, in _raise_exception_on_finish
raise exc
File "/root/aphrodite-engine/aphrodite/engine/async_aphrodite.py", line 31, in _raise_exception_on_finish
raise AsyncEngineDeadError(
aphrodite.engine.async_aphrodite.AsyncEngineDeadError: Task finished unexpectedly. This should never happen! Please open an issue on Github. See stack trace above for the actual cause.
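For what it's worth, the AssertionError at the bottom of the first trace is a bare equality check across the tensor-parallel workers in `_run_workers_async`. A rough sketch of that logic (names and structure are hypothetical, not the actual aphrodite code):

```python
# Hypothetical sketch (NOT the actual aphrodite code) of the check that fails
# in _run_workers_async: with -tp 2, the driver runs each step on every
# tensor-parallel worker and asserts that all workers returned the same output.
def check_worker_outputs(all_outputs):
    """Return the common output, or raise AssertionError on any mismatch."""
    first, *rest = all_outputs
    for other in rest:
        # The real code uses a bare `assert output == other_output`,
        # which is why the traceback above carries no error message.
        assert other == first, f"worker outputs diverged: {other!r} != {first!r}"
    return first

print(check_worker_outputs([["tok_a"], ["tok_a"]]))  # both workers agree
```

So if the two `-tp 2` workers ever disagree for a single step (for whatever reason), the assert fires, the engine loop dies, and every request after that gets the AsyncEngineDeadError 500.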
Does it display another error when you kill the server with Ctrl + C?
No idea, but it still seemed somewhat functional. I sort of just killed the entire runpod after that.
The most likely cause for that error is a CUDA OOM error, so you may need to lower your number of threads.
I dunno, I tried again, this time with an AWQ 32g quant of Mixtral (about 26 GB on disk) instead of fp16, on 2 A6000s (48 GB VRAM each). In a separate run, on a separate server, I did push it until it OOM'd, and I clearly saw those CUDA OOM errors then. I don't see any such messages in this case.
This time I kept it to only 1 thread in the horde client, and I tried both -gmu 0.98 and 0.8, though I frankly have no idea how I should be tuning these values.
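For anyone else tuning -gmu: my rough mental model (back-of-envelope only, using the numbers from this thread; the exact accounting inside the engine will differ) is that it caps the fraction of each GPU's VRAM the engine may claim for weights plus KV cache:

```python
# Back-of-envelope view of what -gmu (GPU memory utilization) budgets.
# All numbers are from this thread and are approximate, not exact.
vram_per_gpu_gb = 48.0      # A6000
gmu = 0.8                   # fraction of VRAM the engine is allowed to claim
weights_on_disk_gb = 26.0   # AWQ quant of the 8x7B model
tp = 2                      # tensor-parallel degree

budget_per_gpu = vram_per_gpu_gb * gmu        # 38.4 GB usable per GPU
weights_per_gpu = weights_on_disk_gb / tp     # ~13 GB of weights per GPU
kv_cache_per_gpu = budget_per_gpu - weights_per_gpu

print(f"~{kv_cache_per_gpu:.1f} GB per GPU left for KV cache and activations")
```

By that math even -gmu 0.8 leaves plenty of headroom for this quant, which is consistent with not seeing any CUDA OOM messages here.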
My cmd line: python -m aphrodite.endpoints.kobold.api_server --host 0.0.0.0 --served-model-name BagelMIsteryTour-v2-8x7B --model ~/ycros/BagelMIsteryTour-v2-8x7B-AWQ --max-length 1024 -tp 2 -gmu 0.8 --quantization awq --kv-cache-dtype fp8
I'm on a39eeb7188d8bc91a43712435b27ad9e4c2b98d1 running from source.
The failed requests as reported by horde are all these:
Something went wrong when processing request. Please check your trace.log file for the full stack trace. Payload: {'prompt': 'PROMPT REDACTED', 'n': 1, 'max_context_length': 2048, 'max_length': 64, 'rep_pen': 1.1, 'rep_pen_range': 1024, 'rep_pen_slope': 0.7, 'temperature': 0.9, 'tfs': 1.0, 'top_a': 0.0, 'top_k': 0, 'top_p': 0.9, 'typical': 1.0, 'sampler_order': [6, 0, 1, 2, 3, 4, 5], 'use_default_badwordsids': True, 'stop_sequence': [], 'min_p': 0.0, 'dynatemp_range': 0.0, 'dynatemp_exponent': 1.0, 'quiet': True, 'request_type': 'text2text', 'model': 'aphrodite/BagelMIsteryTour-v2-8x7B'}
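In case it helps anyone reproduce this without going through Horde, something like the following should hit the same endpoint the 500s came from. The host, port, and prompt are assumptions (no --port was given on my command line, so substitute whatever the server actually bound to); the other fields are trimmed from the payload above:

```python
# Hypothetical standalone repro against the endpoint from the 500s above.
# Host/port and prompt are assumptions; remaining fields come from the
# Horde-reported payload. Once the engine is dead, every call should 500.
import json
import urllib.request

payload = {
    "prompt": "Hello",  # placeholder; the real prompt was redacted
    "n": 1,
    "max_context_length": 2048,
    "max_length": 64,
    "rep_pen": 1.1,
    "temperature": 0.9,
    "top_p": 0.9,
    "top_k": 0,
}
req = urllib.request.Request(
    "http://127.0.0.1:5000/api/latest/generate",  # port is a guess
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
# Uncomment against a live server:
# print(urllib.request.urlopen(req).status)
```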
When I stop it:
^CINFO: Shutting down
INFO: Waiting for application shutdown.
INFO: Application shutdown complete.
INFO: Finished server process [7657]
(RayWorker pid=9414) INFO 01-21 10:22:17 model_runner.py:459] Graph capturing finished in 35 secs.
(RayWorker pid=9414) [W CUDAGraph.cpp:145] Warning: Waiting for pending NCCL work to finish before starting graph capture. (function operator())
root@d38248ce23ec:~#
Here's the log from the terminal as far as my tmux buffer went: aphro-log.txt
Does it log anywhere else I should be looking at before I shut this pod down? Is there anything else you'd like me to try to debug this? (I will probably shut the pod down in say, 12 hours)