mlc-llm
[Bug] Python server example runs but hangs on prefill function within api call
🐛 Bug
I managed to get the Python server (located under mlc-llm/python) to work by first building both TVM and the mlc-llm CLI from source, then running: python -m mlc_chat.rest --artifact-path ../dist --model vicuna-v1-7b --quantization q3f16_0 --device-name metal. This successfully starts the server:
INFO: Will watch for changes in these directories: ['<some path>/mlc-llm/python']
INFO: Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)
INFO: Started reloader process [26676] using StatReload
INFO: Started server process [26688]
INFO: Waiting for application startup.
[20:47:46] <some path>/tvm-unity-nightly/src/runtime/metal/metal_device_api.mm:165: Intializing Metal device 0, name=Apple M2 Pro
INFO: Application startup complete.
When I call the endpoint "/chat/completions" with a correctly formatted POST request, the function hangs on the session["chat_mod"].prefill(input=request.prompt) call.
I tested the same code within the lifespan function (called on server startup) and everything works correctly there. I also checked that session["chat_mod"] was not empty within the API function for "/chat/completions".
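The shared-session pattern described above can be sketched as follows. This is a hypothetical stand-in, not the actual rest.py code: StubChatModule replaces the real mlc_chat chat module, and the two functions mirror the lifespan hook (where prefill works) and the request handler (where it hangs).

```python
# Sketch of the shared-session pattern from rest.py, with a stub
# in place of the real mlc_chat module (names are illustrative).
session = {}

class StubChatModule:
    def prefill(self, input):
        # the real prefill runs the model over the prompt; the stub just records it
        self.last_prompt = input

def lifespan_startup():
    # mirrors the server's lifespan hook, where the call succeeds
    session["chat_mod"] = StubChatModule()

def chat_completions(prompt):
    # mirrors the "/chat/completions" handler, where the real call hangs
    session["chat_mod"].prefill(input=prompt)
    return session["chat_mod"].last_prompt

lifespan_startup()
print(chat_completions("Hello"))  # prints "Hello"
```

With the stub both paths behave identically, which is consistent with the report that session["chat_mod"] is populated in the handler; the hang is specific to the real prefill call in the server context.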
To Reproduce
Steps to reproduce the behavior:
- Run the python server
- Call the endpoint "/chat/completions" with a correctly formatted POST request
Expected behavior
Chat response returned from api call to the endpoint "/chat/completions".
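For reference, a "correctly formatted POST request" here means JSON posted to /chat/completions, as in the sample_client.py traceback below. A minimal stdlib sketch (the prompt text is illustrative; the "prompt" field name comes from request.prompt above):

```python
import json
import urllib.request

# field name taken from request.prompt in the report; value is illustrative
payload = {"prompt": "What is the capital of France?"}

def build_request(url="http://127.0.0.1:8000/chat/completions"):
    data = json.dumps(payload).encode("utf-8")
    # passing data makes this a POST request
    return urllib.request.Request(url, data=data,
                                  headers={"Content-Type": "application/json"})

req = build_request()
# urllib.request.urlopen(req)  # would block while the server is stuck in prefill
```

The actual network call is left commented out, since against the buggy server it never returns.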
Environment
- Platform (e.g. WebGPU/Vulkan/IOS/Android/CUDA): Metal
- Operating system (e.g. Ubuntu/Windows/MacOS/...): MacOS
- Device (e.g. iPhone 12 Pro, PC+RTX 3090, ...): M2 MacBook Pro
- How you installed MLC-LLM (conda, source): source
- How you installed TVM-Unity (pip, source): source
- Python version (e.g. 3.10): 3.11.3
CC: @sudeepag @Kathryn-cat
Hi @surya-ven, a couple of questions:
- Does running python sample_client.py work for you? Could you paste the output here?
- Are you able to run the CLI successfully? (./build/mlc_chat_cli)
- It doesn't work, this is the output:
Supported models: {'data': [{'id': 'rwkv-raven-7b', 'object': 'model'}, {'id': 'RedPajama-INCITE-Chat-3B-v1', 'object': 'model'}, {'id': 'vicuna-v1-7b', 'object': 'model'}]}
Traceback (most recent call last):
File "/Users/suryaven/miniconda3/envs/tvm-unity-build/lib/python3.11/site-packages/urllib3/connectionpool.py", line 703, in urlopen
httplib_response = self._make_request(
^^^^^^^^^^^^^^^^^^^
File "/Users/suryaven/miniconda3/envs/tvm-unity-build/lib/python3.11/site-packages/urllib3/connectionpool.py", line 449, in _make_request
six.raise_from(e, None)
File "<string>", line 3, in raise_from
File "/Users/suryaven/miniconda3/envs/tvm-unity-build/lib/python3.11/site-packages/urllib3/connectionpool.py", line 444, in _make_request
httplib_response = conn.getresponse()
^^^^^^^^^^^^^^^^^^
File "/Users/suryaven/miniconda3/envs/tvm-unity-build/lib/python3.11/http/client.py", line 1375, in getresponse
response.begin()
File "/Users/suryaven/miniconda3/envs/tvm-unity-build/lib/python3.11/http/client.py", line 318, in begin
version, status, reason = self._read_status()
^^^^^^^^^^^^^^^^^^^
File "/Users/suryaven/miniconda3/envs/tvm-unity-build/lib/python3.11/http/client.py", line 287, in _read_status
raise RemoteDisconnected("Remote end closed connection without"
http.client.RemoteDisconnected: Remote end closed connection without response
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/suryaven/miniconda3/envs/tvm-unity-build/lib/python3.11/site-packages/requests/adapters.py", line 487, in send
resp = conn.urlopen(
^^^^^^^^^^^^^
File "/Users/suryaven/miniconda3/envs/tvm-unity-build/lib/python3.11/site-packages/urllib3/connectionpool.py", line 787, in urlopen
retries = retries.increment(
^^^^^^^^^^^^^^^^^^
File "/Users/suryaven/miniconda3/envs/tvm-unity-build/lib/python3.11/site-packages/urllib3/util/retry.py", line 550, in increment
raise six.reraise(type(error), error, _stacktrace)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/suryaven/miniconda3/envs/tvm-unity-build/lib/python3.11/site-packages/urllib3/packages/six.py", line 769, in reraise
raise value.with_traceback(tb)
File "/Users/suryaven/miniconda3/envs/tvm-unity-build/lib/python3.11/site-packages/urllib3/connectionpool.py", line 703, in urlopen
httplib_response = self._make_request(
^^^^^^^^^^^^^^^^^^^
File "/Users/suryaven/miniconda3/envs/tvm-unity-build/lib/python3.11/site-packages/urllib3/connectionpool.py", line 449, in _make_request
six.raise_from(e, None)
File "<string>", line 3, in raise_from
File "/Users/suryaven/miniconda3/envs/tvm-unity-build/lib/python3.11/site-packages/urllib3/connectionpool.py", line 444, in _make_request
httplib_response = conn.getresponse()
^^^^^^^^^^^^^^^^^^
File "/Users/suryaven/miniconda3/envs/tvm-unity-build/lib/python3.11/http/client.py", line 1375, in getresponse
response.begin()
File "/Users/suryaven/miniconda3/envs/tvm-unity-build/lib/python3.11/http/client.py", line 318, in begin
version, status, reason = self._read_status()
^^^^^^^^^^^^^^^^^^^
File "/Users/suryaven/miniconda3/envs/tvm-unity-build/lib/python3.11/http/client.py", line 287, in _read_status
raise RemoteDisconnected("Remote end closed connection without"
urllib3.exceptions.ProtocolError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "<path>/mlc-llm/python/mlc_chat/sample_client.py", line 15, in <module>
r = requests.post("http://127.0.0.1:8000/chat/completions", json=payload)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/suryaven/miniconda3/envs/tvm-unity-build/lib/python3.11/site-packages/requests/api.py", line 115, in post
return request("post", url, data=data, json=json, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/suryaven/miniconda3/envs/tvm-unity-build/lib/python3.11/site-packages/requests/api.py", line 59, in request
return session.request(method=method, url=url, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/suryaven/miniconda3/envs/tvm-unity-build/lib/python3.11/site-packages/requests/sessions.py", line 587, in request
resp = self.send(prep, **send_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/suryaven/miniconda3/envs/tvm-unity-build/lib/python3.11/site-packages/requests/sessions.py", line 701, in send
r = adapter.send(request, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/suryaven/miniconda3/envs/tvm-unity-build/lib/python3.11/site-packages/requests/adapters.py", line 502, in send
raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
- I'm able to run the CLI correctly. I've even managed to run it using Python code adapted from some of the code in rest.py, without using a server. It's only when I run it as a server, as intended in rest.py, that I encounter problems.
@surya-ven I'm not quite sure why this is happening. Could you try pulling the latest changes and maybe the steps here? Does the server output any logs when you submit the chat completion POST request?
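One low-tech way to answer the logging question is to bracket the suspect call with prints, so the server output shows whether the handler is reached and which process ran it (the startup logs above show a reloader process 26676 and a separate server process 26688). This is a hedged sketch with a hypothetical stub, not the real mlc_chat object:

```python
import os
import time

class StubChatMod:
    # hypothetical stand-in for session["chat_mod"]
    def prefill(self, input):
        time.sleep(0.01)  # stands in for the real (possibly hanging) model call

def timed_prefill(chat_mod, prompt):
    # log the PID before and after, so a hang or a cross-process
    # state mismatch would be visible in the server logs
    print(f"[pid {os.getpid()}] entering prefill")
    t0 = time.perf_counter()
    chat_mod.prefill(input=prompt)
    elapsed = time.perf_counter() - t0
    print(f"[pid {os.getpid()}] prefill returned after {elapsed:.2f}s")
    return elapsed

timed_prefill(StubChatMod(), "Hello")
```

If the "entering prefill" line never appears, the request isn't reaching the handler; if it appears but the return line doesn't, the hang is inside prefill itself.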
Closing due to the fix in #469. Please open a new issue if you are still running into this.
Thanks for looking into this. Sorry I haven't had time to check; I'll try soon and open an issue or try to find a fix if it still persists.