llama-stack
Cannot connect to Docker container on Windows 11
I've got the models downloaded and my container starts:
docker run -it -p 5000:5000 -v C:/Users/sivar/.llama:/root/.llama --gpus=all llamastack/llamastack-local-gpu
But when I try to connect to my container I get no response:
$ curl http://localhost:5000/health
curl: (7) Failed to connect to localhost port 5000 after 2243 ms: Couldn't connect to server
Any idea why?
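(One quick check that might narrow this down: a plain TCP probe shows whether anything on the host is listening on port 5000 at all, independent of HTTP or the llama-stack client. This is just a generic diagnostic sketch, not part of llama-stack; the host names and port are the ones from the commands above.)

```python
# Generic TCP port probe -- succeeds only if something accepts the connection.
import socket

def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Checking both names rules out "localhost" resolving to ::1 (IPv6) while
# Docker publishes the port only on the IPv4 loopback address.
for host in ("127.0.0.1", "localhost"):
    print(host, port_open(host, 5000))
```

If both print False, the port mapping never took effect (or the server inside the container crashed on startup); if 127.0.0.1 works but localhost doesn't, it's an IPv6 resolution issue on the Windows side.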
I also can't send a hello-world request through the inference client:
$ python -m llama_stack.apis.inference.client localhost 5000
User>hello world, write me a 2 sentence poem about the moon
Traceback (most recent call last):
  File "C:\Users\sivar\miniconda3\envs\llama_stack\lib\site-packages\httpx\_transports\default.py", line 72, in map_httpcore_exceptions
    yield
  File "C:\Users\sivar\miniconda3\envs\llama_stack\lib\site-packages\httpx\_transports\default.py", line 377, in handle_async_request
    resp = await self._pool.handle_async_request(req)
  File "C:\Users\sivar\miniconda3\envs\llama_stack\lib\site-packages\httpcore\_async\connection_pool.py", line 216, in handle_async_request
    raise exc from None
  File "C:\Users\sivar\miniconda3\envs\llama_stack\lib\site-packages\httpcore\_async\connection_pool.py", line 196, in handle_async_request
    response = await connection.handle_async_request(
  File "C:\Users\sivar\miniconda3\envs\llama_stack\lib\site-packages\httpcore\_async\connection.py", line 99, in handle_async_request
    raise exc
  File "C:\Users\sivar\miniconda3\envs\llama_stack\lib\site-packages\httpcore\_async\connection.py", line 76, in handle_async_request
    stream = await self._connect(request)
  File "C:\Users\sivar\miniconda3\envs\llama_stack\lib\site-packages\httpcore\_async\connection.py", line 122, in _connect
    stream = await self._network_backend.connect_tcp(**kwargs)
  File "C:\Users\sivar\miniconda3\envs\llama_stack\lib\site-packages\httpcore\_backends\auto.py", line 30, in connect_tcp
    return await self._backend.connect_tcp(
  File "C:\Users\sivar\miniconda3\envs\llama_stack\lib\site-packages\httpcore\_backends\anyio.py", line 115, in connect_tcp
    with map_exceptions(exc_map):
  File "C:\Users\sivar\miniconda3\envs\llama_stack\lib\contextlib.py", line 153, in __exit__
    self.gen.throw(typ, value, traceback)
  File "C:\Users\sivar\miniconda3\envs\llama_stack\lib\site-packages\httpcore\_exceptions.py", line 14, in map_exceptions
    raise to_exc(exc) from exc
httpcore.ConnectError: All connection attempts failed

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\sivar\miniconda3\envs\llama_stack\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\sivar\miniconda3\envs\llama_stack\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\sivar\miniconda3\envs\llama_stack\lib\site-packages\llama_stack\apis\inference\client.py", line 198, in <module>
    fire.Fire(main)
  File "C:\Users\sivar\miniconda3\envs\llama_stack\lib\site-packages\fire\core.py", line 135, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "C:\Users\sivar\miniconda3\envs\llama_stack\lib\site-packages\fire\core.py", line 468, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "C:\Users\sivar\miniconda3\envs\llama_stack\lib\site-packages\fire\core.py", line 684, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "C:\Users\sivar\miniconda3\envs\llama_stack\lib\site-packages\llama_stack\apis\inference\client.py", line 194, in main
    asyncio.run(run_main(host, port, stream, model, logprobs))
  File "C:\Users\sivar\miniconda3\envs\llama_stack\lib\asyncio\runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "C:\Users\sivar\miniconda3\envs\llama_stack\lib\asyncio\base_events.py", line 649, in run_until_complete
    return future.result()
  File "C:\Users\sivar\miniconda3\envs\llama_stack\lib\site-packages\llama_stack\apis\inference\client.py", line 154, in run_main
    async for log in EventLogger().log(iterator):
  File "C:\Users\sivar\miniconda3\envs\llama_stack\lib\site-packages\llama_stack\apis\inference\event_logger.py", line 32, in log
    async for chunk in event_generator:
  File "C:\Users\sivar\miniconda3\envs\llama_stack\lib\site-packages\llama_stack\apis\inference\client.py", line 93, in _stream_chat_completion
    async with client.stream(
  File "C:\Users\sivar\miniconda3\envs\llama_stack\lib\contextlib.py", line 199, in __aenter__
    return await anext(self.gen)
  File "C:\Users\sivar\miniconda3\envs\llama_stack\lib\site-packages\httpx\_client.py", line 1628, in stream
    response = await self.send(
  File "C:\Users\sivar\miniconda3\envs\llama_stack\lib\site-packages\httpx\_client.py", line 1674, in send
    response = await self._send_handling_auth(
  File "C:\Users\sivar\miniconda3\envs\llama_stack\lib\site-packages\httpx\_client.py", line 1702, in _send_handling_auth
    response = await self._send_handling_redirects(
  File "C:\Users\sivar\miniconda3\envs\llama_stack\lib\site-packages\httpx\_client.py", line 1739, in _send_handling_redirects
    response = await self._send_single_request(request)
  File "C:\Users\sivar\miniconda3\envs\llama_stack\lib\site-packages\httpx\_client.py", line 1776, in _send_single_request
    response = await transport.handle_async_request(request)
  File "C:\Users\sivar\miniconda3\envs\llama_stack\lib\site-packages\httpx\_transports\default.py", line 376, in handle_async_request
    with map_httpcore_exceptions():
  File "C:\Users\sivar\miniconda3\envs\llama_stack\lib\contextlib.py", line 153, in __exit__
    self.gen.throw(typ, value, traceback)
  File "C:\Users\sivar\miniconda3\envs\llama_stack\lib\site-packages\httpx\_transports\default.py", line 89, in map_httpcore_exceptions
    raise mapped_exc(message) from exc
httpx.ConnectError: All connection attempts failed
The same happens when I POST directly with curl:
$ curl -X POST http://localhost:5000/inference/chat_completion -H "Content-Type: application/json" -d '{"model": "Llama3.1-8B-Instruct", "messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Write me a 2 sentence poem about the stars."}], "sampling_params": {"temperature": 0.7, "max_tokens": 50}}'
curl: (7) Failed to connect to localhost port 5000 after 2257 ms: Couldn't connect to server
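(As a sanity check on my end: the JSON body from that curl command parses cleanly, so the failure should be purely at the connection level, not a malformed request. Quick check, with the payload copied from the command above:)

```python
import json

# The exact body passed to curl -d above.
payload = {
    "model": "Llama3.1-8B-Instruct",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Write me a 2 sentence poem about the stars."},
    ],
    "sampling_params": {"temperature": 0.7, "max_tokens": 50},
}

# Round-trips without error, so the request body is valid JSON.
body = json.dumps(payload)
print(json.loads(body)["model"])
```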