Docker build can't find Ollama host on llama stack run
I've tried a few different ways to get this running and I'm not sure what I'm missing, or whether this simply isn't working.
Running
llama stack build --template local-ollama --name stack-test-docker --image-type docker
llama stack configure llamastack-stack-test-docker
After configuring with port 4001
llama stack run stack-test-docker --port 4001
router_api Api.inference
router_api Api.safety
router_api Api.memory
Resolved 8 providers in topological order
  Api.models: routing_table
  Api.inference: router
  Api.shields: routing_table
  Api.safety: router
  Api.memory_banks: routing_table
  Api.memory: router
  Api.agents: meta-reference
  Api.telemetry: meta-reference
.... httpcore.ConnectError: All connection attempts failed
... RuntimeError: Ollama Server is not running, start it using ollama serve in a separate terminal
The thing is, Ollama is running and the models are loaded, but it is running on my host machine, not in the container.
I've tried to make the host reachable from the container by adding --add-host=host.docker.internal:host-gateway to the docker run command:
docker run --add-host=host.docker.internal:host-gateway -it -v ~/.llama/builds/docker/llama-stack-test-docker-run.yaml:/app/config.yaml -v ~/.llama:/root/.llama llamastack-llama-stack-test-docker python -m llama_stack.distribution.server.server --yaml_config /app/config.yaml --port 4001
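(For what it's worth, a quick way to check whether the container can reach the host's Ollama at all -- assuming Ollama is on its default port 11434 -- is a one-off curl from a throwaway container with the same --add-host mapping:

docker run --rm --add-host=host.docker.internal:host-gateway curlimages/curl -sf http://host.docker.internal:11434/api/tags

If that returns the list of local models, the host mapping itself is working.)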
I've also tried the conda build and get the same "Ollama Server is not running" error.
Is this a bug, or is this not meant to work with an Ollama instance running locally on the host machine? Any insights would be great.
Could you share the contents of your llama-stack-test-docker-run.yaml? We should make sure the host/port for the ollama provider points at the Ollama endpoint.
Sure, here it is.
version: v1
built_at: '2024-10-01T07:04:33.745095'
image_name: stack-test-docker
docker_image: stack-test-docker
conda_env: null
apis_to_serve:
- models
- memory_banks
- agents
- memory
- shields
- safety
- inference
api_providers:
  inference:
    providers:
    - remote::ollama
  memory:
    providers:
    - meta-reference
  safety:
    providers:
    - meta-reference
  agents:
    provider_id: meta-reference
    config:
      persistence_store:
        namespace: llama-stack
        type: redis
        host: localhost
        port: 6379
  telemetry:
    provider_id: meta-reference
    config: {}
routing_table:
  inference:
  - provider_id: remote::ollama
    config:
      host: localhost
      port: 4001
    routing_key: Llama3.1-8B-Instruct
  memory:
  - provider_id: meta-reference
    config: {}
    routing_key: vector
  safety:
  - provider_id: meta-reference
    config:
      llama_guard_shield: null
      prompt_guard_shield: null
    routing_key:
    - llama_guard
    - code_scanner_guard
    - injection_shield
    - jailbreak_shield
Could you try a port other than 4001 for starting the server, e.g. via llama stack run stack-test-docker --port 5000? Right now your ollama provider is configured to connect to localhost:4001, which is the same port the stack server itself is listening on.
Ollama typically runs on port 11434, so that should be the port in your run config. I have now added this as the default -- see https://github.com/meta-llama/llama-stack/blob/main/llama_stack/providers/adapters/inference/ollama/init.py#L11 -- so the next person does not trip on it.
specifically, this bit:
inference:
- provider_id: remote::ollama
  config:
    host: localhost
    port: 4001
  routing_key: Llama3.1-8B-Instruct
should be:
inference:
- provider_id: remote::ollama
  config:
    host: localhost
    port: 11434
  routing_key: Llama3.1-8B-Instruct
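To double-check that Ollama is actually listening on 11434 on the host (just a sanity check, assuming a default Ollama install), curling the tags endpoint should list your local models:

curl http://localhost:11434/api/tags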
Working. Thank you.