Docker build can't find Ollama host on llama stack run
I've tried a few different ways to get this running and I'm not sure what I'm missing, or whether this simply isn't working.
Running
llama stack build --template local-ollama --name stack-test-docker --image-type docker
llama stack configure llamastack-stack-test-docker
After configuring with port 4001
llama stack run stack-test-docker --port 4001
router_api Api.inference
router_api Api.safety
router_api Api.memory
Resolved 8 providers in topological order
  Api.models: routing_table
  Api.inference: router
  Api.shields: routing_table
  Api.safety: router
  Api.memory_banks: routing_table
  Api.memory: router
  Api.agents: meta-reference
  Api.telemetry: meta-reference
.... httpcore.ConnectError: All connection attempts failed
... RuntimeError: Ollama Server is not running, start it using ollama serve in a separate terminal
The thing is, Ollama is running and the models are loaded, but it is running on my host machine, not in the container.
I've tried to make the host reachable from the container by adding --add-host=host.docker.internal:host-gateway to the docker run command:
docker run --add-host=host.docker.internal:host-gateway -it -v ~/.llama/builds/docker/llama-stack-test-docker-run.yaml:/app/config.yaml -v ~/.llama:/root/.llama llamastack-llama-stack-test-docker python -m llama_stack.distribution.server.server --yaml_config /app/config.yaml --port 4001
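(For what it's worth, a quick way to check whether the container can reach the host's Ollama at all -- assuming Ollama is on its default port 11434 -- is a one-off curl from a throwaway container with the same --add-host mapping:

docker run --rm --add-host=host.docker.internal:host-gateway curlimages/curl -sf http://host.docker.internal:11434/api/tags

If that returns the list of local models, the host mapping itself is working.)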
I've also tried the conda build and get the same "Ollama Server is not running" error.
Is this a bug, or is this not meant to work with an Ollama instance running locally on the host machine? Any insights would be great.
Could you share the contents of your llama-stack-test-docker-run.yaml? We should make sure the host/port for the ollama provider points at the Ollama endpoint.
Sure, here it is.
version: v1
built_at: '2024-10-01T07:04:33.745095'
image_name: stack-test-docker
docker_image: stack-test-docker
conda_env: null
apis_to_serve:
- models
- memory_banks
- agents
- memory
- shields
- safety
- inference
api_providers:
  inference:
    providers:
    - remote::ollama
  memory:
    providers:
    - meta-reference
  safety:
    providers:
    - meta-reference
  agents:
    provider_id: meta-reference
    config:
      persistence_store:
        namespace: llama-stack
        type: redis
        host: localhost
        port: 6379
  telemetry:
    provider_id: meta-reference
    config: {}
routing_table:
  inference:
  - provider_id: remote::ollama
    config:
      host: localhost
      port: 4001
    routing_key: Llama3.1-8B-Instruct
  memory:
  - provider_id: meta-reference
    config: {}
    routing_key: vector
  safety:
  - provider_id: meta-reference
    config:
      llama_guard_shield: null
      prompt_guard_shield: null
    routing_key:
    - llama_guard
    - code_scanner_guard
    - injection_shield
    - jailbreak_shield
Could you try a port other than 4001 for starting the server, e.g. via llama stack run stack-test-docker --port 5000? Right now your ollama provider is configured to connect to localhost:4001, which is the same port the stack server itself is listening on.
Ollama typically runs on port 11434, so that should be the port in your run config. I have now added this as the default -- see https://github.com/meta-llama/llama-stack/blob/main/llama_stack/providers/adapters/inference/ollama/init.py#L11 -- so the next person does not trip on it.
specifically, this bit:
inference:
- provider_id: remote::ollama
  config:
    host: localhost
    port: 4001
  routing_key: Llama3.1-8B-Instruct
should be:
inference:
- provider_id: remote::ollama
  config:
    host: localhost
    port: 11434
  routing_key: Llama3.1-8B-Instruct
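To double-check that Ollama is actually listening on 11434 on the host (just a sanity check, assuming a default Ollama install), curling the tags endpoint should list your local models:

curl http://localhost:11434/api/tags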
Working. Thank you.