Failed to deploy ollama model
I'm using a self-deployed Ollama instance as the LLM and embedder provider, but the service fails to connect to the Ollama server. The log says: An error occurred during question recommendation generation: litellm.APIError: APIError: OpenAIException - Connection error. The full log and configuration files are below. Log:
wren-ai-service-1 | The above exception was the direct cause of the following exception:
wren-ai-service-1 |
wren-ai-service-1 | Traceback (most recent call last):
wren-ai-service-1 | File "/app/.venv/lib/python3.12/site-packages/litellm/llms/openai/openai.py", line 771, in acompletion
wren-ai-service-1 | headers, response = await self.make_openai_chat_completion_request(
wren-ai-service-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
wren-ai-service-1 | File "/app/.venv/lib/python3.12/site-packages/litellm/litellm_core_utils/logging_utils.py", line 131, in async_wrapper
wren-ai-service-1 | result = await func(*args, **kwargs)
wren-ai-service-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^
wren-ai-service-1 | File "/app/.venv/lib/python3.12/site-packages/litellm/llms/openai/openai.py", line 419, in make_openai_chat_completion_request
wren-ai-service-1 | raise e
wren-ai-service-1 | File "/app/.venv/lib/python3.12/site-packages/litellm/llms/openai/openai.py", line 401, in make_openai_chat_completion_request
wren-ai-service-1 | await openai_aclient.chat.completions.with_raw_response.create(
wren-ai-service-1 | File "/app/.venv/lib/python3.12/site-packages/openai/_legacy_response.py", line 381, in wrapped
wren-ai-service-1 | return cast(LegacyAPIResponse[R], await func(*args, **kwargs))
wren-ai-service-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^
wren-ai-service-1 | File "/app/.venv/lib/python3.12/site-packages/langfuse/openai.py", line 759, in _wrap_async
wren-ai-service-1 | raise ex
wren-ai-service-1 | File "/app/.venv/lib/python3.12/site-packages/langfuse/openai.py", line 715, in _wrap_async
wren-ai-service-1 | openai_response = await wrapped(**arg_extractor.get_openai_args())
wren-ai-service-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
wren-ai-service-1 | File "/app/.venv/lib/python3.12/site-packages/openai/resources/chat/completions.py", line 1727, in create
wren-ai-service-1 | return await self._post(
wren-ai-service-1 | ^^^^^^^^^^^^^^^^^
wren-ai-service-1 | File "/app/.venv/lib/python3.12/site-packages/openai/_base_client.py", line 1849, in post
wren-ai-service-1 | return await self.request(cast_to, opts, stream=stream, stream_cls=stream_cls)
wren-ai-service-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
wren-ai-service-1 | File "/app/.venv/lib/python3.12/site-packages/openai/_base_client.py", line 1543, in request
wren-ai-service-1 | return await self._request(
wren-ai-service-1 | ^^^^^^^^^^^^^^^^^^^^
wren-ai-service-1 | File "/app/.venv/lib/python3.12/site-packages/openai/_base_client.py", line 1606, in _request
wren-ai-service-1 | return await self._retry_request(
wren-ai-service-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^
wren-ai-service-1 | File "/app/.venv/lib/python3.12/site-packages/openai/_base_client.py", line 1676, in _retry_request
wren-ai-service-1 | return await self._request(
wren-ai-service-1 | ^^^^^^^^^^^^^^^^^^^^
wren-ai-service-1 | File "/app/.venv/lib/python3.12/site-packages/openai/_base_client.py", line 1606, in _request
wren-ai-service-1 | return await self._retry_request(
wren-ai-service-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^
wren-ai-service-1 | File "/app/.venv/lib/python3.12/site-packages/openai/_base_client.py", line 1676, in _retry_request
wren-ai-service-1 | return await self._request(
wren-ai-service-1 | ^^^^^^^^^^^^^^^^^^^^
wren-ai-service-1 | File "/app/.venv/lib/python3.12/site-packages/openai/_base_client.py", line 1616, in _request
wren-ai-service-1 | raise APIConnectionError(request=request) from err
wren-ai-service-1 | openai.APIConnectionError: Connection error.
wren-ai-service-1 |
wren-ai-service-1 | During handling of the above exception, another exception occurred:
wren-ai-service-1 |
wren-ai-service-1 | Traceback (most recent call last):
wren-ai-service-1 | File "/app/.venv/lib/python3.12/site-packages/litellm/main.py", line 463, in acompletion
wren-ai-service-1 | response = await init_response
wren-ai-service-1 | ^^^^^^^^^^^^^^^^^^^
wren-ai-service-1 | File "/app/.venv/lib/python3.12/site-packages/litellm/llms/openai/openai.py", line 817, in acompletion
wren-ai-service-1 | raise OpenAIError(
wren-ai-service-1 | litellm.llms.openai.common_utils.OpenAIError: Connection error.
wren-ai-service-1 |
wren-ai-service-1 | During handling of the above exception, another exception occurred:
wren-ai-service-1 |
wren-ai-service-1 | Traceback (most recent call last):
wren-ai-service-1 | File "/app/.venv/lib/python3.12/site-packages/hamilton/async_driver.py", line 122, in new_fn
wren-ai-service-1 | await fn(**fn_kwargs) if asyncio.iscoroutinefunction(fn) else fn(**fn_kwargs)
wren-ai-service-1 | ^^^^^^^^^^^^^^^^^^^^^
wren-ai-service-1 | File "/app/.venv/lib/python3.12/site-packages/langfuse/decorators/langfuse_decorator.py", line 219, in async_wrapper
wren-ai-service-1 | self._handle_exception(observation, e)
wren-ai-service-1 | File "/app/.venv/lib/python3.12/site-packages/langfuse/decorators/langfuse_decorator.py", line 517, in _handle_exception
wren-ai-service-1 | raise e
wren-ai-service-1 | File "/app/.venv/lib/python3.12/site-packages/langfuse/decorators/langfuse_decorator.py", line 217, in async_wrapper
wren-ai-service-1 | result = await func(*args, **kwargs)
wren-ai-service-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^
wren-ai-service-1 | File "/src/pipelines/generation/question_recommendation.py", line 48, in generate
wren-ai-service-1 | return await generator(prompt=prompt.get("prompt"))
wren-ai-service-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
wren-ai-service-1 | File "/src/providers/llm/litellm.py", line 71, in _run
wren-ai-service-1 | completion: Union[ModelResponse] = await acompletion(
wren-ai-service-1 | ^^^^^^^^^^^^^^^^^^
wren-ai-service-1 | File "/app/.venv/lib/python3.12/site-packages/litellm/utils.py", line 1358, in wrapper_async
wren-ai-service-1 | raise e
wren-ai-service-1 | File "/app/.venv/lib/python3.12/site-packages/litellm/utils.py", line 1217, in wrapper_async
wren-ai-service-1 | result = await original_function(*args, **kwargs)
wren-ai-service-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
wren-ai-service-1 | File "/app/.venv/lib/python3.12/site-packages/litellm/main.py", line 482, in acompletion
wren-ai-service-1 | raise exception_type(
wren-ai-service-1 | ^^^^^^^^^^^^^^^
wren-ai-service-1 | File "/app/.venv/lib/python3.12/site-packages/litellm/litellm_core_utils/exception_mapping_utils.py", line 2190, in exception_type
wren-ai-service-1 | raise e
wren-ai-service-1 | File "/app/.venv/lib/python3.12/site-packages/litellm/litellm_core_utils/exception_mapping_utils.py", line 450, in exception_type
wren-ai-service-1 | raise APIError(
wren-ai-service-1 | litellm.exceptions.APIError: litellm.APIError: APIError: OpenAIException - Connection error.
wren-ai-service-1 | -------------------------------------------------------------------
wren-ai-service-1 | Oh no an error! Need help with Hamilton?
wren-ai-service-1 | Join our slack and ask for help! https://join.slack.com/t/hamilton-opensource/shared_invite/zt-2niepkra8-DGKGf_tTYhXuJWBTXtIs4g
wren-ai-service-1 | -------------------------------------------------------------------
wren-ai-service-1 |
wren-ai-service-1 | E0226 01:15:48.493 8 wren-ai-service:60] An error occurred during question recommendation generation: litellm.APIError: APIError: OpenAIException - Connection error.
wren-ai-service-1 | INFO: 172.21.0.6:52742 - "GET /v1/question-recommendations/925286b1-b6db-4115-bbf6-4b886cd9ae6d HTTP/1.1" 200 OK
.env
COMPOSE_PROJECT_NAME=wrenai
PLATFORM=linux/amd64
PROJECT_DIR=.
LLM_OLLAMA_API_KEY=123456
EMBEDDER_OLLAMA_API_KEY=123456
OPENAI_API_KEY=123456
# service port
WREN_ENGINE_PORT=8080
WREN_ENGINE_SQL_PORT=7432
WREN_AI_SERVICE_PORT=5555
WREN_UI_PORT=3000
IBIS_SERVER_PORT=8000
WREN_UI_ENDPOINT=http://wren-ui:${WREN_UI_PORT}
# ai service settings
QDRANT_HOST=qdrant
SHOULD_FORCE_DEPLOY=1
# version
# CHANGE THIS TO THE LATEST VERSION
WREN_PRODUCT_VERSION=0.15.3
WREN_ENGINE_VERSION=0.13.1
WREN_AI_SERVICE_VERSION=0.15.7
IBIS_SERVER_VERSION=0.13.1
WREN_UI_VERSION=0.20.1
WREN_BOOTSTRAP_VERSION=0.1.5
# user id (uuid v4)
USER_UUID=
# for other services
POSTHOG_API_KEY=phc_nhF32aj4xHXOZb0oqr2cn4Oy9uiWzz6CCP4KZmRq9aE
POSTHOG_HOST=https://app.posthog.com
TELEMETRY_ENABLED=true
# this is for telemetry to know the model, i think ai-service might be able to provide a endpoint to get the information
GENERATION_MODEL=gpt-4o-mini
LANGFUSE_SECRET_KEY=
LANGFUSE_PUBLIC_KEY=
# the port exposes to the host
# OPTIONAL: change the port if you have a conflict
HOST_PORT=3000
AI_SERVICE_FORWARD_PORT=5555
# Wren UI
EXPERIMENTAL_ENGINE_RUST_VERSION=false
config.yaml
type: llm
provider: litellm_llm
models:
# put OPENAI_API_KEY=<random_string> in ~/.wrenai/.env
- api_base: http://host.docker.internal:11434/v1 # change this to your ollama host, api_base should be <ollama_url>/v1
api_key_name: LLM_OLLAMA_API_KEY
model: openai/deepseek-r1:14b # openai/<ollama_model_name>
timeout: 600
kwargs:
n: 1
temperature: 0
---
type: embedder
provider: litellm_embedder
models:
# put OPENAI_API_KEY=<random_string> in ~/.wrenai/.env
- model: openai/bge-m3:latest # put your ollama embedder model name here, openai/<ollama_model_name>
api_base: http://host.docker.internal:11434/v1 # change this to your ollama host, api_base should be <ollama_url>/v1
api_key_name: EMBEDDER_OLLAMA_API_KEY
timeout: 600
---
type: engine
provider: wren_ui
endpoint: http://wren-ui:3000
---
type: document_store
provider: qdrant
location: http://qdrant:6333
embedding_model_dim: 1024 # put your embedding model dimension here
timeout: 120
recreate_index: true
---
# please change the llm and embedder names to the ones you want to use
# the format of llm and embedder should be <provider>.<model_name> such as litellm_llm.gpt-4o-2024-08-06
# the pipes may be not the latest version, please refer to the latest version: https://raw.githubusercontent.com/canner/WrenAI/<WRENAI_VERSION_NUMBER>/docker/config.example.yaml
type: pipeline
pipes:
- name: db_schema_indexing
embedder: litellm_embedder.openai/bge-m3:latest
document_store: qdrant
- name: historical_question_indexing
embedder: litellm_embedder.openai/bge-m3:latest
document_store: qdrant
- name: table_description_indexing
embedder: litellm_embedder.openai/bge-m3:latest
document_store: qdrant
- name: db_schema_retrieval
llm: litellm_llm.openai/deepseek-r1:14b
embedder: litellm_embedder.openai/bge-m3:latest
document_store: qdrant
- name: historical_question_retrieval
embedder: litellm_embedder.openai/bge-m3:latest
document_store: qdrant
- name: sql_generation
llm: litellm_llm.openai/deepseek-r1:14b
engine: wren_ui
- name: sql_correction
llm: litellm_llm.openai/deepseek-r1:14b
engine: wren_ui
- name: followup_sql_generation
llm: litellm_llm.openai/deepseek-r1:14b
engine: wren_ui
- name: sql_summary
llm: litellm_llm.openai/deepseek-r1:14b
- name: sql_answer
llm: litellm_llm.openai/deepseek-r1:14b
engine: wren_ui
- name: sql_breakdown
llm: litellm_llm.openai/deepseek-r1:14b
engine: wren_ui
- name: sql_expansion
llm: litellm_llm.openai/deepseek-r1:14b
engine: wren_ui
- name: sql_explanation
llm: litellm_llm.openai/deepseek-r1:14b
- name: semantics_description
llm: litellm_llm.openai/deepseek-r1:14b
- name: relationship_recommendation
llm: litellm_llm.openai/deepseek-r1:14b
engine: wren_ui
- name: question_recommendation
llm: litellm_llm.openai/deepseek-r1:14b
- name: question_recommendation_db_schema_retrieval
llm: litellm_llm.openai/deepseek-r1:14b
embedder: litellm_embedder.openai/bge-m3:latest
document_store: qdrant
- name: question_recommendation_sql_generation
llm: litellm_llm.openai/deepseek-r1:14b
engine: wren_ui
- name: chart_generation
llm: litellm_llm.openai/deepseek-r1:14b
- name: chart_adjustment
llm: litellm_llm.openai/deepseek-r1:14b
- name: intent_classification
llm: litellm_llm.openai/deepseek-r1:14b
embedder: litellm_embedder.openai/bge-m3:latest
document_store: qdrant
- name: data_assistance
llm: litellm_llm.openai/deepseek-r1:14b
- name: sql_pairs_indexing
document_store: qdrant
embedder: litellm_embedder.openai/bge-m3:latest
- name: sql_pairs_deletion
document_store: qdrant
embedder: litellm_embedder.openai/bge-m3:latest
- name: sql_pairs_retrieval
document_store: qdrant
embedder: litellm_embedder.openai/bge-m3:latest
llm: litellm_llm.openai/deepseek-r1:14b
- name: preprocess_sql_data
llm: litellm_llm.openai/deepseek-r1:14b
- name: sql_executor
engine: wren_ui
- name: sql_question_generation
llm: litellm_llm.openai/deepseek-r1:14b
- name: sql_generation_reasoning
llm: litellm_llm.openai/deepseek-r1:14b
- name: sql_regeneration
llm: litellm_llm.openai/deepseek-r1:14b
engine: wren_ui
---
settings:
column_indexing_batch_size: 50
table_retrieval_size: 10
table_column_retrieval_size: 100
allow_using_db_schemas_without_pruning: false # if you want to use db schemas without pruning, set this to true. It will be faster
query_cache_maxsize: 1000
query_cache_ttl: 3600
langfuse_host: https://cloud.langfuse.com
langfuse_enable: true
logging_level: DEBUG
development: true
docker-compose.yaml
version: "3"
volumes:
data:
networks:
wren:
driver: bridge
services:
bootstrap:
image: ghcr.io/canner/wren-bootstrap:${WREN_BOOTSTRAP_VERSION}
restart: on-failure
platform: ${PLATFORM}
environment:
DATA_PATH: /app/data
volumes:
- data:/app/data
command: /bin/sh /app/init.sh
wren-engine:
image: ghcr.io/canner/wren-engine:${WREN_ENGINE_VERSION}
restart: on-failure
platform: ${PLATFORM}
expose:
- ${WREN_ENGINE_PORT}
- ${WREN_ENGINE_SQL_PORT}
volumes:
- data:/usr/src/app/etc
- ${PROJECT_DIR}/data:/usr/src/app/data
networks:
- wren
depends_on:
- bootstrap
ibis-server:
image: ghcr.io/canner/wren-engine-ibis:${IBIS_SERVER_VERSION}
restart: on-failure
platform: ${PLATFORM}
expose:
- ${IBIS_SERVER_PORT}
environment:
WREN_ENGINE_ENDPOINT: http://wren-engine:${WREN_ENGINE_PORT}
networks:
- wren
wren-ai-service:
image: ghcr.io/canner/wren-ai-service:${WREN_AI_SERVICE_VERSION}
restart: on-failure
platform: ${PLATFORM}
expose:
- ${WREN_AI_SERVICE_PORT}
ports:
- ${AI_SERVICE_FORWARD_PORT}:${WREN_AI_SERVICE_PORT}
environment:
# sometimes the console won't show print messages,
# using PYTHONUNBUFFERED: 1 can fix this
PYTHONUNBUFFERED: 1
CONFIG_PATH: /app/data/config.yaml
env_file:
- ${PROJECT_DIR}/.env
volumes:
- ${PROJECT_DIR}/config.yaml:/app/data/config.yaml
networks:
- wren
depends_on:
- qdrant
qdrant:
image: qdrant/qdrant:v1.13.2
restart: on-failure
expose:
- 6333
- 6334
volumes:
- data:/qdrant/storage
networks:
- wren
wren-ui:
image: ghcr.io/canner/wren-ui:${WREN_UI_VERSION}
restart: on-failure
platform: ${PLATFORM}
environment:
DB_TYPE: sqlite
# /app is the working directory in the container
SQLITE_FILE: /app/data/db.sqlite3
WREN_ENGINE_ENDPOINT: http://wren-engine:${WREN_ENGINE_PORT}
WREN_AI_ENDPOINT: http://wren-ai-service:${WREN_AI_SERVICE_PORT}
IBIS_SERVER_ENDPOINT: http://ibis-server:${IBIS_SERVER_PORT}
# this is for telemetry to know the model, i think ai-service might be able to provide a endpoint to get the information
GENERATION_MODEL: ${GENERATION_MODEL}
# telemetry
WREN_ENGINE_PORT: ${WREN_ENGINE_PORT}
WREN_AI_SERVICE_VERSION: ${WREN_AI_SERVICE_VERSION}
WREN_UI_VERSION: ${WREN_UI_VERSION}
WREN_ENGINE_VERSION: ${WREN_ENGINE_VERSION}
USER_UUID: ${USER_UUID}
POSTHOG_API_KEY: ${POSTHOG_API_KEY}
POSTHOG_HOST: ${POSTHOG_HOST}
TELEMETRY_ENABLED: ${TELEMETRY_ENABLED}
# client side
NEXT_PUBLIC_USER_UUID: ${USER_UUID}
NEXT_PUBLIC_POSTHOG_API_KEY: ${POSTHOG_API_KEY}
NEXT_PUBLIC_POSTHOG_HOST: ${POSTHOG_HOST}
NEXT_PUBLIC_TELEMETRY_ENABLED: ${TELEMETRY_ENABLED}
EXPERIMENTAL_ENGINE_RUST_VERSION: ${EXPERIMENTAL_ENGINE_RUST_VERSION}
# configs
WREN_PRODUCT_VERSION: ${WREN_PRODUCT_VERSION}
ports:
# HOST_PORT is the port you want to expose to the host machine
- ${HOST_PORT}:3000
volumes:
- data:/app/data
networks:
- wren
depends_on:
- wren-ai-service
- wren-engine
Do you have the model deployed inside Docker?
If you deployed it with Ollama:
set OLLAMA_HOST="0.0.0.0"
ollama pull deepseek-r1:14b
ollama pull bge-m3:latest
ollama serve
In the config, replace api_base: http://host.docker.internal:11434/v1 with api_base: http://192.168.1.x:11434/v1 (your host machine's LAN IP), as in the sketch below.
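A minimal sketch of what the llm and embedder sections from the config above could look like after that change (everything else kept as posted; 192.168.1.x is a placeholder for the actual host IP):

type: llm
provider: litellm_llm
models:
  - model: openai/deepseek-r1:14b
    api_base: http://192.168.1.x:11434/v1  # host LAN IP, OpenAI-compatible endpoint
    api_key_name: LLM_OLLAMA_API_KEY
    timeout: 600
    kwargs:
      n: 1
      temperature: 0
---
type: embedder
provider: litellm_embedder
models:
  - model: openai/bge-m3:latest
    api_base: http://192.168.1.x:11434/v1
    api_key_name: EMBEDDER_OLLAMA_API_KEY
    timeout: 600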
Hi @Archilht, did you solve this issue? If not, you can refer to my example Ollama config for the llm and embedder sections.
models:
- api_base: http://host.docker.internal:11434/
kwargs:
n: 1
temperature: 0
model: ollama/phi4
provider: litellm_llm
timeout: 120
type: llm
---
models:
- api_base: http://host.docker.internal:11434/
model: ollama/nomic-embed-text:latest
timeout: 120
provider: litellm_embedder
type: embedder
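Note that this example uses the ollama/ prefix, which goes through LiteLLM's native Ollama integration and takes the bare Ollama URL (no /v1), whereas the original config uses the openai/ prefix, which expects the OpenAI-compatible endpoint at <ollama_url>/v1. A sketch of the same style adapted to the models from the original config, as an untested assumption (the pipeline entries such as litellm_llm.openai/deepseek-r1:14b would also have to be renamed to litellm_llm.ollama/deepseek-r1:14b to match):

type: llm
provider: litellm_llm
models:
  - model: ollama/deepseek-r1:14b
    api_base: http://host.docker.internal:11434  # bare Ollama URL, no /v1
    timeout: 600
    kwargs:
      n: 1
      temperature: 0
---
type: embedder
provider: litellm_embedder
models:
  - model: ollama/bge-m3:latest
    api_base: http://host.docker.internal:11434
    timeout: 600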
Thanks for the response. It seems the error was caused by "host.docker.internal"; I had missed the network settings in docker-compose.yaml.
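For anyone hitting the same thing on Linux: host.docker.internal does not resolve inside containers by default there. One common fix (an assumption about which network setting was missing, not confirmed in this thread) is to map it to the host gateway for the service that calls Ollama in docker-compose.yaml:

wren-ai-service:
  extra_hosts:
    - "host.docker.internal:host-gateway"  # assumes Docker Engine 20.10+

Alternatively, point api_base at the host's LAN IP as suggested above.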
@Archilht Excuse me, what is your OS? You can also check this document for reference: https://docs.getwren.ai/oss/ai_service/guide/custom_llm#launch-wren-ai