
http://localhost:11434 cannot be accessed

RICHES-2020 opened this issue 8 months ago • 6 comments

export DOCKER_IMAGE=intelanalytics/ipex-llm-inference-cpp-xpu:latest
export CONTAINER_NAME=intel-llm
docker run -itd \
        --net=host \
        --device=/dev/dri \
        --privileged \
        -v /ollama/models:/models \
        -v /usr/lib/wsl:/usr/lib/wsl \
        -e no_proxy=localhost,127.0.0.1 \
        -e OLLAMA_HOST=0.0.0.0 \
        --memory="16G" \
        --name=$CONTAINER_NAME \
        -e DEVICE=Arc \
        --shm-size="16g" \
        $DOCKER_IMAGE

root@docker-desktop:/llm/scripts# curl http://localhost:11434
Ollama is running

root@docker:~# curl http://localhost:11434
curl: (7) Failed to connect to localhost port 11434 after 0 ms: Couldn't connect to server

The server can be reached from inside the container, but it cannot be reached from the host.
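
One way to narrow this down from the host side (a rough sketch; the container name intel-llm is taken from the run command above):

# on the host: is anything listening on 11434 at all?
ss -tlnp | grep 11434

# how is the container networked, and which ports (if any) are published?
docker inspect -f '{{.HostConfig.NetworkMode}}' intel-llm
docker port intel-llm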

RICHES-2020 · Apr 20 '25 08:04

root@docker-desktop:/llm/scripts# bash start-ollama.sh
root@docker-desktop:/llm/scripts# 2025/04/20 15:36:47 routes.go:1230: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:2048 OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/root/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:localhost,127.0.0.1]"
time=2025-04-20T15:36:47.499+08:00 level=INFO source=images.go:432 msg="total blobs: 0"
time=2025-04-20T15:36:47.499+08:00 level=INFO source=images.go:439 msg="total unused blobs removed: 0"
time=2025-04-20T15:36:47.500+08:00 level=INFO source=routes.go:1297 msg="Listening on [::]:11434 (version 0.0.0)"
time=2025-04-20T15:36:47.501+08:00 level=INFO source=gpu.go:217 msg="looking for compatible GPUs"
time=2025-04-20T15:36:50.098+08:00 level=INFO source=gpu.go:377 msg="no compatible GPUs were discovered"
time=2025-04-20T15:36:50.098+08:00 level=INFO source=types.go:130 msg="inference compute" id=0 library=cpu variant="" compute="" driver=0.0 name="" total="15.4 GiB" available="14.0 GiB"

RICHES-2020 · Apr 20 '25 09:04

root@docker-desktop:/llm/scripts# source ipex-llm-init --gpu --device $DEVICE
found oneapi in /opt/intel/oneapi/setvars.sh

:: initializing oneAPI environment ...
   bash: BASH_VERSION = 5.1.16(1)-release
   args: Using "$@" for setvars.sh arguments: --force
:: advisor -- latest
:: ccl -- latest
:: compiler -- latest
:: dal -- latest
:: debugger -- latest
:: dev-utilities -- latest
:: dnnl -- latest
:: dpcpp-ct -- latest
:: dpl -- latest
:: ipp -- latest
:: ippcp -- latest
:: mkl -- latest
:: mpi -- latest
:: pti -- latest
:: tbb -- latest
:: umf -- latest
:: vtune -- latest
:: oneAPI environment initialized ::

/usr/local/lib/python3.11/dist-packages/transformers/utils/generic.py:441: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  _torch_pytree._register_pytree_node(
/usr/local/lib/python3.11/dist-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
  warnings.warn(
/usr/local/lib/python3.11/dist-packages/transformers/utils/generic.py:309: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  _torch_pytree._register_pytree_node(
/usr/local/lib/python3.11/dist-packages/transformers/utils/generic.py:309: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  _torch_pytree._register_pytree_node(
+++++ Env Variables +++++
Internal:
    ENABLE_IOMP = 1
    ENABLE_GPU = 1
    ENABLE_JEMALLOC = 0
    ENABLE_TCMALLOC = 0
    LIB_DIR = /usr/local/lib
    BIN_DIR = bin64
    LLM_DIR = /usr/local/lib/python3.11/dist-packages/ipex_llm

Exported:
    LD_PRELOAD =
    OMP_NUM_THREADS =
    MALLOC_CONF =
    USE_XETLA = OFF
    ENABLE_SDP_FUSION =
    SYCL_CACHE_PERSISTENT = 1
    BIGDL_LLM_XMX_DISABLED =
    SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS = 1
+++++++++++++++++++++++++

RICHES-2020 · Apr 20 '25 09:04

The log says "no compatible GPUs were discovered". What is the problem?

RICHES-2020 · Apr 20 '25 09:04

Hi @RICHES-2020, GPU devices will only be discovered after you load a model. Try it via ollama run xxx.
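
For example (a minimal sketch; the model tag is only a placeholder, and it assumes the ollama binary is on PATH inside the container while the server started by start-ollama.sh is still running):

# inside the container; any pulled model works, llama3.2 is just an example
ollama pull llama3.2
ollama run llama3.2 "hello"
# the server log should then report the GPU it picks up while loading the model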

sgwhat · Apr 21 '25 01:04

Hello there, try bridge mode and expose/publish ports in Docker so you can connect from the outside (i.e. the host). Personally I never use host or privileged mode unless I really have to; it's a bad habit.

I believe the issue lies in your localhost/127.0.0.1 definition: inside the container it does not mean what most people assume. I don't use that at all; I run a reverse proxy on my server.
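
A bridge-mode variant of the run command from the report might look like this (a sketch only: it reuses the image, mounts, and environment from the original command, drops --net=host and --privileged, and assumes publishing port 11434 with -p is what you want):

export DOCKER_IMAGE=intelanalytics/ipex-llm-inference-cpp-xpu:latest
export CONTAINER_NAME=intel-llm
docker run -itd \
        -p 11434:11434 \
        --device=/dev/dri \
        -v /ollama/models:/models \
        -v /usr/lib/wsl:/usr/lib/wsl \
        -e no_proxy=localhost,127.0.0.1 \
        -e OLLAMA_HOST=0.0.0.0 \
        -e DEVICE=Arc \
        --memory="16G" \
        --shm-size="16g" \
        --name=$CONTAINER_NAME \
        $DOCKER_IMAGE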

bermudahonk · Apr 21 '25 19:04

I am using ollama-intel-2.3.0b20250806 from ModelScope on Windows. It seems ollama-ipex ignores the OLLAMA_HOST environment variable. I tried 0.0.0.0, 0.0.0.0:11434, and http://0.0.0.0:11434, and none of them works.
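
A quick way to confirm which address the server actually bound to (a sketch, run in a Windows command prompt while the server is up):

REM lists listeners on port 11434; 0.0.0.0:11434 means reachable from other hosts, 127.0.0.1:11434 means local only
netstat -ano | findstr 11434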

mr-cn · Aug 14 '25 16:08