Xinference Docker Compose Fail
System Info
Running Xinference with Docker?
- [X] docker
- [ ] pip install
- [ ] installation from source
Version info
0.13.2
The command used to start Xinference
docker-compose -f "xinference-updater/inference/xinference/deploy/xinf-dock-updater/docker-compose-distributed.yml" up -d --build
Reproduction
A normal installation using docker-compose-distributed.yml.
Expected behavior
The installation completes successfully.
Sorry, the docker-compose file is contributed by the community; we need some time to verify that it works.
@qinxuye Any update on the docker-compose file? I really need it for an urgent deployment.
@qinxuye Hey, any update on the docker-compose issue?
@qinxuye @frostyplanet @bufferoverflow Hey, any update on this issue? It's been open for several weeks now.
We will check next week, thanks for your patience.
@amumu96 will help to allocate the issue next week.
Hi @amumu96, can you please help with this? I've been waiting for help for a long time.
@amumu96 Hey, can you please help?
@qinxuye @amumu96 Any update? I'm really waiting to upgrade from v0.12.3 to v0.14 to use Llama 3.1.
I just tested locally and it works:
$ docker compose -f xinference/deploy/docker/docker-compose-distributed.yml up
[+] Running 4/4
✔ Container docker-xinference-supervisor-1 Created 0.3s
✔ Container docker-xinference-1 Created 0.3s
✔ Container docker-xinference-worker-2-1 Created 0.1s
✔ Container docker-xinference-worker-1-1 Created 0.1s
Attaching to xinference-1, xinference-supervisor-1, xinference-worker-1-1, xinference-worker-2-1
xinference-1 | ==========
xinference-1 | == CUDA ==
xinference-1 | ==========
xinference-1 | CUDA Version 12.1.1
xinference-1 | Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
xinference-1 | This container image and its contents are governed by the NVIDIA Deep Learning Container License.
xinference-1 | By pulling and using the container, you accept the terms and conditions of this license:
xinference-1 | https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
xinference-1 | A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.
(xinference-supervisor-1 prints the same CUDA banner and NVIDIA license notice)
xinference-1 exited with code 0
xinference-supervisor-1 | 2024-08-11 16:02:52,881 xinference.core.supervisor 45 INFO Xinference supervisor xinference-supervisor:9999 started
xinference-supervisor-1 | 2024-08-11 16:03:00,168 xinference.api.restful_api 1 INFO Starting Xinference at endpoint: http://xinference-supervisor:9997
(xinference-worker-1-1 and xinference-worker-2-1 print the same CUDA banner and NVIDIA license notice)
xinference-worker-1-1 | 2024-08-11 16:03:06,755 xinference.core.worker 1 INFO Starting metrics export server at 0.0.0.0:None
xinference-worker-2-1 | 2024-08-11 16:03:06,755 xinference.core.worker 1 INFO Starting metrics export server at 0.0.0.0:None
xinference-worker-1-1 | 2024-08-11 16:03:06,756 xinference.core.worker 1 INFO Checking metrics export server...
xinference-worker-2-1 | 2024-08-11 16:03:06,756 xinference.core.worker 1 INFO Checking metrics export server...
xinference-worker-1-1 | 2024-08-11 16:03:09,053 xinference.core.worker 1 INFO Metrics server is started at: http://0.0.0.0:35193
xinference-worker-2-1 | 2024-08-11 16:03:09,056 xinference.core.worker 1 INFO Metrics server is started at: http://0.0.0.0:39179
xinference-worker-1-1 | 2024-08-11 16:03:09,067 xinference.core.worker 1 INFO Xinference worker xinference-worker-1:30001 started
xinference-worker-1-1 | 2024-08-11 16:03:09,068 xinference.core.worker 1 INFO Purge cache directory: /root/.xinference/cache
xinference-worker-2-1 | 2024-08-11 16:03:09,073 xinference.core.worker 1 INFO Xinference worker xinference-worker-2:30002 started
xinference-worker-2-1 | 2024-08-11 16:03:09,076 xinference.core.worker 1 INFO Purge cache directory: /root/.xinference/cache
I see you have xinf-dock-updater in the path, so it's not https://github.com/xorbitsai/inference/blob/main/xinference/deploy/docker/docker-compose.yml, which I'm using.
@bufferoverflow I am getting this issue when using the distributed docker-compose file; I only changed the folder name to xinf-dock-updater.
Hmm, I mentioned the wrong file; of course I used https://github.com/xorbitsai/inference/blob/main/xinference/deploy/docker/docker-compose-distributed.yml . Please share the output of: docker compose -f xinference/deploy/docker/docker-compose-distributed.yml up
@bufferoverflow @qinxuye @amumu96
Still getting the same issue:
No idea; maybe your Docker environment is outdated, or the NVIDIA Container Toolkit is not installed. The warning there could also give a hint.
The NVIDIA Container Toolkit is installed and up to date. The warning only mentions orphan containers, so the containers should still be created. @qinxuye It's already been more than a week since this issue was supposed to be resolved. Can you please expedite this?
@insistence-essenn It's a local setup problem; as you've seen from my example, it works. Does another docker-compose file work?
@bufferoverflow The other docker-compose file works perfectly, xinference v0.12.3 runs perfectly, and the new version runs fine with xinference-local, but I can't get the supervisor/worker setup working. @qinxuye @amumu96 Can you please resolve this issue quickly? It's already been a long time.
@qinxuye @amumu96 Will this ever get solved? I've been asking for so long, but still no update 🥲
@ChengjieLi28 any help?
@bufferoverflow @qinxuye @amumu96 @ChengjieLi28 I'm also facing the same issue. I have tried the same on both my Linux server and my Windows machine, using Docker. I used the docker-compose-distributed.yml file with the docker image xprobe/xinference:v0.14.3:
version: '3.8'
services:
  xinference: &xinference
    image: xprobe/xinference:v0.14.3
    # deploy:
    #   resources:
    #     reservations:
    #       devices:
    #         - capabilities: [gpu]
    #           driver: nvidia
    #           count: all
    # volumes:
    #   # Replace <xinference_home> with your xinference home path on the host machine
    #   - <xinference_home>:/root/.xinference
    #   # Replace <huggingface_cache_dir> with your huggingface cache path, default is
    #   # <home_path>/.cache/huggingface
    #   - <huggingface_cache_dir>:/root/.cache/huggingface
    #   # If models are downloaded from modelscope, replace <huggingface_cache_dir> with
    #   # your modelscope cache path, default is <home_path>/.cache/modelscope
    #   - <modelscope_cache_dir>:/root/.cache/modelscope
    # environment:
    #   # Add envs here. For example, to download models from modelscope:
    #   - XINFERENCE_MODEL_SRC=modelscope
  xinference-supervisor:
    <<: *xinference
    ports:
      - "9997:9997"
      - "9999:9999"
    command: xinference-supervisor --host xinference-supervisor --port 9997 --supervisor-port 9999
    restart: always
    healthcheck:
      test: curl --fail http://xinference-supervisor:9997/status || exit 1
      interval: 5s
      retries: 5
      start_period: 5s
      timeout: 5s
  # This example uses just two workers. You can add more by incrementing
  # the worker suffix and port number.
  xinference-worker-1:
    <<: *xinference
    ports:
      - "30001:30001"
    command: xinference-worker -e http://xinference-supervisor:9997 --host xinference-worker-1 --worker-port 30001
    restart: always
    depends_on:
      xinference-supervisor:
        condition: service_healthy
  xinference-worker-2:
    <<: *xinference
    ports:
      - "30002:30002"
    command: xinference-worker -e http://xinference-supervisor:9997 --host xinference-worker-2 --worker-port 30002
    restart: always
    depends_on:
      xinference-supervisor:
        condition: service_healthy
@bufferoverflow @saran-raj-18 I will post an update here after I test this issue.
@insistence-essenn @saran-raj-18 Could you please try this Dockerfile:
FROM xprobe/xinference:v0.14.3
CMD ["/bin/bash"]
Build a simple Docker image from it, and use that image instead of xprobe/xinference:v0.14.3 in the docker-compose-distributed.yml file to try again. If this works, I will fix it for the next release.
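For anyone following along, the workaround amounts to building a thin wrapper image and pointing the compose file at it. A minimal sketch, assuming the two-line Dockerfile above sits in the current directory; the local tag name xinference-bash:v0.14.3 is just an example, any tag works:

```shell
# Build a local wrapper image from the two-line Dockerfile above
# (the tag name is hypothetical):
docker build -t xinference-bash:v0.14.3 .

# In docker-compose-distributed.yml, change the anchor service's
#   image: xprobe/xinference:v0.14.3
# to
#   image: xinference-bash:v0.14.3
# (the workers and supervisor inherit it via the YAML anchor),
# then recreate the stack:
docker compose -f docker-compose-distributed.yml up -d
```

Because all services merge the same &xinference anchor, changing the single image: line is enough to switch every container to the wrapper image.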
@ChengjieLi28 It's working now with the above changes you mentioned. Thanks :)