
Xinference Docker Compose Fail

Open insistence-essenn opened this issue 1 year ago • 17 comments

System Info / 系統信息

(screenshot attached)

Running Xinference with Docker? / 是否使用 Docker 运行 Xinference?

  • [X] docker / docker
  • [ ] pip install / 通过 pip install 安装
  • [ ] installation from source / 从源码安装

Version info / 版本信息

0.13.2

The command used to start Xinference / 用以启动 xinference 的命令

docker-compose -f "xinference-updater/inference/xinference/deploy/xinf-dock-updater/docker-compose-distributed.yml" up -d --build

Reproduction / 复现过程

normal installation using docker-compose-distributed.yml

Expected behavior / 期待表现

Installation succeeds

insistence-essenn avatar Jul 22 '24 06:07 insistence-essenn

Sorry, the docker-compose file is contributed by the community, we need some time to check if it can work.

qinxuye avatar Jul 23 '24 11:07 qinxuye

@qinxuye Any update on the docker-compose file? I really need it for an urgent deployment.

insistence-essenn avatar Jul 25 '24 05:07 insistence-essenn

@qinxuye Hey, any update on the docker-compose issue?

insistence-essenn avatar Jul 31 '24 04:07 insistence-essenn

@qinxuye @frostyplanet @bufferoverflow Hey, any update on this issue? It's been open for several weeks now.

insistence-essenn avatar Aug 01 '24 06:08 insistence-essenn

We will check next week, thanks for your patience.

qinxuye avatar Aug 01 '24 07:08 qinxuye

@amumu96 will help look into the issue next week.

qinxuye avatar Aug 01 '24 07:08 qinxuye

Hi @amumu96, can you please help with this? I've needed help for a long time.

insistence-essenn avatar Aug 05 '24 06:08 insistence-essenn

@amumu96 Hey, can you please help?

insistence-essenn avatar Aug 06 '24 06:08 insistence-essenn

@qinxuye @amumu96 Any update? I'm really waiting to upgrade from v0.12.3 to v0.14 to use Llama 3.1.

insistence-essenn avatar Aug 07 '24 05:08 insistence-essenn

I just tested locally and it works:

$ docker compose  -f xinference/deploy/docker/docker-compose-distributed.yml  up
[+] Running 4/4
 ✔ Container docker-xinference-supervisor-1  Created                                                                                                                     0.3s 
 ✔ Container docker-xinference-1             Created                                                                                                                     0.3s 
 ✔ Container docker-xinference-worker-2-1    Created                                                                                                                     0.1s 
 ✔ Container docker-xinference-worker-1-1    Created                                                                                                                     0.1s 
Attaching to xinference-1, xinference-supervisor-1, xinference-worker-1-1, xinference-worker-2-1
xinference-1             | 
xinference-supervisor-1  | 
xinference-1             | ==========
xinference-1             | == CUDA ==
xinference-1             | ==========
xinference-supervisor-1  | ==========
xinference-supervisor-1  | == CUDA ==
xinference-supervisor-1  | ==========
xinference-1             | 
xinference-1             | CUDA Version 12.1.1
xinference-supervisor-1  | 
xinference-supervisor-1  | CUDA Version 12.1.1
xinference-1             | 
xinference-1             | Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
xinference-supervisor-1  | 
xinference-supervisor-1  | Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
xinference-1             | 
xinference-1             | This container image and its contents are governed by the NVIDIA Deep Learning Container License.
xinference-supervisor-1  | 
xinference-supervisor-1  | This container image and its contents are governed by the NVIDIA Deep Learning Container License.
xinference-supervisor-1  | By pulling and using the container, you accept the terms and conditions of this license:
xinference-supervisor-1  | https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
xinference-supervisor-1  | 
xinference-supervisor-1  | A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.
xinference-1             | By pulling and using the container, you accept the terms and conditions of this license:
xinference-1             | https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
xinference-1             | 
xinference-1             | A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.
xinference-1             | 
xinference-supervisor-1  | 
xinference-1 exited with code 0
xinference-supervisor-1  | 2024-08-11 16:02:52,881 xinference.core.supervisor 45 INFO     Xinference supervisor xinference-supervisor:9999 started
xinference-supervisor-1  | 2024-08-11 16:03:00,168 xinference.api.restful_api 1 INFO     Starting Xinference at endpoint: http://xinference-supervisor:9997
xinference-worker-2-1    | 
xinference-worker-2-1    | ==========
xinference-worker-2-1    | == CUDA ==
xinference-worker-2-1    | ==========
xinference-worker-2-1    | 
xinference-worker-2-1    | CUDA Version 12.1.1
xinference-worker-2-1    | 
xinference-worker-2-1    | Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
xinference-worker-2-1    | 
xinference-worker-2-1    | This container image and its contents are governed by the NVIDIA Deep Learning Container License.
xinference-worker-2-1    | By pulling and using the container, you accept the terms and conditions of this license:
xinference-worker-2-1    | https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
xinference-worker-2-1    | 
xinference-worker-2-1    | A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.
xinference-worker-2-1    | 
xinference-worker-1-1    | 
xinference-worker-1-1    | ==========
xinference-worker-1-1    | == CUDA ==
xinference-worker-1-1    | ==========
xinference-worker-1-1    | 
xinference-worker-1-1    | CUDA Version 12.1.1
xinference-worker-1-1    | 
xinference-worker-1-1    | Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
xinference-worker-1-1    | 
xinference-worker-1-1    | This container image and its contents are governed by the NVIDIA Deep Learning Container License.
xinference-worker-1-1    | By pulling and using the container, you accept the terms and conditions of this license:
xinference-worker-1-1    | https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
xinference-worker-1-1    | 
xinference-worker-1-1    | A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.
xinference-worker-1-1    | 
xinference-worker-1-1    | 2024-08-11 16:03:06,755 xinference.core.worker 1 INFO     Starting metrics export server at 0.0.0.0:None
xinference-worker-2-1    | 2024-08-11 16:03:06,755 xinference.core.worker 1 INFO     Starting metrics export server at 0.0.0.0:None
xinference-worker-1-1    | 2024-08-11 16:03:06,756 xinference.core.worker 1 INFO     Checking metrics export server...
xinference-worker-2-1    | 2024-08-11 16:03:06,756 xinference.core.worker 1 INFO     Checking metrics export server...
xinference-worker-1-1    | 2024-08-11 16:03:09,053 xinference.core.worker 1 INFO     Metrics server is started at: http://0.0.0.0:35193
xinference-worker-2-1    | 2024-08-11 16:03:09,056 xinference.core.worker 1 INFO     Metrics server is started at: http://0.0.0.0:39179
xinference-worker-1-1    | 2024-08-11 16:03:09,067 xinference.core.worker 1 INFO     Xinference worker xinference-worker-1:30001 started
xinference-worker-1-1    | 2024-08-11 16:03:09,068 xinference.core.worker 1 INFO     Purge cache directory: /root/.xinference/cache
xinference-worker-2-1    | 2024-08-11 16:03:09,073 xinference.core.worker 1 INFO     Xinference worker xinference-worker-2:30002 started
xinference-worker-2-1    | 2024-08-11 16:03:09,076 xinference.core.worker 1 INFO     Purge cache directory: /root/.xinference/cache

I see you have xinf-dock-updater within the path so it's not https://github.com/xorbitsai/inference/blob/main/xinference/deploy/docker/docker-compose.yml which I'm using.

bufferoverflow avatar Aug 11 '24 16:08 bufferoverflow

@bufferoverflow I am getting this issue when using the docker-compose distributed file. I only changed the folder name to xinf-dock-updater.

insistence-essenn avatar Aug 11 '24 16:08 insistence-essenn

Hmm, I mentioned the wrong file; of course I used https://github.com/xorbitsai/inference/blob/main/xinference/deploy/docker/docker-compose-distributed.yml . Please share the output of: docker compose -f xinference/deploy/docker/docker-compose-distributed.yml up

bufferoverflow avatar Aug 11 '24 16:08 bufferoverflow

@bufferoverflow @qinxuye @amumu96 Still getting the same issue: (screenshot attached)

insistence-essenn avatar Aug 12 '24 05:08 insistence-essenn

No idea; maybe your Docker environment is outdated, or the NVIDIA Container Toolkit is not installed. Also, the warning there could give a hint.

bufferoverflow avatar Aug 12 '24 05:08 bufferoverflow

The NVIDIA toolkit is installed and up to date as well. The warning only mentions orphan containers, so the containers should still be created. @qinxuye It's already been more than a week since this issue was supposed to be resolved. Can you please expedite it?

insistence-essenn avatar Aug 12 '24 05:08 insistence-essenn

@insistence-essenn It's a local setup problem; as you've seen from my example, it works. Does another docker-compose file work?

bufferoverflow avatar Aug 12 '24 06:08 bufferoverflow

@bufferoverflow The other docker-compose file works perfectly, Xinference v0.12.3 runs perfectly, and the new version runs fine with xinference-local, but I can't get the supervisor/worker setup working. @qinxuye @amumu96 Can you please resolve this issue quickly? It's already been a long time.

insistence-essenn avatar Aug 13 '24 05:08 insistence-essenn

@qinxuye @amumu96 Will this ever get solved? I've been asking for so long, but still no update 🥲

insistence-essenn avatar Aug 20 '24 04:08 insistence-essenn

@ChengjieLi28 any help?

insistence-essenn avatar Aug 20 '24 04:08 insistence-essenn

@bufferoverflow @qinxuye @amumu96 @ChengjieLi28 I'm also facing the same issue, as attached. I have tried the same on both my Linux server and my Windows machine, using Docker on each. I used the docker-compose-distributed.yml file with the Docker image xprobe/xinference:v0.14.3:

version: '3.8'

services:
  xinference: &xinference
    image: xprobe/xinference:v0.14.3
    # deploy:
    #   resources:
    #     reservations:
    #       devices:
    #         - capabilities: [gpu]
    #           driver: nvidia
    #           count: all
#    volumes:
#      # Replace <xinference_home> with your xinference home path on the host machine
#      - <xinference_home>:/root/.xinference
#      # Replace <huggingface_cache_dir> with your huggingface cache path, default is
#      # <home_path>/.cache/huggingface
#      - <huggingface_cache_dir>:/root/.cache/huggingface
#      # If models are downloaded from modelscope, replace <huggingface_cache_dir> with
#      # your modelscope cache path, default is <home_path>/.cache/modelscope
#      - <modelscope_cache_dir>:/root/.cache/modelscope
#    environment:
#      # add envs here. Here's an example, if you want to download model from modelscope
#      - XINFERENCE_MODEL_SRC=modelscope

  xinference-supervisor:
    <<: *xinference
    ports:
      - "9997:9997"
      - "9999:9999"
    command: xinference-supervisor --host xinference-supervisor --port 9997 --supervisor-port 9999
    restart: always
    healthcheck:
      test: curl --fail http://xinference-supervisor:9997/status || exit 1
      interval: 5s
      retries: 5
      start_period: 5s
      timeout: 5s

  # This example uses just two workers. You can add more by incrementing
  # the worker suffix and port number.
  xinference-worker-1:
    <<: *xinference
    ports:
      - "30001:30001"
    command: xinference-worker -e http://xinference-supervisor:9997 --host xinference-worker-1 --worker-port 30001
    restart: always
    depends_on:
      xinference-supervisor:
        condition: service_healthy

  xinference-worker-2:
    <<: *xinference
    ports:
      - "30002:30002"
    command: xinference-worker -e http://xinference-supervisor:9997 --host xinference-worker-2 --worker-port 30002
    restart: always
    depends_on:
      xinference-supervisor:
        condition: service_healthy
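
Following the comment in the file above, a third worker could be sketched by extending the same pattern (the service name, hostname, and port below are chosen by extension, not taken from the original file):

```yaml
# sketch: a third worker, following the pattern of workers 1 and 2
  xinference-worker-3:
    <<: *xinference
    ports:
      - "30003:30003"
    command: xinference-worker -e http://xinference-supervisor:9997 --host xinference-worker-3 --worker-port 30003
    restart: always
    depends_on:
      xinference-supervisor:
        condition: service_healthy
```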

(screenshot attached)

saran-raj-18 avatar Aug 27 '24 09:08 saran-raj-18

@bufferoverflow @saran-raj-18 I will update information here after I test this issue.

ChengjieLi28 avatar Aug 28 '24 03:08 ChengjieLi28

@insistence-essenn @saran-raj-18 Could you please try this Dockerfile:

FROM xprobe/xinference:v0.14.3

CMD ["/bin/bash"]

Build a simple Docker image from it, and use that image instead of xprobe/xinference:v0.14.3 in the docker-compose-distributed.yml file to try again. If this works, I will fix it for the next release.
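
A concrete way to follow these steps, as a sketch (the Dockerfile.fix file name and the xinference-fix image tag are placeholders, not names from this thread):

```shell
# Save the wrapper Dockerfile suggested above (file name is a placeholder)
cat > Dockerfile.fix <<'EOF'
FROM xprobe/xinference:v0.14.3

CMD ["/bin/bash"]
EOF

# Then build the image and point the compose file at the new tag:
#   docker build -t xinference-fix:v0.14.3 -f Dockerfile.fix .
# ...and set `image: xinference-fix:v0.14.3` under the `xinference:` anchor
# in docker-compose-distributed.yml before running `docker compose up` again.
echo "wrote Dockerfile.fix"
```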

ChengjieLi28 avatar Aug 28 '24 05:08 ChengjieLi28

@ChengjieLi28 It's working now with the above changes you mentioned. Thanks :)

saran-raj-18 avatar Aug 28 '24 06:08 saran-raj-18