How to restore previously running models after the Xinference service's Docker container restarts

Docker
Xinference 0.10.3

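How do I restore the previously running models after the service's Docker container restarts? On my side, every time Docker restarts I still have to launch the models manually before they run. Is there a setting that automatically launches the previously running models once Docker comes back up?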

May 07 '24 01:05 yeyupiaoling

You need to manually modify the entrypoint script on top of that Docker image and add xinference commands to it so that it loads the models you want.

May 07 '24 05:05 mikeshi80

When starting the container, set Xinference's XINFERENCE_HOME environment variable, then mount the directory that variable points to as a volume.

docker run -v /home/models/:/home/models/ -v /nfs/models/xinference/xinference_cache/:/root/xinference_cache -e XINFERENCE_MODEL_SRC=modelscope -e XINFERENCE_HOME=/root/xinference_cache -d -p 9998:9997 --gpus all --name ffff --shm-size=256g xprobe/xinference:v0.10.1 xinference-local -H 0.0.0.0 --log-level debug

May 17 '24 08:05 WangxuP

@WangxuP Can this approach be reproduced?

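I installed locally with pip and did not set XINFERENCE_HOME, so it uses the default <HOME>/.xinference. I tried repeatedly, and after a restart the deployed models are still not restored.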

May 24 '24 01:05 wencan

Yes, it works. Just mount the directory inside the container to a path on the host.

May 24 '24 06:05 WangxuP

@WangxuP Friend, the maintainers have already said this feature will be implemented in an upcoming release.

May 24 '24 07:05 wencan

OK, I've seen it, thanks.

May 24 '24 07:05 WangxuP

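Create a new file named init.sh with the content below. Change the models, parameters, and other details in the two xinference launch commands to fit your actual needs.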

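#!/bin/bash
# Start the Xinference server in the background; it never returns on its own.
xinference-local -H 0.0.0.0 &
PID1=$!
# Poll once per second until the server answers on port 9997.
while true; do
  if curl -s "http://localhost:9997" > /dev/null; then
    break
  else
    sleep 1
  fi
done
# Launch the models; adjust names, types, and parameters to your needs.
xinference launch --model-name jina-embeddings-v2-base-zh --model-type embedding --n-gpu None &
PID2=$!
xinference launch --model-name bge-reranker-large --model-type rerank --n-gpu None &
PID3=$!
# Keep the script (and therefore the container) alive while the processes run.
wait $PID1 $PID2 $PID3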

You need to rebuild the image and add tini. Use the Dockerfile below as a reference; include flash-attn only if you need it, and remove it otherwise.

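FROM xprobe/xinference:latest
# flash-attn is optional; remove the pip install command if you do not need it.
RUN pip install flash-attn --no-build-isolation -i https://pypi.tuna.tsinghua.edu.cn/simple/ && \
  apt-get update && \
  apt-get install -y tini && \
  rm -rf /var/lib/apt/lists/*
COPY init.sh /init.sh
# Make sure the script is executable regardless of its mode on the host.
RUN chmod +x /init.sh
ENTRYPOINT ["/usr/bin/tini", "--", "/init.sh"]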

Use this docker-compose.yml as a reference:

services:
  xinference:
    build:
      context: .
      dockerfile: Dockerfile
    image: xinference-flash-attn:latest
    container_name: xinference
    ports:
      - 9997:9997
    volumes:
      - ./data:/root/.xinference
      - /root/.cache/huggingface:/root/.cache/huggingface
    environment:
      - HF_ENDPOINT=https://hf-mirror.com
    restart: always
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

Once the three files above are ready, put them in the same directory, run docker compose build, and then docker compose up -d.

Jun 04 '24 06:06 kimi360

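Thanks. So xinference-local -H 0.0.0.0 keeps running in the foreground, which is why the rest of the script never executed; using & and wait solves that. If you want to save effort, there is no need to rebuild the image: just create init.sh in the mounted /root/.xinference directory and add command: sh /root/.xinference/init.sh to docker-compose.yml. The script also does not need lines like PID1=$!; a bare wait as the last line is enough, since it waits for all background processes to finish. ~~However, this only triggers on up; if you stop the container manually and then start it, it will not trigger. That is a problem with Docker itself, and there does not seem to be a good solution at the moment.~~ It does trigger in that case too; I believed an LLM that was making things up.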
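
For illustration, the simplified script described here might look roughly like the sketch below. It is not taken from the thread: the wait_until helper, the XINFERENCE_URL and POLL_INTERVAL variables, the 300-attempt limit, and the choice of model are all assumptions made for the example, and the script sticks to POSIX sh so it can be run with sh as in the command line above.

```shell
#!/bin/sh
# Sketch of a simplified init.sh: no PID variables, a bare `wait` as the last step.
# The URL, retry budget, and model below are placeholders, not recommendations.
XINFERENCE_URL="${XINFERENCE_URL:-http://localhost:9997}"
POLL_INTERVAL="${POLL_INTERVAL:-1}"

# wait_until RETRIES COMMAND...: rerun COMMAND until it succeeds or RETRIES attempts fail.
wait_until() {
  retries="$1"
  shift
  attempt=0
  while [ "$attempt" -lt "$retries" ]; do
    if "$@" > /dev/null 2>&1; then
      return 0
    fi
    attempt=$((attempt + 1))
    sleep "$POLL_INTERVAL"
  done
  return 1
}

start_all() {
  # The server runs in the foreground forever, so push it to the background.
  xinference-local -H 0.0.0.0 &
  # Give up after 300 attempts instead of looping forever.
  wait_until 300 curl -s "$XINFERENCE_URL" || return 1
  xinference launch --model-name bge-reranker-large --model-type rerank --n-gpu None &
  # A bare `wait` blocks until every background job of this shell has exited.
  wait
}

# Start only where xinference is actually installed (e.g. inside the container).
if command -v xinference-local > /dev/null 2>&1; then
  start_all
else
  echo "xinference-local not found; nothing started" >&2
fi
```

Bounding the retries is a design choice of this sketch: if the server never comes up, the script exits instead of polling forever, which lets the container's restart policy take over.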

Aug 06 '24 03:08 nadirvishun

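In production, where models are generally not swapped out casually, you can use a health check mechanism and check inside it whether the model is running.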
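
As a rough idea of what such a check could look like, here is a hypothetical healthcheck.sh. It assumes the server lists running models at /v1/models and that a running model's UID appears as a quoted string in the response; both should be verified against your Xinference version. The script name, the XINFERENCE_URL variable, and the function names are made up for this sketch.

```shell
#!/bin/sh
# Hypothetical healthcheck.sh: exits 0 only if the given model UID shows up in the
# server's model list. The /v1/models endpoint and response shape are assumptions.
XINFERENCE_URL="${XINFERENCE_URL:-http://localhost:9997}"

# model_is_listed BODY UID: succeeds if the quoted UID occurs anywhere in BODY.
model_is_listed() {
  printf '%s' "$1" | grep -qF "\"$2\""
}

# check_model UID: fetch the model list and look for UID in it.
check_model() {
  body="$(curl -sf "$XINFERENCE_URL/v1/models")" || return 1
  model_is_listed "$body" "$1"
}

# With a UID argument, act as a health check; without one, only define the functions.
if [ "$#" -ge 1 ]; then
  check_model "$1"
  exit $?
fi
```

It could be wired into docker-compose.yml through a healthcheck entry whose test runs, for example, sh /root/.xinference/healthcheck.sh bge-reranker-large (the path is an assumption). Note that a failing health check only marks the container as unhealthy; plain Docker restart policies react to the container exiting, not to its health status.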

Aug 06 '24 07:08 jony4

Building on the ideas from the posters above, autostart is actually easy. The model directory is bound to be mounted out of the container anyway, so just create an init.sh in that directory with the following content:

xinference-local -H 0.0.0.0 &
PID1=$!
while true; do
  if curl -s "http://localhost:9997" > /dev/null; then
    break
  else
    sleep 1
  fi
done
xinference launch --model-name qwen-chat --model-format pytorch --model-engine Transformers -s 7 &
PID2=$!
wait $PID1 $PID2

Then set the command in docker-compose.yml to command: sh /data/models/init.sh and that's it. Whatever models you want to start automatically, just add them to init.sh.

Aug 15 '24 03:08 fsea