
BUG: Could not start v0.11.0 from Docker Compose

Minamiyama opened this issue 9 months ago · 16 comments

Minamiyama · May 12 '24

The latest image won't start for me either.

XiaoCC · May 12 '24

Me too, +1.

yanmao2023 · May 12 '24

@Minamiyama @yanmao2023 @XiaoCC Could you please help me test this? Build a new image based on our official image:

FROM xprobe/xinference:v0.11.0

RUN pip install torchvision==0.17.1

And then test it?

On my own machine with two GPUs, I can use xinference normally with the above method.

ChengjieLi28 · May 13 '24
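For anyone who, like the original report, is launching this from Docker Compose, here is a minimal sketch of how the suggested fix could be wired up. The compose file, service name, and GPU reservation are illustrative assumptions, not part of the maintainer's instructions:

# build the patched image from the two-line Dockerfile above, then run it via compose
cat > docker-compose.yml <<'EOF'
services:
  xinference:
    build: .                      # directory containing the Dockerfile shown above
    command: xinference-local -H 0.0.0.0 --log-level debug
    ports:
      - "9997:9997"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
EOF
docker compose up --build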

The suggested image doesn't seem to work for me.

Minamiyama · May 13 '24

Please paste your error stack and the related commands.

ChengjieLi28 · May 13 '24

[screenshots]

Minamiyama · May 13 '24

What's the error here? You can also add --log-level debug to your entrypoint command. Could you just test the following:

  1. Build the new image.
  2. Run it:
docker run -p 9997:9997 --gpus all <the new image> xinference-local --log-level debug -H 0.0.0.0
  3. Then use it.

ChengjieLi28 · May 13 '24
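If the container still exits on its own, a few generic Docker commands can help surface why. The container ID below is a placeholder:

docker ps -a                                                     # find the exited xinference container and its ID
docker logs <container-id>                                       # everything the container printed before it stopped
docker inspect --format '{{.State.ExitCode}}' <container-id>     # 137 usually means the process was killed, often out of memory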

[screenshot]

Is this message useful?

Minamiyama · May 13 '24

What is this in your screenshot? It seems unrelated to xinference; it may be an issue with your CUDA environment. The xinference docker image uses the pytorch image as its base image. You can check whether you can use that base image directly:

pytorch/pytorch:2.1.2-cuda12.1-cudnn8-devel

ChengjieLi28 · May 13 '24
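A quick way to sanity-check the CUDA setup inside that base image, independent of xinference (the one-liner is a generic sketch, not from the maintainer):

docker run --rm --gpus all pytorch/pytorch:2.1.2-cuda12.1-cudnn8-devel \
  python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
# prints something like "2.1.2 True" when the GPU is visible inside the container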

[screenshot]

That was caused by adding --log-level; maybe I was using it incorrectly.

Minamiyama · May 13 '24

[screenshot]

Nothing new is shown, and it still shuts down on its own.

Minamiyama · May 13 '24

Regarding pytorch/pytorch:2.1.2-cuda12.1-cudnn8-devel, just run:

docker run --gpus all pytorch/pytorch:2.1.2-cuda12.1-cudnn8-devel tail -f /dev/null

Does it still shut down on its own?

ChengjieLi28 · May 13 '24
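If that container does stay up, one way to confirm the GPU is actually visible inside it (the container name is hypothetical):

docker run -d --name torch-test --gpus all pytorch/pytorch:2.1.2-cuda12.1-cudnn8-devel tail -f /dev/null
docker exec torch-test nvidia-smi      # should list the host GPUs when the NVIDIA container runtime is set up
docker rm -f torch-test                # clean up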

The host machine runs Windows. It may not be able to use 0.0.0.0; I haven't tried Windows. Remove -H 0.0.0.0 and try again.

ChengjieLi28 · May 13 '24
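For reference, that is simply the earlier debug command with the host flag dropped (the image name remains a placeholder):

docker run -p 9997:9997 --gpus all <the new image> xinference-local --log-level debug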

[screenshot]

It runs normally.

Minamiyama · May 13 '24

I cannot reproduce this. Please try:

docker pull xprobe/xinference:nightly-bug_torchvision_version

This image is built by #1485, and I can use it normally on my Ubuntu machine.

ChengjieLi28 · May 13 '24
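Pulling and running that nightly build would follow the same pattern as before; the flags here simply mirror the earlier debug run (adjust the -H/host setting to whatever worked on your machine):

docker pull xprobe/xinference:nightly-bug_torchvision_version
docker run -p 9997:9997 --gpus all xprobe/xinference:nightly-bug_torchvision_version xinference-local --log-level debug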

[screenshot]

It failed as well.

Minamiyama · May 13 '24

@Minamiyama Try this image:

docker pull xprobe/xinference:nightly-docker_crash_due_to_llama

ChengjieLi28 · May 16 '24

0.11.1 works fine.

Minamiyama · May 20 '24