
[bug]: invokeai docker v5.3.1 main-rocm still using cpu instead of rocm

Open nmcbride opened this issue 1 year ago • 7 comments

Is there an existing issue for this problem?

  • [X] I have searched the existing issues

Operating system

Linux

GPU vendor

AMD (ROCm)

GPU model

RX 7700S, RX 7900 XTX

GPU VRAM

8 GB, 24 GB

Version number

5.3.1

Browser

Firefox

Python dependencies

No response

What happened

I was excited to see that https://github.com/invoke-ai/InvokeAI/issues/7146 was closed and merged, and all would be well in the ROCm world. However, after updating my Docker image and running the new v5.3.1, the same issue persists and the container is using the CPU instead of ROCm.
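For reference, a quick way to confirm what the container is actually running (a sketch; the container name invokeai is an assumption, substitute whatever docker ps shows, and it assumes the image's python is on PATH):

# check which torch build is installed and whether torch can see a GPU
docker exec -it invokeai python -c "import torch; print(torch.__version__, torch.cuda.is_available())"

A ROCm build prints a version ending in +rocm and True when the GPU is reachable; the CUDA build on an AMD-only host prints a +cu version and False, which matches the CPU fallback described above.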

What you expected to happen

I expect ROCm to be used.

How to reproduce the problem

No response

Additional context

No response

Discord username

No response

nmcbride avatar Nov 04 '24 05:11 nmcbride

I can confirm this: the PyTorch included is still the CUDA build, not ROCm.

lonyelon avatar Nov 12 '24 08:11 lonyelon

When will this be fixed?

Modulus avatar Nov 27 '24 20:11 Modulus

Having the same issue on InvokeAI 5.5.

laktosterror avatar Dec 24 '24 10:12 laktosterror

+1

ElGatoNinja avatar Dec 28 '24 22:12 ElGatoNinja

A workaround is to build your own container using the run.sh script in the docker directory: copy .env.sample to .env and set GPU_DRIVER=rocm (a rough sketch of the steps is below). Even after that I still have some issues, and the latest 5-rocm tagged image still has the problem:

docker run --rm -it --entrypoint "/bin/bash" ghcr.io/invoke-ai/invokeai:5-rocm
root@98565b597416:/opt/invokeai# uv pip list | grep torch
Using Python 3.11.10 environment at: /opt/venv
clip-anytorch            2.6.0
pytorch-lightning        2.1.3
torch                    2.4.1+cu124
torchmetrics             1.0.3
torchsde                 0.2.6
torchvision              0.19.1+cu124
root@98565b597416:/opt/invokeai# 
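For reference, the rebuild roughly looks like this (a sketch based on the description above; run it from an InvokeAI source checkout and check .env.sample for the exact variable names):

cd docker
cp .env.sample .env
# edit .env so that it contains:
#   GPU_DRIVER=rocm
./run.sh    # builds and starts the container via the project's helper script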

Once the rebuild is done, you can check:

root@300bb74e52c8:/opt/invokeai# uv pip list | grep torch
Using Python 3.11.10 environment at: /opt/venv
clip-anytorch           2.6.0
pytorch-lightning       2.1.3
pytorch-triton-rocm     3.0.0
torch                   2.4.1+rocm6.1
torchmetrics            1.0.3
torchsde                0.2.6
torchvision             0.19.1+rocm6.1

cbayle avatar Jan 05 '25 22:01 cbayle

Did you find a working image version? I tried a few versions back built by Invoke (hosted on GHCR), but they all had this problem.

laktosterror avatar Jan 12 '25 10:01 laktosterror

Same problem. Using yanwk/comfyui-boot:rocm and ollama:rocm works, so I don't think the problem is on my side.
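For what it's worth, a quick host-side sanity check looks like this (a sketch; it assumes the ROCm userspace tools are installed on the host):

ls -l /dev/kfd /dev/dri    # the device nodes a ROCm container needs access to
rocm-smi                   # should list the GPU if the host ROCm stack works

If these pass and other ROCm containers work, the CUDA build of torch shipped in the InvokeAI image (as shown above) is the likely culprit.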

linux-universe avatar Apr 09 '25 20:04 linux-universe

We've recently updated the ROCm images; closing this. Please open another issue if you're still experiencing problems.

ebr avatar Aug 13 '25 15:08 ebr