Cannot run cog on a Lambda Labs GPU
I am following this tutorial https://replicate.com/docs/guides/get-a-gpu-machine
I run:
sudo cog predict r8.im/stability-ai/stable-diffusion@sha256:ac732df83cea7fff18b8472768c88ad041fa750ff7682a21affe81863cbe77e4 -i prompt="a pot of gold"
and get the following error:
Starting Docker image r8.im/stability-ai/stable-diffusion@sha256:ac732df83cea7fff18b8472768c88ad041fa750ff7682a21affe81863cbe77e4 and running setup()...
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/root/.pyenv/versions/3.11.4/lib/python3.11/site-packages/cog/server/http.py", line 354, in <module>
app = create_app(
^^^^^^^^^^^
File "/root/.pyenv/versions/3.11.4/lib/python3.11/site-packages/cog/server/http.py", line 71, in create_app
predictor = load_predictor_from_ref(predictor_ref)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.pyenv/versions/3.11.4/lib/python3.11/site-packages/cog/predictor.py", line 155, in load_predictor_from_ref
spec.loader.exec_module(module)
File "<frozen importlib._bootstrap_external>", line 940, in exec_module
File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
File "/src/predict.py", line 17, in <module>
from dynamic_sd.src.pipeline_stable_diffusion_ait_alt import StableDiffusionAITPipeline
File "/src/dynamic_sd/src/pipeline_stable_diffusion_ait_alt.py", line 40, in <module>
from .compile_lib.compile_vae_alt import map_vae
File "/src/dynamic_sd/src/compile_lib/compile_vae_alt.py", line 21, in <module>
from ..modeling.vae import AutoencoderKL as ait_AutoencoderKL
File "/src/dynamic_sd/src/modeling/vae.py", line 22, in <module>
from .unet_blocks import get_up_block, UNetMidBlock2D
File "/src/dynamic_sd/src/modeling/unet_blocks.py", line 36, in <module>
from .clip import SpatialTransformer
File "/src/dynamic_sd/src/modeling/clip.py", line 24, in <module>
USE_CUDA = detect_target().name() == "cuda"
^^^^^^^^^^^^^^^
File "/root/.pyenv/versions/3.11.4/lib/python3.11/site-packages/aitemplate/testing/detect_target.py", line 132, in detect_target
raise RuntimeError("Unsupported platform")
RuntimeError: Unsupported platform
ⅹ Failed to get container status: exit status 1
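For context, the final `RuntimeError: Unsupported platform` is raised by `aitemplate.testing.detect_target` when it cannot identify a usable GPU backend inside the container. A minimal sketch of that style of probe (illustrative only; `guess_backend` is a made-up name, and AITemplate's real detection logic is more involved):

```python
import shutil

def guess_backend() -> str:
    """Rough approximation of a GPU backend probe: look for the CUDA
    or ROCm compiler drivers on PATH. Illustrative only; this is NOT
    aitemplate's real detect_target() implementation."""
    if shutil.which("nvcc"):
        return "cuda"
    if shutil.which("hipcc"):
        return "rocm"
    # Neither toolchain found: mirrors the error seen above.
    raise RuntimeError("Unsupported platform")
```

If the failure works like this sketch, a first thing to check would be whether the CUDA toolchain is actually visible inside the container when it starts.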
Any feedback?
I am getting the same issue.
Any feedback on this?
Hi,
I am an SWE working for Lambda, and I decided to look into this problem. I know next to nothing about cog, but following the directions linked in the original report, I can confirm that I can reproduce the problem.
I did find, though, that the following steps on a freshly launched instance successfully generated a file output.0.png:
git clone https://github.com/replicate/cog-stable-diffusion.git
cd cog-stable-diffusion/
sudo cog run script/download-weights && clear
(output from the script left my terminal in a bad state, hence the clear)
sudo cog predict -i prompt="a pot of gold"
Is the version of CUDA provided by Lambda Stack not supported? I ask because the first line of output from that last command is the following:
⚠ Cog doesn't know if CUDA 11.8 is compatible with PyTorch 1.13.0. This might cause CUDA problems.
Note that I don't know where the "CUDA 11.8" is coming from:
Mon May 6 23:53:09 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.161.08             Driver Version: 535.161.08   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A10                     On  | 00000000:08:00.0 Off |                    0 |
|  0%   36C    P8              16W / 150W |      3MiB / 23028MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
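One detail that may explain the apparent mismatch (my understanding of nvidia-smi, not something confirmed in this thread): the "CUDA Version: 12.2" in the banner is the maximum CUDA runtime the installed driver supports, not an installed toolkit, so it can legitimately differ from the 11.8 an image was built against. A small sketch of that comparison, using the banner text above:

```python
import re

# nvidia-smi's banner reports the *maximum* CUDA runtime the driver
# supports; the toolkit a container was built against may be older.
banner = "| NVIDIA-SMI 535.161.08   Driver Version: 535.161.08   CUDA Version: 12.2 |"

driver_max = tuple(
    int(x)
    for x in re.search(r"CUDA Version:\s*([\d.]+)", banner).group(1).split(".")
)
toolkit = (11, 8)  # the version cog warned about

# A toolkit no newer than the driver's maximum should be usable:
print(toolkit <= driver_max)  # → True
```

So if this reading is right, CUDA 11.8 inside the image under a 12.2-capable driver should not itself be the problem.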
If there is anything that I can do to help troubleshoot this, or if there's a change to our on-demand VM base image that might prevent this in the future, please let me know.
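If it helps: my guess is that the "CUDA 11.8" cog warns about comes from the model's cog.yaml rather than from the host. A hypothetical excerpt of the relevant build fields (the field names are from cog's documented schema, but the actual values in the stable-diffusion repo may differ):

```yaml
# Hypothetical cog.yaml excerpt; the actual repo's values may differ.
build:
  gpu: true
  cuda: "11.8"            # toolkit version baked into the image
  python_version: "3.11"
  python_packages:
    - "torch==1.13.0"
```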
No news on this from the Replicate team?
Hey, I have the same issue here! Any news? @Jordan-Lambda