Can't get it running... can anyone help, please? RTX 3090
Hi, I am trying to run python3 optimize_stable_diffusion_pipeline.py and I get a nasty error. I can't really tell what exactly is wrong, because the trace seems to point at almost everything installed here.
I am pretty sure I have the correct version of stable-fast, the one matching my Python 3.10, CUDA 12.1 and torch 2.1; gcc is 11.4. My system is an RTX 3090, i7, etc., on a fairly clean Ubuntu install. The NVIDIA driver is 530.
This is my pip3 list:
pip3 list
Package Version
accelerate 0.25.0
antlr4-python3-runtime 4.9.3
apturl 0.5.2
bcrypt 3.2.0
blinker 1.4
Brlapi 0.8.3
certifi 2020.6.20
chardet 4.0.0
click 8.0.3
colorama 0.4.4
command-not-found 0.3
cryptography 3.4.8
cupshelpers 1.0
dbus-python 1.2.18
defer 1.0.6
diffusers 0.24.0
distro 1.7.0
distro-info 1.1+ubuntu0.1
duplicity 0.8.21
fasteners 0.14.1
filelock 3.13.1
fsspec 2023.12.1
future 0.18.2
httplib2 0.20.2
huggingface-hub 0.19.4
idna 3.3
importlib-metadata 4.6.4
jeepney 0.7.1
Jinja2 3.1.2
keyring 23.5.0
language-selector 0.1
launchpadlib 1.10.16
lazr.restfulclient 0.14.4
lazr.uri 1.0.6
lockfile 0.12.2
louis 3.20.0
macaroonbakery 1.3.1
Mako 1.1.3
MarkupSafe 2.0.1
monotonic 1.6
more-itertools 8.10.0
mpmath 1.3.0
netifaces 0.11.0
networkx 3.2.1
numpy 1.26.2
nvidia-cublas-cu12 12.1.3.1
nvidia-cuda-cupti-cu12 12.1.105
nvidia-cuda-nvrtc-cu12 12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12 8.9.2.26
nvidia-cufft-cu12 11.0.2.54
nvidia-curand-cu12 10.3.2.106
nvidia-cusolver-cu12 11.4.5.107
nvidia-cusparse-cu12 12.1.0.106
nvidia-nccl-cu12 2.18.1
nvidia-nvjitlink-cu12 12.3.101
nvidia-nvtx-cu12 12.1.105
oauthlib 3.2.0
olefile 0.46
omegaconf 2.3.0
packaging 23.2
paramiko 2.9.3
pexpect 4.8.0
Pillow 9.0.1
pip 22.0.2
protobuf 3.12.4
psutil 5.9.6
ptyprocess 0.7.0
pycairo 1.20.1
pycups 2.0.1
PyGObject 3.42.1
PyJWT 2.3.0
pymacaroons 0.13.0
PyNaCl 1.5.0
pyparsing 2.4.7
PyQt5 5.15.10
PyQt5-Qt5 5.15.2
PyQt5-sip 12.13.0
pyRFC3339 1.1
python-apt 2.4.0+ubuntu2
python-dateutil 2.8.1
python-debian 0.1.43+ubuntu1.1
pytz 2022.1
pyxdg 0.27
PyYAML 5.4.1
regex 2023.10.3
reportlab 3.6.8
requests 2.25.1
safetensors 0.4.1
screen-resolution-extra 0.0.0
SecretStorage 3.3.1
setuptools 59.6.0
six 1.16.0
ssh-import-id 5.11
stable-fast 0.0.13.post3+torch210cu121
sympy 1.12
systemd-python 234
tokenizers 0.15.0
torch 2.1.0
torchvision 0.16.0
tqdm 4.66.1
transformers 4.35.2
triton 2.1.0
typing_extensions 4.8.0
ubuntu-advantage-tools 8001
ubuntu-drivers-common 0.0.0
ufw 0.36.1
unattended-upgrades 0.1
urllib3 1.26.5
usb-creator 0.3.7
wadllib 1.3.6
wheel 0.37.1
xdg 5
xformers 0.0.22.post7
xkit 0.0.0
zipp 1.0.0
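The stable-fast entry in the list above carries a local version tag, `+torch210cu121`. Below is a minimal sketch for checking that tag against the installed PyTorch build. It assumes, from the name alone and not from stable-fast documentation, that `torch210` means torch 2.1.0 and `cu121` means CUDA 12.1; the helpers `parse_wheel_tag` and `wheel_matches` are hypothetical.

```python
import re
from typing import Optional, Tuple

# Assumed tag layout (inferred from "0.0.13.post3+torch210cu121"):
# "+torch" followed by three single digits, then "cu" and the CUDA digits.
TAG_RE = re.compile(r"\+torch(\d)(\d)(\d)cu(\d+)$")


def parse_wheel_tag(sfast_version: str) -> Optional[Tuple[str, str]]:
    """Return (torch_version, cuda_version) encoded in the wheel tag, or None."""
    m = TAG_RE.search(sfast_version)
    if m is None:
        return None
    major, minor, patch, cu = m.groups()
    torch_version = f"{major}.{minor}.{patch}"
    cuda_version = f"{cu[:-1]}.{cu[-1]}"  # "121" -> "12.1", "118" -> "11.8"
    return torch_version, cuda_version


def wheel_matches(sfast_version: str, torch_version: str,
                  torch_cuda: Optional[str]) -> bool:
    """Compare the wheel tag with torch.__version__ and torch.version.cuda."""
    parsed = parse_wheel_tag(sfast_version)
    if parsed is None or torch_cuda is None:
        return False
    want_torch, want_cuda = parsed
    # torch.__version__ may have its own local tag, e.g. "2.1.0+cu121".
    base_torch = torch_version.split("+", 1)[0]
    return base_torch == want_torch and torch_cuda == want_cuda
```

On the machine itself, `wheel_matches(importlib.metadata.version("stable-fast"), torch.__version__, torch.version.cuda)` should return `True` for the versions listed here. A match only rules out a wheel/torch mismatch; it says nothing about the linker error in the log.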
And here is the error:
Loading pipeline components...: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:01<00:00, 6.96it/s]
/home/sd/.local/lib/python3.10/site-packages/torch/cuda/graphs.py:88: UserWarning: The CUDA Graph is empty. This usually means that the graph was attempted to be captured on wrong device or stream. (Triggered internally at ../aten/src/ATen/cuda/CUDAGraph.cpp:192.)
super().capture_end()
/home/sd/.local/lib/python3.10/site-packages/sfast/utils/flat_tensors.py:159: TracerWarning: Converting a tensor to a Python number might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
obj_type = tensors[start].item()
/home/sd/.local/lib/python3.10/site-packages/sfast/utils/flat_tensors.py:218: TracerWarning: Converting a tensor to a Python number might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
size = tensors[start].item()
/home/sd/.local/lib/python3.10/site-packages/sfast/utils/flat_tensors.py:228: TracerWarning: Converting a tensor to a Python number might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
size = tensors[start].item()
/home/sd/.local/lib/python3.10/site-packages/sfast/utils/flat_tensors.py:214: TracerWarning: Converting a tensor to a Python list might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
return bytes(tensors[start].tolist()), start + 1
/home/sd/.local/lib/python3.10/site-packages/transformers/modeling_attn_mask_utils.py:66: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if input_shape[-1] > 1 or self.sliding_window is not None:
/home/sd/.local/lib/python3.10/site-packages/transformers/modeling_attn_mask_utils.py:137: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if past_key_values_length > 0:
/home/sd/.local/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py:273: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if attn_weights.size() != (bsz * self.num_heads, tgt_len, src_len):
/home/sd/.local/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py:281: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if causal_attention_mask.size() != (bsz, 1, tgt_len, src_len):
/home/sd/.local/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py:313: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if attn_output.size() != (bsz * self.num_heads, tgt_len, self.head_dim):
/home/sd/.local/lib/python3.10/site-packages/sfast/utils/flat_tensors.py:23: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
return torch.tensor([num], dtype=torch.int64)
/home/sd/.local/lib/python3.10/site-packages/sfast/utils/flat_tensors.py:253: TracerWarning: torch.Tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
return super().new(cls, x, *args, **kwargs)
/home/sd/.local/lib/python3.10/site-packages/sfast/utils/flat_tensors.py:123: TracerWarning: torch.as_tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
return (torch.as_tensor(tuple(obj), dtype=torch.uint8), )
0%| | 0/30 [00:00<?, ?it/s]
/home/sd/.local/lib/python3.10/site-packages/sfast/utils/flat_tensors.py:197: TracerWarning: Converting a tensor to a Python number might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
return bool(tensors[start].item()), start + 1
/home/sd/.local/lib/python3.10/site-packages/diffusers/models/unet_2d_condition.py:878: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if dim % default_overall_up_factor != 0:
/home/sd/.local/lib/python3.10/site-packages/diffusers/models/resnet.py:265: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert hidden_states.shape[1] == self.channels
/home/sd/.local/lib/python3.10/site-packages/diffusers/models/resnet.py:271: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert hidden_states.shape[1] == self.channels
/home/sd/.local/lib/python3.10/site-packages/diffusers/models/resnet.py:173: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert hidden_states.shape[1] == self.channels
/home/sd/.local/lib/python3.10/site-packages/diffusers/models/resnet.py:186: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if hidden_states.shape[0] >= 64:
/usr/bin/ld: skipping incompatible /lib/i386-linux-gnu/libcuda.so when searching for -lcuda
/usr/bin/ld: skipping incompatible /lib/i386-linux-gnu/libcuda.so when searching for -lcuda
/usr/bin/ld: cannot find -lcuda: No such file or directory
/usr/bin/ld: skipping incompatible /lib/i386-linux-gnu/libcuda.so when searching for -lcuda
/usr/bin/ld: skipping incompatible /lib/i386-linux-gnu/libcuda.so when searching for -lcuda
collect2: error: ld returned 1 exit status
0%| | 0/30 [00:03<?, ?it/s]
Traceback (most recent call last):
File "/home/sd/Playground/stable-fast/examples/optimize_stable_diffusion_pipeline.py", line 150, in
graph(%input, %num_groups, %weight, %bias, %eps, %cudnn_enabled):
  %y : Tensor = sfast_triton::group_norm_silu(%input, %num_groups, %weight, %bias, %eps)
                ~~~~~~~~~~~~ <--- HERE
  return (%y)
RuntimeError: CalledProcessError: Command '['/usr/bin/gcc', '/tmp/tmpy9k09g46/main.c', '-O3', '-I/home/sd/.local/lib/python3.10/site-packages/triton/common/../third_party/cuda/include', '-I/usr/include/python3.10', '-I/tmp/tmpy9k09g46', '-shared', '-fPIC', '-lcuda', '-o', '/tmp/tmpy9k09g46/group_norm_4d_channels_last_forward_collect_stats_kernel.cpython-310-x86_64-linux-gnu.so', '-L/lib/x86_64-linux-gnu', '-L/lib/i386-linux-gnu', '-L/lib/i386-linux-gnu']' returned non-zero exit status 1.
At:
/usr/lib/python3.10/subprocess.py(369): check_call
/home/sd/.local/lib/python3.10/site-packages/triton/common/build.py(90): _build
/home/sd/.local/lib/python3.10/site-packages/triton/compiler/make_launcher.py(39): make_stub
/home/sd/.local/lib/python3.10/site-packages/triton/compiler/compiler.py(425): compile
Can anyone at least point me to what could be wrong? Thanks!
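The failure in the log is the link step: Triton builds a small launcher with `gcc ... -lcuda`, and `ld` skips `/lib/i386-linux-gnu/libcuda.so` as incompatible (a 32-bit library), then reports `cannot find -lcuda` even though `-L/lib/x86_64-linux-gnu` is on the command line. With `-lcuda`, GNU ld looks for an unversioned `libcuda.so` (or `libcuda.a`) in each `-L` directory, so one plausible explanation, not confirmed in this thread, is that the 64-bit directory only contains the versioned `libcuda.so.1`. A minimal sketch for checking that hypothesis, assuming the usual `ldconfig -p` line format; the function names are hypothetical.

```python
import os
import shutil
import subprocess
from typing import Dict, Iterable, List


def libcuda_dirs_from_ldconfig(ldconfig_output: str) -> List[str]:
    """Collect directories of libcuda entries from `ldconfig -p` output.

    Relevant lines are assumed to look like:
    "\tlibcuda.so.1 (libc6,x86-64) => /lib/x86_64-linux-gnu/libcuda.so.1"
    """
    dirs: List[str] = []
    for line in ldconfig_output.splitlines():
        if "libcuda.so" in line and "=>" in line:
            d = os.path.dirname(line.split("=>")[-1].strip())
            if d not in dirs:
                dirs.append(d)
    return dirs


def check_link_dirs(search_dirs: Iterable[str]) -> Dict[str, Dict[str, bool]]:
    """For each -L directory, report whether the unversioned libcuda.so
    (what `-lcuda` resolves to) and the versioned libcuda.so.1 exist."""
    report: Dict[str, Dict[str, bool]] = {}
    for d in search_dirs:
        report[d] = {
            "libcuda.so": os.path.exists(os.path.join(d, "libcuda.so")),
            "libcuda.so.1": os.path.exists(os.path.join(d, "libcuda.so.1")),
        }
    return report


if __name__ == "__main__":
    ldconfig = shutil.which("ldconfig") or "/sbin/ldconfig"
    if os.path.exists(ldconfig):
        out = subprocess.run([ldconfig, "-p"], capture_output=True, text=True).stdout
        for d, found in check_link_dirs(libcuda_dirs_from_ldconfig(out)).items():
            print(d, found)
```

If `/lib/x86_64-linux-gnu` reports `libcuda.so: False` and `libcuda.so.1: True`, that would be consistent with the `cannot find -lcuda` message above.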
@blacklig Do you have the CUDA toolkit installed? Also keep in mind that training support is only experimental and won't bring any significant improvement so far. Currently, stable-fast's main advantage is in inference.
Hi @chengzeyi
I think I do, at least here:
/usr/local ➜ local ls
bin  cuda  cuda-12  cuda-12.1  etc  games  include  lib  man  sbin  share  src
though there seem to be several versions there, hmm.
Anyway, PyTorch does recognize CUDA.
And how do I actually run inference? Via main.py? I can't run that either:
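On the toolkit question: `cuda`, `cuda-12` and `cuda-12.1` in `/usr/local` are often symlinks to a single install rather than separate versions. PyTorch recognizing CUDA does not by itself show that the system setup is complete, because the pip torch wheel brings its own CUDA runtime as the `nvidia-*-cu12` packages visible in the pip list. The `libcuda.so` that the failing link step needs ships with the NVIDIA driver, not the toolkit. A small sketch for inspecting what is installed, assuming the default `/usr/local` layout; the helper names are hypothetical.

```python
import glob
import os
import shutil
from typing import Dict, List


def find_cuda_toolkits(root: str = "/usr/local") -> List[str]:
    """List directories such as /usr/local/cuda-12.1 that contain bin/nvcc."""
    found = []
    for path in sorted(glob.glob(os.path.join(root, "cuda*"))):
        if os.path.isfile(os.path.join(path, "bin", "nvcc")):
            found.append(path)
    return found


def toolkit_report(root: str = "/usr/local") -> Dict[str, object]:
    """Summarize toolkit dirs, where the generic 'cuda' symlink points,
    and which nvcc (if any) is on PATH."""
    cuda_link = os.path.join(root, "cuda")
    return {
        "toolkits": find_cuda_toolkits(root),
        "cuda_symlink": os.path.realpath(cuda_link) if os.path.islink(cuda_link) else None,
        "nvcc_on_path": shutil.which("nvcc"),
    }


if __name__ == "__main__":
    for key, value in toolkit_report().items():
        print(f"{key}: {value}")
```

If several entries in `toolkits` resolve to the same real path, they are aliases of one install rather than multiple versions.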
stable-fast git:(main) pip3 install -r requirements.txt
Defaulting to user installation because normal site-packages is not writeable
Requirement already satisfied: torch in /home/sd/.local/lib/python3.10/site-packages (from -r requirements.txt (line 3)) (2.1.0)
Requirement already satisfied: typing-extensions in /home/sd/.local/lib/python3.10/site-packages (from torch->-r requirements.txt (line 3)) (4.8.0)
Requirement already satisfied: filelock in /home/sd/.local/lib/python3.10/site-packages (from torch->-r requirements.txt (line 3)) (3.13.1)
Requirement already satisfied: networkx in /home/sd/.local/lib/python3.10/site-packages (from torch->-r requirements.txt (line 3)) (3.2.1)
Requirement already satisfied: nvidia-cublas-cu12==12.1.3.1 in /home/sd/.local/lib/python3.10/site-packages (from torch->-r requirements.txt (line 3)) (12.1.3.1)
Requirement already satisfied: jinja2 in /home/sd/.local/lib/python3.10/site-packages (from torch->-r requirements.txt (line 3)) (3.1.2)
Requirement already satisfied: nvidia-cuda-runtime-cu12==12.1.105 in /home/sd/.local/lib/python3.10/site-packages (from torch->-r requirements.txt (line 3)) (12.1.105)
Requirement already satisfied: nvidia-cuda-nvrtc-cu12==12.1.105 in /home/sd/.local/lib/python3.10/site-packages (from torch->-r requirements.txt (line 3)) (12.1.105)
Requirement already satisfied: sympy in /home/sd/.local/lib/python3.10/site-packages (from torch->-r requirements.txt (line 3)) (1.12)
Requirement already satisfied: triton==2.1.0 in /home/sd/.local/lib/python3.10/site-packages (from torch->-r requirements.txt (line 3)) (2.1.0)
Requirement already satisfied: fsspec in /home/sd/.local/lib/python3.10/site-packages (from torch->-r requirements.txt (line 3)) (2023.12.1)
Requirement already satisfied: nvidia-cusparse-cu12==12.1.0.106 in /home/sd/.local/lib/python3.10/site-packages (from torch->-r requirements.txt (line 3)) (12.1.0.106)
Requirement already satisfied: nvidia-cufft-cu12==11.0.2.54 in /home/sd/.local/lib/python3.10/site-packages (from torch->-r requirements.txt (line 3)) (11.0.2.54)
Requirement already satisfied: nvidia-nccl-cu12==2.18.1 in /home/sd/.local/lib/python3.10/site-packages (from torch->-r requirements.txt (line 3)) (2.18.1)
Requirement already satisfied: nvidia-nvtx-cu12==12.1.105 in /home/sd/.local/lib/python3.10/site-packages (from torch->-r requirements.txt (line 3)) (12.1.105)
Requirement already satisfied: nvidia-cudnn-cu12==8.9.2.26 in /home/sd/.local/lib/python3.10/site-packages (from torch->-r requirements.txt (line 3)) (8.9.2.26)
Requirement already satisfied: nvidia-cusolver-cu12==11.4.5.107 in /home/sd/.local/lib/python3.10/site-packages (from torch->-r requirements.txt (line 3)) (11.4.5.107)
Requirement already satisfied: nvidia-cuda-cupti-cu12==12.1.105 in /home/sd/.local/lib/python3.10/site-packages (from torch->-r requirements.txt (line 3)) (12.1.105)
Requirement already satisfied: nvidia-curand-cu12==10.3.2.106 in /home/sd/.local/lib/python3.10/site-packages (from torch->-r requirements.txt (line 3)) (10.3.2.106)
Requirement already satisfied: nvidia-nvjitlink-cu12 in /home/sd/.local/lib/python3.10/site-packages (from nvidia-cusolver-cu12==11.4.5.107->torch->-r requirements.txt (line 3)) (12.3.101)
Requirement already satisfied: MarkupSafe>=2.0 in /usr/lib/python3/dist-packages (from jinja2->torch->-r requirements.txt (line 3)) (2.0.1)
Requirement already satisfied: mpmath>=0.19 in /home/sd/.local/lib/python3.10/site-packages (from sympy->torch->-r requirements.txt (line 3)) (1.3.0)
➜ stable-fast git:(main) python3 main.py
Traceback (most recent call last):
File "/home/sd/Playground/stable-fast/main.py", line 107, in
@blacklig See the examples, or this Colab notebook: https://colab.research.google.com/github/camenduru/stable-fast-colab/blob/main/stable_fast_colab.ipynb
I get the same error on Windows. It runs, but extremely slowly: inference takes 40 s per iteration on my GTX 1050 Ti, versus 1.5 s without stable-fast.
stable-fast Nightly Release 20231217, Python 3.11, torch 2.1.1, CUDA 12.1
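40 s versus 1.5 s per iteration is roughly a 27x slowdown. When comparing numbers like these, note that stable-fast traces and compiles on the first call (the TracerWarnings and the Triton build in the first log happen then), so early iterations are not representative. A generic timing sketch that excludes warm-up calls; `time_per_call` is a hypothetical helper, and for GPU work you would pass `torch.cuda.synchronize` as `sync` so queued kernels finish before the clock is read.

```python
import time
from typing import Callable


def time_per_call(fn: Callable[[], object], warmup: int = 2, iters: int = 5,
                  sync: Callable[[], None] = lambda: None) -> float:
    """Average wall-clock seconds per call of fn, after `warmup` untimed calls."""
    for _ in range(warmup):
        fn()  # first calls may include tracing/compilation; not timed
    sync()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    sync()  # make sure all queued work is done before stopping the clock
    return (time.perf_counter() - start) / iters
```

For example, `time_per_call(lambda: pipe(prompt), sync=torch.cuda.synchronize)`, where `pipe` and `prompt` stand for your own pipeline and prompt; divide the result by the number of inference steps to get a per-iteration figure.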
I am sorry, but your card is too old and its VRAM is too limited for Stable Diffusion. Please try a newer GPU, at least an RTX 20xx-class card, for example a 2080 Ti or a T4.
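The suggested minimum above (an RTX 20xx card or a T4) corresponds to CUDA compute capability 7.5; the GTX 1050 Ti is 6.1 and the RTX 3090 is 8.6. A sketch that expresses this as a check. The 7.5 threshold is a translation of the advice in this reply, not a documented stable-fast requirement, and the helper names are hypothetical.

```python
from typing import Tuple

# Assumed threshold: RTX 20xx and T4 are both compute capability 7.5 (Turing).
MIN_CAPABILITY: Tuple[int, int] = (7, 5)


def meets_minimum(capability: Tuple[int, int],
                  minimum: Tuple[int, int] = MIN_CAPABILITY) -> bool:
    """Lexicographic tuple comparison: (8, 6) passes, (6, 1) does not."""
    return tuple(capability) >= tuple(minimum)


def describe_current_gpu(device: int = 0) -> str:
    """Query the local GPU via PyTorch (imported lazily so meets_minimum
    stays usable without torch installed)."""
    import torch

    props = torch.cuda.get_device_properties(device)
    cc = torch.cuda.get_device_capability(device)
    vram_gib = props.total_memory / 1024**3
    verdict = "ok" if meets_minimum(cc) else "below the suggested minimum"
    return f"{props.name}: compute capability {cc[0]}.{cc[1]}, {vram_gib:.1f} GiB VRAM ({verdict})"
```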