stable-fast: Can't get it running... can anyone help, please? RTX 3090

Hi, I am trying to run python3 optimize_stable_diffusion_pipeline.py and I get this nasty error. I can't really tell what exactly is wrong, as it refers to pretty much everything included here.

I am pretty sure I have the correct version of stable-fast, the one corresponding to my Python 3.10, CUDA 12.1 and torch 2.1; gcc is 11.4. My system is an RTX 3090, i7, etc., with a fairly clean install of Ubuntu. The NVIDIA driver is 530.

This is my pip3 list:

pip3 list
Package Version
accelerate 0.25.0
antlr4-python3-runtime 4.9.3
apturl 0.5.2
bcrypt 3.2.0
blinker 1.4
Brlapi 0.8.3
certifi 2020.6.20
chardet 4.0.0
click 8.0.3
colorama 0.4.4
command-not-found 0.3
cryptography 3.4.8
cupshelpers 1.0
dbus-python 1.2.18
defer 1.0.6
diffusers 0.24.0
distro 1.7.0
distro-info 1.1+ubuntu0.1
duplicity 0.8.21
fasteners 0.14.1
filelock 3.13.1
fsspec 2023.12.1
future 0.18.2
httplib2 0.20.2
huggingface-hub 0.19.4
idna 3.3
importlib-metadata 4.6.4
jeepney 0.7.1
Jinja2 3.1.2
keyring 23.5.0
language-selector 0.1
launchpadlib 1.10.16
lazr.restfulclient 0.14.4
lazr.uri 1.0.6
lockfile 0.12.2
louis 3.20.0
macaroonbakery 1.3.1
Mako 1.1.3
MarkupSafe 2.0.1
monotonic 1.6
more-itertools 8.10.0
mpmath 1.3.0
netifaces 0.11.0
networkx 3.2.1
numpy 1.26.2
nvidia-cublas-cu12 12.1.3.1
nvidia-cuda-cupti-cu12 12.1.105
nvidia-cuda-nvrtc-cu12 12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12 8.9.2.26
nvidia-cufft-cu12 11.0.2.54
nvidia-curand-cu12 10.3.2.106
nvidia-cusolver-cu12 11.4.5.107
nvidia-cusparse-cu12 12.1.0.106
nvidia-nccl-cu12 2.18.1
nvidia-nvjitlink-cu12 12.3.101
nvidia-nvtx-cu12 12.1.105
oauthlib 3.2.0
olefile 0.46
omegaconf 2.3.0
packaging 23.2
paramiko 2.9.3
pexpect 4.8.0
Pillow 9.0.1
pip 22.0.2
protobuf 3.12.4
psutil 5.9.6
ptyprocess 0.7.0
pycairo 1.20.1
pycups 2.0.1
PyGObject 3.42.1
PyJWT 2.3.0
pymacaroons 0.13.0
PyNaCl 1.5.0
pyparsing 2.4.7
PyQt5 5.15.10
PyQt5-Qt5 5.15.2
PyQt5-sip 12.13.0
pyRFC3339 1.1
python-apt 2.4.0+ubuntu2
python-dateutil 2.8.1
python-debian 0.1.43+ubuntu1.1
pytz 2022.1
pyxdg 0.27
PyYAML 5.4.1
regex 2023.10.3
reportlab 3.6.8
requests 2.25.1
safetensors 0.4.1
screen-resolution-extra 0.0.0
SecretStorage 3.3.1
setuptools 59.6.0
six 1.16.0
ssh-import-id 5.11
stable-fast 0.0.13.post3+torch210cu121
sympy 1.12
systemd-python 234
tokenizers 0.15.0
torch 2.1.0
torchvision 0.16.0
tqdm 4.66.1
transformers 4.35.2
triton 2.1.0
typing_extensions 4.8.0
ubuntu-advantage-tools 8001
ubuntu-drivers-common 0.0.0
ufw 0.36.1
unattended-upgrades 0.1
urllib3 1.26.5
usb-creator 0.3.7
wadllib 1.3.6
wheel 0.37.1
xdg 5
xformers 0.0.22.post7
xkit 0.0.0
zipp 1.0.0

And the error is below:

Loading pipeline components...: 100%|██████████| 7/7 [00:01<00:00, 6.96it/s]
/home/sd/.local/lib/python3.10/site-packages/torch/cuda/graphs.py:88: UserWarning: The CUDA Graph is empty. This usually means that the graph was attempted to be captured on wrong device or stream. (Triggered internally at ../aten/src/ATen/cuda/CUDAGraph.cpp:192.)
  super().capture_end()
/home/sd/.local/lib/python3.10/site-packages/sfast/utils/flat_tensors.py:159: TracerWarning: Converting a tensor to a Python number might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  obj_type = tensors[start].item()
/home/sd/.local/lib/python3.10/site-packages/sfast/utils/flat_tensors.py:218: TracerWarning: Converting a tensor to a Python number might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  size = tensors[start].item()
/home/sd/.local/lib/python3.10/site-packages/sfast/utils/flat_tensors.py:228: TracerWarning: Converting a tensor to a Python number might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  size = tensors[start].item()
/home/sd/.local/lib/python3.10/site-packages/sfast/utils/flat_tensors.py:214: TracerWarning: Converting a tensor to a Python list might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  return bytes(tensors[start].tolist()), start + 1
/home/sd/.local/lib/python3.10/site-packages/transformers/modeling_attn_mask_utils.py:66: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if input_shape[-1] > 1 or self.sliding_window is not None:
/home/sd/.local/lib/python3.10/site-packages/transformers/modeling_attn_mask_utils.py:137: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if past_key_values_length > 0:
/home/sd/.local/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py:273: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attn_weights.size() != (bsz * self.num_heads, tgt_len, src_len):
/home/sd/.local/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py:281: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if causal_attention_mask.size() != (bsz, 1, tgt_len, src_len):
/home/sd/.local/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py:313: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attn_output.size() != (bsz * self.num_heads, tgt_len, self.head_dim):
/home/sd/.local/lib/python3.10/site-packages/sfast/utils/flat_tensors.py:23: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  return torch.tensor([num], dtype=torch.int64)
/home/sd/.local/lib/python3.10/site-packages/sfast/utils/flat_tensors.py:253: TracerWarning: torch.Tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  return super().__new__(cls, x, *args, **kwargs)
/home/sd/.local/lib/python3.10/site-packages/sfast/utils/flat_tensors.py:123: TracerWarning: torch.as_tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  return (torch.as_tensor(tuple(obj), dtype=torch.uint8), )
0%| | 0/30 [00:00<?, ?it/s]
/home/sd/.local/lib/python3.10/site-packages/sfast/utils/flat_tensors.py:197: TracerWarning: Converting a tensor to a Python number might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  return bool(tensors[start].item()), start + 1
/home/sd/.local/lib/python3.10/site-packages/diffusers/models/unet_2d_condition.py:878: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if dim % default_overall_up_factor != 0:
/home/sd/.local/lib/python3.10/site-packages/diffusers/models/resnet.py:265: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert hidden_states.shape[1] == self.channels
/home/sd/.local/lib/python3.10/site-packages/diffusers/models/resnet.py:271: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert hidden_states.shape[1] == self.channels
/home/sd/.local/lib/python3.10/site-packages/diffusers/models/resnet.py:173: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert hidden_states.shape[1] == self.channels
/home/sd/.local/lib/python3.10/site-packages/diffusers/models/resnet.py:186: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if hidden_states.shape[0] >= 64:
/usr/bin/ld: skipping incompatible /lib/i386-linux-gnu/libcuda.so when searching for -lcuda
/usr/bin/ld: skipping incompatible /lib/i386-linux-gnu/libcuda.so when searching for -lcuda
/usr/bin/ld: cannot find -lcuda: No such file or directory
/usr/bin/ld: skipping incompatible /lib/i386-linux-gnu/libcuda.so when searching for -lcuda
/usr/bin/ld: skipping incompatible /lib/i386-linux-gnu/libcuda.so when searching for -lcuda
collect2: error: ld returned 1 exit status
0%| | 0/30 [00:03<?, ?it/s]
Traceback (most recent call last):
  File "/home/sd/Playground/stable-fast/examples/optimize_stable_diffusion_pipeline.py", line 150, in <module>
    main()
  File "/home/sd/Playground/stable-fast/examples/optimize_stable_diffusion_pipeline.py", line 132, in main
    model(**get_kwarg_inputs())
  File "/home/sd/.local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/sd/.local/lib/python3.10/site-packages/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py", line 918, in __call__
    noise_pred = self.unet(
  File "/home/sd/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/sd/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/sd/.local/lib/python3.10/site-packages/sfast/cuda/graphs.py", line 29, in dynamic_graphed_callable
    cached_callable = simple_make_graphed_callable(
  File "/home/sd/.local/lib/python3.10/site-packages/sfast/cuda/graphs.py", line 46, in simple_make_graphed_callable
    return make_graphed_callable(callable,
  File "/home/sd/.local/lib/python3.10/site-packages/sfast/cuda/graphs.py", line 75, in make_graphed_callable
    callable(*tree_copy(example_inputs),
  File "/home/sd/.local/lib/python3.10/site-packages/sfast/jit/trace_helper.py", line 62, in wrapper
    return traced_module(*args, **kwargs)
  File "/home/sd/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/sd/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/sd/.local/lib/python3.10/site-packages/sfast/jit/trace_helper.py", line 119, in forward
    outputs = self.module(*self.convert_inputs(args, kwargs))
  File "/home/sd/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/sd/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):

graph(%input, %num_groups, %weight, %bias, %eps, %cudnn_enabled):
  %y : Tensor = sfast_triton::group_norm_silu(%input, %num_groups, %weight, %bias, %eps)
                ~~~~~~~~~~~~ <--- HERE
  return (%y)
RuntimeError: CalledProcessError: Command '['/usr/bin/gcc', '/tmp/tmpy9k09g46/main.c', '-O3', '-I/home/sd/.local/lib/python3.10/site-packages/triton/common/../third_party/cuda/include', '-I/usr/include/python3.10', '-I/tmp/tmpy9k09g46', '-shared', '-fPIC', '-lcuda', '-o', '/tmp/tmpy9k09g46/group_norm_4d_channels_last_forward_collect_stats_kernel.cpython-310-x86_64-linux-gnu.so', '-L/lib/x86_64-linux-gnu', '-L/lib/i386-linux-gnu', '-L/lib/i386-linux-gnu']' returned non-zero exit status 1.

At:
  /usr/lib/python3.10/subprocess.py(369): check_call
  /home/sd/.local/lib/python3.10/site-packages/triton/common/build.py(90): _build
  /home/sd/.local/lib/python3.10/site-packages/triton/compiler/make_launcher.py(39): make_stub
  /home/sd/.local/lib/python3.10/site-packages/triton/compiler/compiler.py(425): compile
  <string>(63): group_norm_4d_channels_last_forward_collect_stats_kernel
  /home/sd/.local/lib/python3.10/site-packages/sfast/triton/__init__.py(35): new_func
  /home/sd/.local/lib/python3.10/site-packages/triton/runtime/autotuner.py(232): run
  /home/sd/.local/lib/python3.10/site-packages/triton/runtime/autotuner.py(232): run
  /home/sd/.local/lib/python3.10/site-packages/sfast/triton/ops/group_norm.py(425): group_norm_forward
  /home/sd/.local/lib/python3.10/site-packages/sfast/triton/torch_ops.py(188): forward
  /home/sd/.local/lib/python3.10/site-packages/torch/autograd/function.py(539): apply
  /home/sd/.local/lib/python3.10/site-packages/sfast/triton/torch_ops.py(226): group_norm_silu
  /home/sd/.local/lib/python3.10/site-packages/torch/nn/modules/module.py(1527): _call_impl
  /home/sd/.local/lib/python3.10/site-packages/torch/nn/modules/module.py(1518): _wrapped_call_impl
  /home/sd/.local/lib/python3.10/site-packages/sfast/jit/trace_helper.py(119): forward
  /home/sd/.local/lib/python3.10/site-packages/torch/nn/modules/module.py(1527): _call_impl
  /home/sd/.local/lib/python3.10/site-packages/torch/nn/modules/module.py(1518): _wrapped_call_impl
  /home/sd/.local/lib/python3.10/site-packages/sfast/jit/trace_helper.py(62): wrapper
  /home/sd/.local/lib/python3.10/site-packages/sfast/cuda/graphs.py(75): make_graphed_callable
  /home/sd/.local/lib/python3.10/site-packages/sfast/cuda/graphs.py(46): simple_make_graphed_callable
  /home/sd/.local/lib/python3.10/site-packages/sfast/cuda/graphs.py(29): dynamic_graphed_callable
  /home/sd/.local/lib/python3.10/site-packages/torch/nn/modules/module.py(1527): _call_impl
  /home/sd/.local/lib/python3.10/site-packages/torch/nn/modules/module.py(1518): _wrapped_call_impl
  /home/sd/.local/lib/python3.10/site-packages/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py(918): __call__
  /home/sd/.local/lib/python3.10/site-packages/torch/utils/_contextlib.py(115): decorate_context
  /home/sd/Playground/stable-fast/examples/optimize_stable_diffusion_pipeline.py(132): main
  /home/sd/Playground/stable-fast/examples/optimize_stable_diffusion_pipeline.py(150): <module>

Can anyone at least guide me on what could be wrong? Thanks

Dec 08 '23 12:12 blacklig

@blacklig Do you have the CUDA toolkit installed? Keep in mind, though, that training is only experimental and won't bring any significant improvement so far. Currently, stable-fast's main advantage is in inference.
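
For what it's worth, the `ld` lines in the log point at the driver library rather than the toolkit: the gcc command searches `-L/lib/x86_64-linux-gnu` and `-L/lib/i386-linux-gnu`, and the linker only reports finding a 32-bit `libcuda.so`. One plausible reading (an assumption, not something confirmed in this thread) is that the 64-bit directory has `libcuda.so.1` but no unversioned `libcuda.so` for `-lcuda` to resolve. A minimal sketch for checking that from `ldconfig -p` output follows; the function names and the sample text are hypothetical.

```python
import os
import subprocess
from collections import defaultdict


def libcuda_by_dir(ldconfig_text):
    """Group libcuda entries from `ldconfig -p` output by directory.

    Returns {directory: {"arch64": bool, "names": set of file names}}.
    """
    found = defaultdict(lambda: {"arch64": False, "names": set()})
    for line in ldconfig_text.splitlines():
        # Typical line:
        #   libcuda.so.1 (libc6,x86-64) => /lib/x86_64-linux-gnu/libcuda.so.1
        if "libcuda.so" not in line or "=>" not in line:
            continue
        left, path = (part.strip() for part in line.split("=>", 1))
        entry = found[os.path.dirname(path)]
        entry["names"].add(os.path.basename(path))
        entry["arch64"] = entry["arch64"] or "x86-64" in left
    return dict(found)


def dirs_usable_for_lcuda(ldconfig_text):
    """Directories that are 64-bit AND contain the unversioned libcuda.so,
    which is the file name a 64-bit link with -lcuda looks for."""
    return sorted(
        d for d, e in libcuda_by_dir(ldconfig_text).items()
        if e["arch64"] and "libcuda.so" in e["names"]
    )


def read_ldconfig():
    """Return the output of `ldconfig -p` (Linux only)."""
    return subprocess.check_output(["ldconfig", "-p"], text=True)


# Hypothetical sample resembling the situation suggested by the log.
SAMPLE = """\
\tlibcuda.so.1 (libc6,x86-64) => /lib/x86_64-linux-gnu/libcuda.so.1
\tlibcuda.so.1 (libc6) => /lib/i386-linux-gnu/libcuda.so.1
\tlibcuda.so (libc6) => /lib/i386-linux-gnu/libcuda.so
"""

# An empty list means there is no 64-bit libcuda.so for -lcuda to pick up.
print(dirs_usable_for_lcuda(SAMPLE))
```

On the affected machine, `dirs_usable_for_lcuda(read_ldconfig())` returning an empty list would support that reading. What to do next depends on how the driver was installed, so treat this only as a diagnostic.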

Dec 08 '23 13:12 chengzeyi

Hi @chengzeyi

I think I do, at least here:

/usr/local
➜ local ls
bin cuda cuda-12 cuda-12.1 etc games include lib man sbin share src

But it looks like there are several versions there... hmm.

Anyway, PyTorch does recognize CUDA.
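
The `/usr/local` listing shows three `cuda*` entries, but on many installs two of them are just symlinks to the third. A small sketch, using a hypothetical helper name, that resolves each entry and checks whether it contains `bin/nvcc` (the toolkit compiler) might look like this:

```python
import os


def list_cuda_installs(root="/usr/local"):
    """Return (entry, resolved_path, has_nvcc) for each cuda* entry under root."""
    results = []
    if not os.path.isdir(root):
        return results
    for name in sorted(os.listdir(root)):
        if not name.startswith("cuda"):
            continue
        path = os.path.join(root, name)
        # realpath follows symlinks such as cuda -> cuda-12.1
        real = os.path.realpath(path)
        has_nvcc = os.path.isfile(os.path.join(real, "bin", "nvcc"))
        results.append((name, real, has_nvcc))
    return results


for name, real, has_nvcc in list_cuda_installs():
    print(f"{name:12s} -> {real}  nvcc: {'yes' if has_nvcc else 'no'}")
```

If all entries resolve to the same directory and `nvcc` is present, the toolkit itself is probably fine. PyTorch reporting CUDA as available says little either way, since the pip wheels bundle their own CUDA runtime libraries (the `nvidia-*-cu12` packages in the pip list).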

And how do I actually run inference? Via main.py? I can't run that either:

stable-fast git:(main) pip3 install -r requirements.txt
Defaulting to user installation because normal site-packages is not writeable
Requirement already satisfied: torch in /home/sd/.local/lib/python3.10/site-packages (from -r requirements.txt (line 3)) (2.1.0)
Requirement already satisfied: typing-extensions in /home/sd/.local/lib/python3.10/site-packages (from torch->-r requirements.txt (line 3)) (4.8.0)
Requirement already satisfied: filelock in /home/sd/.local/lib/python3.10/site-packages (from torch->-r requirements.txt (line 3)) (3.13.1)
Requirement already satisfied: networkx in /home/sd/.local/lib/python3.10/site-packages (from torch->-r requirements.txt (line 3)) (3.2.1)
Requirement already satisfied: nvidia-cublas-cu12==12.1.3.1 in /home/sd/.local/lib/python3.10/site-packages (from torch->-r requirements.txt (line 3)) (12.1.3.1)
Requirement already satisfied: jinja2 in /home/sd/.local/lib/python3.10/site-packages (from torch->-r requirements.txt (line 3)) (3.1.2)
Requirement already satisfied: nvidia-cuda-runtime-cu12==12.1.105 in /home/sd/.local/lib/python3.10/site-packages (from torch->-r requirements.txt (line 3)) (12.1.105)
Requirement already satisfied: nvidia-cuda-nvrtc-cu12==12.1.105 in /home/sd/.local/lib/python3.10/site-packages (from torch->-r requirements.txt (line 3)) (12.1.105)
Requirement already satisfied: sympy in /home/sd/.local/lib/python3.10/site-packages (from torch->-r requirements.txt (line 3)) (1.12)
Requirement already satisfied: triton==2.1.0 in /home/sd/.local/lib/python3.10/site-packages (from torch->-r requirements.txt (line 3)) (2.1.0)
Requirement already satisfied: fsspec in /home/sd/.local/lib/python3.10/site-packages (from torch->-r requirements.txt (line 3)) (2023.12.1)
Requirement already satisfied: nvidia-cusparse-cu12==12.1.0.106 in /home/sd/.local/lib/python3.10/site-packages (from torch->-r requirements.txt (line 3)) (12.1.0.106)
Requirement already satisfied: nvidia-cufft-cu12==11.0.2.54 in /home/sd/.local/lib/python3.10/site-packages (from torch->-r requirements.txt (line 3)) (11.0.2.54)
Requirement already satisfied: nvidia-nccl-cu12==2.18.1 in /home/sd/.local/lib/python3.10/site-packages (from torch->-r requirements.txt (line 3)) (2.18.1)
Requirement already satisfied: nvidia-nvtx-cu12==12.1.105 in /home/sd/.local/lib/python3.10/site-packages (from torch->-r requirements.txt (line 3)) (12.1.105)
Requirement already satisfied: nvidia-cudnn-cu12==8.9.2.26 in /home/sd/.local/lib/python3.10/site-packages (from torch->-r requirements.txt (line 3)) (8.9.2.26)
Requirement already satisfied: nvidia-cusolver-cu12==11.4.5.107 in /home/sd/.local/lib/python3.10/site-packages (from torch->-r requirements.txt (line 3)) (11.4.5.107)
Requirement already satisfied: nvidia-cuda-cupti-cu12==12.1.105 in /home/sd/.local/lib/python3.10/site-packages (from torch->-r requirements.txt (line 3)) (12.1.105)
Requirement already satisfied: nvidia-curand-cu12==10.3.2.106 in /home/sd/.local/lib/python3.10/site-packages (from torch->-r requirements.txt (line 3)) (10.3.2.106)
Requirement already satisfied: nvidia-nvjitlink-cu12 in /home/sd/.local/lib/python3.10/site-packages (from nvidia-cusolver-cu12==11.4.5.107->torch->-r requirements.txt (line 3)) (12.3.101)
Requirement already satisfied: MarkupSafe>=2.0 in /usr/lib/python3/dist-packages (from jinja2->torch->-r requirements.txt (line 3)) (2.0.1)
Requirement already satisfied: mpmath>=0.19 in /home/sd/.local/lib/python3.10/site-packages (from sympy->torch->-r requirements.txt (line 3)) (1.3.0)
➜ stable-fast git:(main) python3 main.py
Traceback (most recent call last):
  File "/home/sd/Playground/stable-fast/main.py", line 107, in <module>
    main()
  File "/home/sd/Playground/stable-fast/main.py", line 62, in main
    model = load_model()
  File "/home/sd/Playground/stable-fast/main.py", line 13, in load_model
    model = DiffusionPipeline.from_pretrained(base_model_path,
  File "/home/sd/.local/lib/python3.10/site-packages/diffusers/pipelines/pipeline_utils.py", line 1090, in from_pretrained
    cached_folder = cls.download(
  File "/home/sd/.local/lib/python3.10/site-packages/diffusers/pipelines/pipeline_utils.py", line 1649, in download
    info = model_info(
  File "/home/sd/.local/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 110, in _inner_fn
    validate_repo_id(arg_value)
  File "/home/sd/.local/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 164, in validate_repo_id
    raise HFValidationError(
huggingface_hub.utils._validators.HFValidationError: Repo id must use alphanumeric chars or '-', '_', '.', '--' and '..' are forbidden, '-' and '.' cannot start or end the name, max length is 96: '../stable-diffusion-v1-5'.
➜ stable-fast git:(main)

Dec 08 '23 13:12 blacklig

@blacklig see examples or this colab: https://colab.research.google.com/github/camenduru/stable-fast-colab/blob/main/stable_fast_colab.ipynb
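
On the `HFValidationError` from `main.py`: `'../stable-diffusion-v1-5'` looks like a relative path to a local copy of the model. When no such directory exists, it presumably gets treated as a Hub repo id, which then fails validation. A minimal sketch of guarding against that, assuming a hypothetical helper `resolve_model_ref` and a fallback repo id that you choose yourself (the one below is a placeholder, not a real model):

```python
import os


def resolve_model_ref(path_or_id, fallback_repo_id):
    """Return a local directory if it exists, otherwise a Hub repo id.

    path_or_id: what the script was configured with (may be a relative path).
    fallback_repo_id: Hub id to use when the path is missing (caller's choice).
    """
    expanded = os.path.abspath(os.path.expanduser(path_or_id))
    if os.path.isdir(expanded):
        return expanded
    if path_or_id.startswith((".", "/", "~")):
        # Clearly meant as a filesystem path, but nothing is there.
        return fallback_repo_id
    # Otherwise assume it is already a Hub id such as "org/name".
    return path_or_id


# "some-org/some-model" is a placeholder for illustration only.
model_ref = resolve_model_ref("../stable-diffusion-v1-5", "some-org/some-model")
print(model_ref)
```

The result can then be passed to `DiffusionPipeline.from_pretrained(...)` in place of the hard-coded path.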

Dec 08 '23 15:12 chengzeyi

I get the same error on Windows. It runs, but extremely slowly: inference takes 40 s per iteration on my 1050 Ti, versus 1.5 s without stable-fast.

stable-fast Nightly Release 20231217, Python 3.11, torch 2.1.1, CUDA 12.1

Dec 17 '23 17:12 YulienPohl

I am sorry, but your card is too old and its VRAM is too limited for Stable Diffusion. Please try a newer GPU, at least a 20xx model, for example a 2080 Ti or a T4.
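
To make "at least a 20xx model" concrete: the RTX 20 series and the T4 report CUDA compute capability 7.5, while a GTX 1050 Ti reports 6.1. Treating 7.5 as the cutoff is an interpretation of the advice here, not a documented stable-fast requirement. A sketch, with hypothetical function names, that compares the current device against it:

```python
def is_turing_or_newer(capability, minimum=(7, 5)):
    """True if a (major, minor) compute capability is at least `minimum`.

    The (7, 5) default is an assumption read off "at least a 20xx model".
    """
    return tuple(capability) >= tuple(minimum)


def describe_gpu():
    """Return a one-line summary of GPU 0, or a note if none is usable."""
    try:
        import torch
    except ImportError:
        return "torch is not installed"
    if not torch.cuda.is_available():
        return "no CUDA device visible to torch"
    cap = torch.cuda.get_device_capability(0)
    props = torch.cuda.get_device_properties(0)
    vram_gib = props.total_memory / 2**30
    verdict = "meets" if is_turing_or_newer(cap) else "is below"
    return (f"{props.name}: sm_{cap[0]}{cap[1]}, {vram_gib:.1f} GiB VRAM, "
            f"{verdict} the assumed 7.5 cutoff")


print(describe_gpu())
```

No VRAM threshold is checked because the thread does not give one; the sketch only reports the amount.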

Dec 18 '23 07:12 chengzeyi