
Stable Cascade produces blank results

TurningTide opened this issue 11 months ago · 18 comments

This is straight from the example workflow:

image

No errors in the console, just a single warning: clip missing: ['clip_g.logit_scale'].

ComfyUI latest version. Windows 11, Torch 2.1.2, CUDA 12.1, RTX 40-series GPU.

TurningTide · Feb 26, 2024

I don't see any obvious problem in your node settings at first glance. It should work. Your stage_c result is not normal.

One thing that comes to mind: which Python version are you running? I ask because I had problems using Python 3.12 with PyTorch a while ago:

Currently, PyTorch on Windows only supports Python 3.8-3.11; Python 2.x is not supported.

Please copy-paste your full ComfyUI terminal output from a fresh start, so we can see all of it.
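If it helps, here is a minimal version check you could run with the same python.exe that launches main.py (just a sketch; the prints are illustrative, adjust to your setup):

import sys
import torch

print(sys.version)                        # PyTorch on Windows wants 3.8-3.11 for these builds
print(torch.__version__, torch.version.cuda)
print(torch.cuda.is_available())          # should be True on the RTX 4090
print(torch.cuda.get_device_name(0))
print(torch.cuda.is_bf16_supported())     # relevant for the bf16 Cascade weights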

Guillaume-Fgt · Feb 26, 2024

Try downloading a fresh standalone package and try it with that.

comfyanonymous · Feb 26, 2024

One thing that comes to mind: which Python version are you running?

3.11.8. SD1.5 and SDXL work fine.

Please copy-paste your full ComfyUI terminal output from a fresh start, so we can see all of it.

E:\_AI\ComfyUI>python.exe -s main.py
Total VRAM 24563 MB, total RAM 49084 MB
Set vram state to: NORMAL_VRAM
Device: cuda:0 NVIDIA GeForce RTX 4090 : cudaMallocAsync
VAE dtype: torch.bfloat16
Using pytorch cross attention
Starting server

To see the GUI go to: http://127.0.0.1:8188
got prompt
model_type STABLE_CASCADE
adm 0
Missing VAE keys ['encoder.mean', 'encoder.std']
clip missing: ['clip_g.logit_scale']
left over keys: dict_keys(['clip_l_vision.vision_model.embeddings.class_embedding', 'clip_l_vision.vision_model.embeddings.patch_embedding.weight', 'clip_l_vision.vision_model.embeddings.position_embedding.weight', 'clip_l_vision.vision_model.embeddings.position_ids', 'clip_l_vision.vision_model.encoder.layers.0.layer_norm1.bias', 'clip_l_vision.vision_model.encoder.layers.0.layer_norm1.weight', 'clip_l_vision.vision_model.encoder.layers.0.layer_norm2.bias', 'clip_l_vision.vision_model.encoder.layers.0.layer_norm2.weight', 'clip_l_vision.vision_model.encoder.layers.0.mlp.fc1.bias', 'clip_l_vision.vision_model.encoder.layers.0.mlp.fc1.weight', 'clip_l_vision.vision_model.encoder.layers.0.mlp.fc2.bias', 'clip_l_vision.vision_model.encoder.layers.0.mlp.fc2.weight', 'clip_l_vision.vision_model.encoder.layers.0.self_attn.k_proj.bias', 'clip_l_vision.vision_model.encoder.layers.0.self_attn.k_proj.weight', 'clip_l_vision.vision_model.encoder.layers.0.self_attn.out_proj.bias', 'clip_l_vision.vision_model.encoder.layers.0.self_attn.out_proj.weight', 'clip_l_vision.vision_model.encoder.layers.0.self_attn.q_proj.bias', 'clip_l_vision.vision_model.encoder.layers.0.self_attn.q_proj.weight', 'clip_l_vision.vision_model.encoder.layers.0.self_attn.v_proj.bias', 'clip_l_vision.vision_model.encoder.layers.0.self_attn.v_proj.weight', 'clip_l_vision.vision_model.encoder.layers.1.layer_norm1.bias', 'clip_l_vision.vision_model.encoder.layers.1.layer_norm1.weight', 'clip_l_vision.vision_model.encoder.layers.1.layer_norm2.bias', 'clip_l_vision.vision_model.encoder.layers.1.layer_norm2.weight', 'clip_l_vision.vision_model.encoder.layers.1.mlp.fc1.bias', 'clip_l_vision.vision_model.encoder.layers.1.mlp.fc1.weight', 'clip_l_vision.vision_model.encoder.layers.1.mlp.fc2.bias', 'clip_l_vision.vision_model.encoder.layers.1.mlp.fc2.weight', 'clip_l_vision.vision_model.encoder.layers.1.self_attn.k_proj.bias', 'clip_l_vision.vision_model.encoder.layers.1.self_attn.k_proj.weight', 'clip_l_vision.vision_model.encoder.layers.1.self_attn.out_proj.bias', 'clip_l_vision.vision_model.encoder.layers.1.self_attn.out_proj.weight', 'clip_l_vision.vision_model.encoder.layers.1.self_attn.q_proj.bias', 'clip_l_vision.vision_model.encoder.layers.1.self_attn.q_proj.weight', 'clip_l_vision.vision_model.encoder.layers.1.self_attn.v_proj.bias', 'clip_l_vision.vision_model.encoder.layers.1.self_attn.v_proj.weight', 'clip_l_vision.vision_model.encoder.layers.10.layer_norm1.bias', 'clip_l_vision.vision_model.encoder.layers.10.layer_norm1.weight', 'clip_l_vision.vision_model.encoder.layers.10.layer_norm2.bias', 'clip_l_vision.vision_model.encoder.layers.10.layer_norm2.weight', 'clip_l_vision.vision_model.encoder.layers.10.mlp.fc1.bias', 'clip_l_vision.vision_model.encoder.layers.10.mlp.fc1.weight', 'clip_l_vision.vision_model.encoder.layers.10.mlp.fc2.bias', 'clip_l_vision.vision_model.encoder.layers.10.mlp.fc2.weight', 'clip_l_vision.vision_model.encoder.layers.10.self_attn.k_proj.bias', 'clip_l_vision.vision_model.encoder.layers.10.self_attn.k_proj.weight', 'clip_l_vision.vision_model.encoder.layers.10.self_attn.out_proj.bias', 'clip_l_vision.vision_model.encoder.layers.10.self_attn.out_proj.weight', 'clip_l_vision.vision_model.encoder.layers.10.self_attn.q_proj.bias', 'clip_l_vision.vision_model.encoder.layers.10.self_attn.q_proj.weight', 'clip_l_vision.vision_model.encoder.layers.10.self_attn.v_proj.bias', 'clip_l_vision.vision_model.encoder.layers.10.self_attn.v_proj.weight', 'clip_l_vision.vision_model.encoder.layers.11.layer_norm1.bias', 
'clip_l_vision.vision_model.encoder.layers.11.layer_norm1.weight', 'clip_l_vision.vision_model.encoder.layers.11.layer_norm2.bias', 'clip_l_vision.vision_model.encoder.layers.11.layer_norm2.weight', 'clip_l_vision.vision_model.encoder.layers.11.mlp.fc1.bias', 'clip_l_vision.vision_model.encoder.layers.11.mlp.fc1.weight', 'clip_l_vision.vision_model.encoder.layers.11.mlp.fc2.bias', 'clip_l_vision.vision_model.encoder.layers.11.mlp.fc2.weight', 'clip_l_vision.vision_model.encoder.layers.11.self_attn.k_proj.bias', 'clip_l_vision.vision_model.encoder.layers.11.self_attn.k_proj.weight', 'clip_l_vision.vision_model.encoder.layers.11.self_attn.out_proj.bias', 'clip_l_vision.vision_model.encoder.layers.11.self_attn.out_proj.weight', 'clip_l_vision.vision_model.encoder.layers.11.self_attn.q_proj.bias', 'clip_l_vision.vision_model.encoder.layers.11.self_attn.q_proj.weight', 'clip_l_vision.vision_model.encoder.layers.11.self_attn.v_proj.bias', 'clip_l_vision.vision_model.encoder.layers.11.self_attn.v_proj.weight', 'clip_l_vision.vision_model.encoder.layers.12.layer_norm1.bias', 'clip_l_vision.vision_model.encoder.layers.12.layer_norm1.weight', 'clip_l_vision.vision_model.encoder.layers.12.layer_norm2.bias', 'clip_l_vision.vision_model.encoder.layers.12.layer_norm2.weight', 'clip_l_vision.vision_model.encoder.layers.12.mlp.fc1.bias', 'clip_l_vision.vision_model.encoder.layers.12.mlp.fc1.weight', 'clip_l_vision.vision_model.encoder.layers.12.mlp.fc2.bias', 'clip_l_vision.vision_model.encoder.layers.12.mlp.fc2.weight', 'clip_l_vision.vision_model.encoder.layers.12.self_attn.k_proj.bias', 'clip_l_vision.vision_model.encoder.layers.12.self_attn.k_proj.weight', 'clip_l_vision.vision_model.encoder.layers.12.self_attn.out_proj.bias', 'clip_l_vision.vision_model.encoder.layers.12.self_attn.out_proj.weight', 'clip_l_vision.vision_model.encoder.layers.12.self_attn.q_proj.bias', 'clip_l_vision.vision_model.encoder.layers.12.self_attn.q_proj.weight', 'clip_l_vision.vision_model.encoder.layers.12.self_attn.v_proj.bias', 'clip_l_vision.vision_model.encoder.layers.12.self_attn.v_proj.weight', 'clip_l_vision.vision_model.encoder.layers.13.layer_norm1.bias', 'clip_l_vision.vision_model.encoder.layers.13.layer_norm1.weight', 'clip_l_vision.vision_model.encoder.layers.13.layer_norm2.bias', 'clip_l_vision.vision_model.encoder.layers.13.layer_norm2.weight', 'clip_l_vision.vision_model.encoder.layers.13.mlp.fc1.bias', 'clip_l_vision.vision_model.encoder.layers.13.mlp.fc1.weight', 'clip_l_vision.vision_model.encoder.layers.13.mlp.fc2.bias', 'clip_l_vision.vision_model.encoder.layers.13.mlp.fc2.weight', 'clip_l_vision.vision_model.encoder.layers.13.self_attn.k_proj.bias', 'clip_l_vision.vision_model.encoder.layers.13.self_attn.k_proj.weight', 'clip_l_vision.vision_model.encoder.layers.13.self_attn.out_proj.bias', 'clip_l_vision.vision_model.encoder.layers.13.self_attn.out_proj.weight', 'clip_l_vision.vision_model.encoder.layers.13.self_attn.q_proj.bias', 'clip_l_vision.vision_model.encoder.layers.13.self_attn.q_proj.weight', 'clip_l_vision.vision_model.encoder.layers.13.self_attn.v_proj.bias', 'clip_l_vision.vision_model.encoder.layers.13.self_attn.v_proj.weight', 'clip_l_vision.vision_model.encoder.layers.14.layer_norm1.bias', 'clip_l_vision.vision_model.encoder.layers.14.layer_norm1.weight', 'clip_l_vision.vision_model.encoder.layers.14.layer_norm2.bias', 'clip_l_vision.vision_model.encoder.layers.14.layer_norm2.weight', 'clip_l_vision.vision_model.encoder.layers.14.mlp.fc1.bias', 
'clip_l_vision.vision_model.encoder.layers.14.mlp.fc1.weight', 'clip_l_vision.vision_model.encoder.layers.14.mlp.fc2.bias', 'clip_l_vision.vision_model.encoder.layers.14.mlp.fc2.weight', 'clip_l_vision.vision_model.encoder.layers.14.self_attn.k_proj.bias', 'clip_l_vision.vision_model.encoder.layers.14.self_attn.k_proj.weight', 'clip_l_vision.vision_model.encoder.layers.14.self_attn.out_proj.bias', 'clip_l_vision.vision_model.encoder.layers.14.self_attn.out_proj.weight', 'clip_l_vision.vision_model.encoder.layers.14.self_attn.q_proj.bias', 'clip_l_vision.vision_model.encoder.layers.14.self_attn.q_proj.weight', 'clip_l_vision.vision_model.encoder.layers.14.self_attn.v_proj.bias', 'clip_l_vision.vision_model.encoder.layers.14.self_attn.v_proj.weight', 'clip_l_vision.vision_model.encoder.layers.15.layer_norm1.bias', 'clip_l_vision.vision_model.encoder.layers.15.layer_norm1.weight', 'clip_l_vision.vision_model.encoder.layers.15.layer_norm2.bias', 'clip_l_vision.vision_model.encoder.layers.15.layer_norm2.weight', 'clip_l_vision.vision_model.encoder.layers.15.mlp.fc1.bias', 'clip_l_vision.vision_model.encoder.layers.15.mlp.fc1.weight', 'clip_l_vision.vision_model.encoder.layers.15.mlp.fc2.bias', 'clip_l_vision.vision_model.encoder.layers.15.mlp.fc2.weight', 'clip_l_vision.vision_model.encoder.layers.15.self_attn.k_proj.bias', 'clip_l_vision.vision_model.encoder.layers.15.self_attn.k_proj.weight', 'clip_l_vision.vision_model.encoder.layers.15.self_attn.out_proj.bias', 'clip_l_vision.vision_model.encoder.layers.15.self_attn.out_proj.weight', 'clip_l_vision.vision_model.encoder.layers.15.self_attn.q_proj.bias', 'clip_l_vision.vision_model.encoder.layers.15.self_attn.q_proj.weight', 'clip_l_vision.vision_model.encoder.layers.15.self_attn.v_proj.bias', 'clip_l_vision.vision_model.encoder.layers.15.self_attn.v_proj.weight', 'clip_l_vision.vision_model.encoder.layers.16.layer_norm1.bias', 'clip_l_vision.vision_model.encoder.layers.16.layer_norm1.weight', 'clip_l_vision.vision_model.encoder.layers.16.layer_norm2.bias', 'clip_l_vision.vision_model.encoder.layers.16.layer_norm2.weight', 'clip_l_vision.vision_model.encoder.layers.16.mlp.fc1.bias', 'clip_l_vision.vision_model.encoder.layers.16.mlp.fc1.weight', 'clip_l_vision.vision_model.encoder.layers.16.mlp.fc2.bias', 'clip_l_vision.vision_model.encoder.layers.16.mlp.fc2.weight', 'clip_l_vision.vision_model.encoder.layers.16.self_attn.k_proj.bias', 'clip_l_vision.vision_model.encoder.layers.16.self_attn.k_proj.weight', 'clip_l_vision.vision_model.encoder.layers.16.self_attn.out_proj.bias', 'clip_l_vision.vision_model.encoder.layers.16.self_attn.out_proj.weight', 'clip_l_vision.vision_model.encoder.layers.16.self_attn.q_proj.bias', 'clip_l_vision.vision_model.encoder.layers.16.self_attn.q_proj.weight', 'clip_l_vision.vision_model.encoder.layers.16.self_attn.v_proj.bias', 'clip_l_vision.vision_model.encoder.layers.16.self_attn.v_proj.weight', 'clip_l_vision.vision_model.encoder.layers.17.layer_norm1.bias', 'clip_l_vision.vision_model.encoder.layers.17.layer_norm1.weight', 'clip_l_vision.vision_model.encoder.layers.17.layer_norm2.bias', 'clip_l_vision.vision_model.encoder.layers.17.layer_norm2.weight', 'clip_l_vision.vision_model.encoder.layers.17.mlp.fc1.bias', 'clip_l_vision.vision_model.encoder.layers.17.mlp.fc1.weight', 'clip_l_vision.vision_model.encoder.layers.17.mlp.fc2.bias', 'clip_l_vision.vision_model.encoder.layers.17.mlp.fc2.weight', 'clip_l_vision.vision_model.encoder.layers.17.self_attn.k_proj.bias', 
'clip_l_vision.vision_model.encoder.layers.17.self_attn.k_proj.weight', 'clip_l_vision.vision_model.encoder.layers.17.self_attn.out_proj.bias', 'clip_l_vision.vision_model.encoder.layers.17.self_attn.out_proj.weight', 'clip_l_vision.vision_model.encoder.layers.17.self_attn.q_proj.bias', 'clip_l_vision.vision_model.encoder.layers.17.self_attn.q_proj.weight', 'clip_l_vision.vision_model.encoder.layers.17.self_attn.v_proj.bias', 'clip_l_vision.vision_model.encoder.layers.17.self_attn.v_proj.weight', 'clip_l_vision.vision_model.encoder.layers.18.layer_norm1.bias', 'clip_l_vision.vision_model.encoder.layers.18.layer_norm1.weight', 'clip_l_vision.vision_model.encoder.layers.18.layer_norm2.bias', 'clip_l_vision.vision_model.encoder.layers.18.layer_norm2.weight', 'clip_l_vision.vision_model.encoder.layers.18.mlp.fc1.bias', 'clip_l_vision.vision_model.encoder.layers.18.mlp.fc1.weight', 'clip_l_vision.vision_model.encoder.layers.18.mlp.fc2.bias', 'clip_l_vision.vision_model.encoder.layers.18.mlp.fc2.weight', 'clip_l_vision.vision_model.encoder.layers.18.self_attn.k_proj.bias', 'clip_l_vision.vision_model.encoder.layers.18.self_attn.k_proj.weight', 'clip_l_vision.vision_model.encoder.layers.18.self_attn.out_proj.bias', 'clip_l_vision.vision_model.encoder.layers.18.self_attn.out_proj.weight', 'clip_l_vision.vision_model.encoder.layers.18.self_attn.q_proj.bias', 'clip_l_vision.vision_model.encoder.layers.18.self_attn.q_proj.weight', 'clip_l_vision.vision_model.encoder.layers.18.self_attn.v_proj.bias', 'clip_l_vision.vision_model.encoder.layers.18.self_attn.v_proj.weight', 'clip_l_vision.vision_model.encoder.layers.19.layer_norm1.bias', 'clip_l_vision.vision_model.encoder.layers.19.layer_norm1.weight', 'clip_l_vision.vision_model.encoder.layers.19.layer_norm2.bias', 'clip_l_vision.vision_model.encoder.layers.19.layer_norm2.weight', 'clip_l_vision.vision_model.encoder.layers.19.mlp.fc1.bias', 'clip_l_vision.vision_model.encoder.layers.19.mlp.fc1.weight', 'clip_l_vision.vision_model.encoder.layers.19.mlp.fc2.bias', 'clip_l_vision.vision_model.encoder.layers.19.mlp.fc2.weight', 'clip_l_vision.vision_model.encoder.layers.19.self_attn.k_proj.bias', 'clip_l_vision.vision_model.encoder.layers.19.self_attn.k_proj.weight', 'clip_l_vision.vision_model.encoder.layers.19.self_attn.out_proj.bias', 'clip_l_vision.vision_model.encoder.layers.19.self_attn.out_proj.weight', 'clip_l_vision.vision_model.encoder.layers.19.self_attn.q_proj.bias', 'clip_l_vision.vision_model.encoder.layers.19.self_attn.q_proj.weight', 'clip_l_vision.vision_model.encoder.layers.19.self_attn.v_proj.bias', 'clip_l_vision.vision_model.encoder.layers.19.self_attn.v_proj.weight', 'clip_l_vision.vision_model.encoder.layers.2.layer_norm1.bias', 'clip_l_vision.vision_model.encoder.layers.2.layer_norm1.weight', 'clip_l_vision.vision_model.encoder.layers.2.layer_norm2.bias', 'clip_l_vision.vision_model.encoder.layers.2.layer_norm2.weight', 'clip_l_vision.vision_model.encoder.layers.2.mlp.fc1.bias', 'clip_l_vision.vision_model.encoder.layers.2.mlp.fc1.weight', 'clip_l_vision.vision_model.encoder.layers.2.mlp.fc2.bias', 'clip_l_vision.vision_model.encoder.layers.2.mlp.fc2.weight', 'clip_l_vision.vision_model.encoder.layers.2.self_attn.k_proj.bias', 'clip_l_vision.vision_model.encoder.layers.2.self_attn.k_proj.weight', 'clip_l_vision.vision_model.encoder.layers.2.self_attn.out_proj.bias', 'clip_l_vision.vision_model.encoder.layers.2.self_attn.out_proj.weight', 'clip_l_vision.vision_model.encoder.layers.2.self_attn.q_proj.bias', 
'clip_l_vision.vision_model.encoder.layers.2.self_attn.q_proj.weight', 'clip_l_vision.vision_model.encoder.layers.2.self_attn.v_proj.bias', 'clip_l_vision.vision_model.encoder.layers.2.self_attn.v_proj.weight', 'clip_l_vision.vision_model.encoder.layers.20.layer_norm1.bias', 'clip_l_vision.vision_model.encoder.layers.20.layer_norm1.weight', 'clip_l_vision.vision_model.encoder.layers.20.layer_norm2.bias', 'clip_l_vision.vision_model.encoder.layers.20.layer_norm2.weight', 'clip_l_vision.vision_model.encoder.layers.20.mlp.fc1.bias', 'clip_l_vision.vision_model.encoder.layers.20.mlp.fc1.weight', 'clip_l_vision.vision_model.encoder.layers.20.mlp.fc2.bias', 'clip_l_vision.vision_model.encoder.layers.20.mlp.fc2.weight', 'clip_l_vision.vision_model.encoder.layers.20.self_attn.k_proj.bias', 'clip_l_vision.vision_model.encoder.layers.20.self_attn.k_proj.weight', 'clip_l_vision.vision_model.encoder.layers.20.self_attn.out_proj.bias', 'clip_l_vision.vision_model.encoder.layers.20.self_attn.out_proj.weight', 'clip_l_vision.vision_model.encoder.layers.20.self_attn.q_proj.bias', 'clip_l_vision.vision_model.encoder.layers.20.self_attn.q_proj.weight', 'clip_l_vision.vision_model.encoder.layers.20.self_attn.v_proj.bias', 'clip_l_vision.vision_model.encoder.layers.20.self_attn.v_proj.weight', 'clip_l_vision.vision_model.encoder.layers.21.layer_norm1.bias', 'clip_l_vision.vision_model.encoder.layers.21.layer_norm1.weight', 'clip_l_vision.vision_model.encoder.layers.21.layer_norm2.bias', 'clip_l_vision.vision_model.encoder.layers.21.layer_norm2.weight', 'clip_l_vision.vision_model.encoder.layers.21.mlp.fc1.bias', 'clip_l_vision.vision_model.encoder.layers.21.mlp.fc1.weight', 'clip_l_vision.vision_model.encoder.layers.21.mlp.fc2.bias', 'clip_l_vision.vision_model.encoder.layers.21.mlp.fc2.weight', 'clip_l_vision.vision_model.encoder.layers.21.self_attn.k_proj.bias', 'clip_l_vision.vision_model.encoder.layers.21.self_attn.k_proj.weight', 'clip_l_vision.vision_model.encoder.layers.21.self_attn.out_proj.bias', 'clip_l_vision.vision_model.encoder.layers.21.self_attn.out_proj.weight', 'clip_l_vision.vision_model.encoder.layers.21.self_attn.q_proj.bias', 'clip_l_vision.vision_model.encoder.layers.21.self_attn.q_proj.weight', 'clip_l_vision.vision_model.encoder.layers.21.self_attn.v_proj.bias', 'clip_l_vision.vision_model.encoder.layers.21.self_attn.v_proj.weight', 'clip_l_vision.vision_model.encoder.layers.22.layer_norm1.bias', 'clip_l_vision.vision_model.encoder.layers.22.layer_norm1.weight', 'clip_l_vision.vision_model.encoder.layers.22.layer_norm2.bias', 'clip_l_vision.vision_model.encoder.layers.22.layer_norm2.weight', 'clip_l_vision.vision_model.encoder.layers.22.mlp.fc1.bias', 'clip_l_vision.vision_model.encoder.layers.22.mlp.fc1.weight', 'clip_l_vision.vision_model.encoder.layers.22.mlp.fc2.bias', 'clip_l_vision.vision_model.encoder.layers.22.mlp.fc2.weight', 'clip_l_vision.vision_model.encoder.layers.22.self_attn.k_proj.bias', 'clip_l_vision.vision_model.encoder.layers.22.self_attn.k_proj.weight', 'clip_l_vision.vision_model.encoder.layers.22.self_attn.out_proj.bias', 'clip_l_vision.vision_model.encoder.layers.22.self_attn.out_proj.weight', 'clip_l_vision.vision_model.encoder.layers.22.self_attn.q_proj.bias', 'clip_l_vision.vision_model.encoder.layers.22.self_attn.q_proj.weight', 'clip_l_vision.vision_model.encoder.layers.22.self_attn.v_proj.bias', 'clip_l_vision.vision_model.encoder.layers.22.self_attn.v_proj.weight', 'clip_l_vision.vision_model.encoder.layers.23.layer_norm1.bias', 
'clip_l_vision.vision_model.encoder.layers.23.layer_norm1.weight', 'clip_l_vision.vision_model.encoder.layers.23.layer_norm2.bias', 'clip_l_vision.vision_model.encoder.layers.23.layer_norm2.weight', 'clip_l_vision.vision_model.encoder.layers.23.mlp.fc1.bias', 'clip_l_vision.vision_model.encoder.layers.23.mlp.fc1.weight', 'clip_l_vision.vision_model.encoder.layers.23.mlp.fc2.bias', 'clip_l_vision.vision_model.encoder.layers.23.mlp.fc2.weight', 'clip_l_vision.vision_model.encoder.layers.23.self_attn.k_proj.bias', 'clip_l_vision.vision_model.encoder.layers.23.self_attn.k_proj.weight', 'clip_l_vision.vision_model.encoder.layers.23.self_attn.out_proj.bias', 'clip_l_vision.vision_model.encoder.layers.23.self_attn.out_proj.weight', 'clip_l_vision.vision_model.encoder.layers.23.self_attn.q_proj.bias', 'clip_l_vision.vision_model.encoder.layers.23.self_attn.q_proj.weight', 'clip_l_vision.vision_model.encoder.layers.23.self_attn.v_proj.bias', 'clip_l_vision.vision_model.encoder.layers.23.self_attn.v_proj.weight', 'clip_l_vision.vision_model.encoder.layers.3.layer_norm1.bias', 'clip_l_vision.vision_model.encoder.layers.3.layer_norm1.weight', 'clip_l_vision.vision_model.encoder.layers.3.layer_norm2.bias', 'clip_l_vision.vision_model.encoder.layers.3.layer_norm2.weight', 'clip_l_vision.vision_model.encoder.layers.3.mlp.fc1.bias', 'clip_l_vision.vision_model.encoder.layers.3.mlp.fc1.weight', 'clip_l_vision.vision_model.encoder.layers.3.mlp.fc2.bias', 'clip_l_vision.vision_model.encoder.layers.3.mlp.fc2.weight', 'clip_l_vision.vision_model.encoder.layers.3.self_attn.k_proj.bias', 'clip_l_vision.vision_model.encoder.layers.3.self_attn.k_proj.weight', 'clip_l_vision.vision_model.encoder.layers.3.self_attn.out_proj.bias', 'clip_l_vision.vision_model.encoder.layers.3.self_attn.out_proj.weight', 'clip_l_vision.vision_model.encoder.layers.3.self_attn.q_proj.bias', 'clip_l_vision.vision_model.encoder.layers.3.self_attn.q_proj.weight', 'clip_l_vision.vision_model.encoder.layers.3.self_attn.v_proj.bias', 'clip_l_vision.vision_model.encoder.layers.3.self_attn.v_proj.weight', 'clip_l_vision.vision_model.encoder.layers.4.layer_norm1.bias', 'clip_l_vision.vision_model.encoder.layers.4.layer_norm1.weight', 'clip_l_vision.vision_model.encoder.layers.4.layer_norm2.bias', 'clip_l_vision.vision_model.encoder.layers.4.layer_norm2.weight', 'clip_l_vision.vision_model.encoder.layers.4.mlp.fc1.bias', 'clip_l_vision.vision_model.encoder.layers.4.mlp.fc1.weight', 'clip_l_vision.vision_model.encoder.layers.4.mlp.fc2.bias', 'clip_l_vision.vision_model.encoder.layers.4.mlp.fc2.weight', 'clip_l_vision.vision_model.encoder.layers.4.self_attn.k_proj.bias', 'clip_l_vision.vision_model.encoder.layers.4.self_attn.k_proj.weight', 'clip_l_vision.vision_model.encoder.layers.4.self_attn.out_proj.bias', 'clip_l_vision.vision_model.encoder.layers.4.self_attn.out_proj.weight', 'clip_l_vision.vision_model.encoder.layers.4.self_attn.q_proj.bias', 'clip_l_vision.vision_model.encoder.layers.4.self_attn.q_proj.weight', 'clip_l_vision.vision_model.encoder.layers.4.self_attn.v_proj.bias', 'clip_l_vision.vision_model.encoder.layers.4.self_attn.v_proj.weight', 'clip_l_vision.vision_model.encoder.layers.5.layer_norm1.bias', 'clip_l_vision.vision_model.encoder.layers.5.layer_norm1.weight', 'clip_l_vision.vision_model.encoder.layers.5.layer_norm2.bias', 'clip_l_vision.vision_model.encoder.layers.5.layer_norm2.weight', 'clip_l_vision.vision_model.encoder.layers.5.mlp.fc1.bias', 'clip_l_vision.vision_model.encoder.layers.5.mlp.fc1.weight', 
'clip_l_vision.vision_model.encoder.layers.5.mlp.fc2.bias', 'clip_l_vision.vision_model.encoder.layers.5.mlp.fc2.weight', 'clip_l_vision.vision_model.encoder.layers.5.self_attn.k_proj.bias', 'clip_l_vision.vision_model.encoder.layers.5.self_attn.k_proj.weight', 'clip_l_vision.vision_model.encoder.layers.5.self_attn.out_proj.bias', 'clip_l_vision.vision_model.encoder.layers.5.self_attn.out_proj.weight', 'clip_l_vision.vision_model.encoder.layers.5.self_attn.q_proj.bias', 'clip_l_vision.vision_model.encoder.layers.5.self_attn.q_proj.weight', 'clip_l_vision.vision_model.encoder.layers.5.self_attn.v_proj.bias', 'clip_l_vision.vision_model.encoder.layers.5.self_attn.v_proj.weight', 'clip_l_vision.vision_model.encoder.layers.6.layer_norm1.bias', 'clip_l_vision.vision_model.encoder.layers.6.layer_norm1.weight', 'clip_l_vision.vision_model.encoder.layers.6.layer_norm2.bias', 'clip_l_vision.vision_model.encoder.layers.6.layer_norm2.weight', 'clip_l_vision.vision_model.encoder.layers.6.mlp.fc1.bias', 'clip_l_vision.vision_model.encoder.layers.6.mlp.fc1.weight', 'clip_l_vision.vision_model.encoder.layers.6.mlp.fc2.bias', 'clip_l_vision.vision_model.encoder.layers.6.mlp.fc2.weight', 'clip_l_vision.vision_model.encoder.layers.6.self_attn.k_proj.bias', 'clip_l_vision.vision_model.encoder.layers.6.self_attn.k_proj.weight', 'clip_l_vision.vision_model.encoder.layers.6.self_attn.out_proj.bias', 'clip_l_vision.vision_model.encoder.layers.6.self_attn.out_proj.weight', 'clip_l_vision.vision_model.encoder.layers.6.self_attn.q_proj.bias', 'clip_l_vision.vision_model.encoder.layers.6.self_attn.q_proj.weight', 'clip_l_vision.vision_model.encoder.layers.6.self_attn.v_proj.bias', 'clip_l_vision.vision_model.encoder.layers.6.self_attn.v_proj.weight', 'clip_l_vision.vision_model.encoder.layers.7.layer_norm1.bias', 'clip_l_vision.vision_model.encoder.layers.7.layer_norm1.weight', 'clip_l_vision.vision_model.encoder.layers.7.layer_norm2.bias', 'clip_l_vision.vision_model.encoder.layers.7.layer_norm2.weight', 'clip_l_vision.vision_model.encoder.layers.7.mlp.fc1.bias', 'clip_l_vision.vision_model.encoder.layers.7.mlp.fc1.weight', 'clip_l_vision.vision_model.encoder.layers.7.mlp.fc2.bias', 'clip_l_vision.vision_model.encoder.layers.7.mlp.fc2.weight', 'clip_l_vision.vision_model.encoder.layers.7.self_attn.k_proj.bias', 'clip_l_vision.vision_model.encoder.layers.7.self_attn.k_proj.weight', 'clip_l_vision.vision_model.encoder.layers.7.self_attn.out_proj.bias', 'clip_l_vision.vision_model.encoder.layers.7.self_attn.out_proj.weight', 'clip_l_vision.vision_model.encoder.layers.7.self_attn.q_proj.bias', 'clip_l_vision.vision_model.encoder.layers.7.self_attn.q_proj.weight', 'clip_l_vision.vision_model.encoder.layers.7.self_attn.v_proj.bias', 'clip_l_vision.vision_model.encoder.layers.7.self_attn.v_proj.weight', 'clip_l_vision.vision_model.encoder.layers.8.layer_norm1.bias', 'clip_l_vision.vision_model.encoder.layers.8.layer_norm1.weight', 'clip_l_vision.vision_model.encoder.layers.8.layer_norm2.bias', 'clip_l_vision.vision_model.encoder.layers.8.layer_norm2.weight', 'clip_l_vision.vision_model.encoder.layers.8.mlp.fc1.bias', 'clip_l_vision.vision_model.encoder.layers.8.mlp.fc1.weight', 'clip_l_vision.vision_model.encoder.layers.8.mlp.fc2.bias', 'clip_l_vision.vision_model.encoder.layers.8.mlp.fc2.weight', 'clip_l_vision.vision_model.encoder.layers.8.self_attn.k_proj.bias', 'clip_l_vision.vision_model.encoder.layers.8.self_attn.k_proj.weight', 'clip_l_vision.vision_model.encoder.layers.8.self_attn.out_proj.bias', 
'clip_l_vision.vision_model.encoder.layers.8.self_attn.out_proj.weight', 'clip_l_vision.vision_model.encoder.layers.8.self_attn.q_proj.bias', 'clip_l_vision.vision_model.encoder.layers.8.self_attn.q_proj.weight', 'clip_l_vision.vision_model.encoder.layers.8.self_attn.v_proj.bias', 'clip_l_vision.vision_model.encoder.layers.8.self_attn.v_proj.weight', 'clip_l_vision.vision_model.encoder.layers.9.layer_norm1.bias', 'clip_l_vision.vision_model.encoder.layers.9.layer_norm1.weight', 'clip_l_vision.vision_model.encoder.layers.9.layer_norm2.bias', 'clip_l_vision.vision_model.encoder.layers.9.layer_norm2.weight', 'clip_l_vision.vision_model.encoder.layers.9.mlp.fc1.bias', 'clip_l_vision.vision_model.encoder.layers.9.mlp.fc1.weight', 'clip_l_vision.vision_model.encoder.layers.9.mlp.fc2.bias', 'clip_l_vision.vision_model.encoder.layers.9.mlp.fc2.weight', 'clip_l_vision.vision_model.encoder.layers.9.self_attn.k_proj.bias', 'clip_l_vision.vision_model.encoder.layers.9.self_attn.k_proj.weight', 'clip_l_vision.vision_model.encoder.layers.9.self_attn.out_proj.bias', 'clip_l_vision.vision_model.encoder.layers.9.self_attn.out_proj.weight', 'clip_l_vision.vision_model.encoder.layers.9.self_attn.q_proj.bias', 'clip_l_vision.vision_model.encoder.layers.9.self_attn.q_proj.weight', 'clip_l_vision.vision_model.encoder.layers.9.self_attn.v_proj.bias', 'clip_l_vision.vision_model.encoder.layers.9.self_attn.v_proj.weight', 'clip_l_vision.vision_model.post_layernorm.bias', 'clip_l_vision.vision_model.post_layernorm.weight', 'clip_l_vision.vision_model.pre_layrnorm.bias', 'clip_l_vision.vision_model.pre_layrnorm.weight', 'clip_l_vision.visual_projection.weight'])
Requested to load StableCascadeClipModel
Loading 1 new model
Requested to load StableCascade_C
Loading 1 new model
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:02<00:00,  7.52it/s]
Requested to load StageC_coder
Loading 1 new model
model_type STABLE_CASCADE
adm 0
clip missing: ['clip_g.logit_scale']
Requested to load StableCascade_B
Loading 1 new model
100%|██████████████████████████████████████████████████████████████████████████████████| 10/10 [00:02<00:00,  3.76it/s]
Requested to load StageA
Loading 1 new model
Prompt executed in 29.48 seconds

Try downloading a fresh standalone package and try it with that.

Same results. Here's the console output:

E:\_AI\ComfyUI_standalone>.\python_embeded\python.exe -s ComfyUI\main.py --windows-standalone-build
Total VRAM 24563 MB, total RAM 49084 MB
Set vram state to: NORMAL_VRAM
Device: cuda:0 NVIDIA GeForce RTX 4090 : cudaMallocAsync
VAE dtype: torch.bfloat16
Using pytorch cross attention
****** User settings have been changed to be stored on the server instead of browser storage. ******
****** For multi-user setups add the --multi-user CLI argument to enable multiple user profiles. ******
Starting server

To see the GUI go to: http://127.0.0.1:8188
got prompt
model_type STABLE_CASCADE
adm 0
Missing VAE keys ['encoder.mean', 'encoder.std']
clip missing: ['clip_g.logit_scale']
left over keys: dict_keys([... the same list of 'clip_l_vision.vision_model.*' and 'clip_l_vision.visual_projection.weight' keys as in the previous log ...])
Requested to load StableCascadeClipModel
Loading 1 new model
E:\_AI\ComfyUI_standalone\ComfyUI\comfy\ldm\modules\attention.py:344: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at ..\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:263.)
  out = torch.nn.functional.scaled_dot_product_attention(q, k, v, attn_mask=mask, dropout_p=0.0, is_causal=False)
Requested to load StableCascade_C
Loading 1 new model
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:02<00:00,  8.23it/s]
Requested to load StageC_coder
Loading 1 new model
model_type STABLE_CASCADE
adm 0
clip missing: ['clip_g.logit_scale']
Requested to load StableCascade_B
Loading 1 new model
100%|██████████████████████████████████████████████████████████████████████████████████| 10/10 [00:02<00:00,  4.35it/s]
Requested to load StageA
Loading 1 new model
Prompt executed in 28.90 seconds
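(Side note: the UserWarning about flash attention above only means this Windows torch build ships without the flash-attention kernel, so scaled_dot_product_attention falls back to another SDPA backend; that fallback is not expected, by itself, to blank the output. A quick sketch to check which backends this torch 2.1.2 build has enabled:)

import torch

print(torch.backends.cuda.flash_sdp_enabled())         # likely False on this Windows build
print(torch.backends.cuda.mem_efficient_sdp_enabled())
print(torch.backends.cuda.math_sdp_enabled())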

image

By the way, Stability-AI/StableCascade works pretty well, just slow.

TurningTide · Feb 26, 2024

Does anyone else have this issue? If not, it might be a hardware/driver issue.

comfyanonymous · Feb 26, 2024

Does anyone else have this issue? If not, it might be a hardware/driver issue.

Interestingly, yesterday I installed a fresh copy of Windows. Previously I had ComfyUI installed with the experimental Cascade nodes and everything was working fine. After reinstalling the OS and ComfyUI, it all stopped working. The only hardware change is a new SATA SSD. NVIDIA driver version: 551.61 Game Ready.

Any other SD model works without problems:

image

TurningTide · Feb 26, 2024

Did you try the lite models?

ashllay · Feb 26, 2024

Did you try the lite models?

Yep, every Cascade model from the official repository that sits outside the comfyui_checkpoints directory throws an error, like:

got prompt
ERROR:root:!!! Exception during processing !!!
ERROR:root:Traceback (most recent call last):
  File "E:\_AI\ComfyUI\execution.py", line 152, in recursive_execute
    output_data, output_ui = get_output_data(obj, input_data_all)
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\_AI\ComfyUI\execution.py", line 82, in get_output_data
    return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\_AI\ComfyUI\execution.py", line 75, in map_node_over_list
    results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\_AI\ComfyUI\nodes.py", line 540, in load_checkpoint
    out = comfy.sd.load_checkpoint_guess_config(ckpt_path, output_vae=True, output_clip=True, embedding_directory=folder_paths.get_folder_paths("embeddings"))
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\_AI\ComfyUI\comfy\sd.py", line 506, in load_checkpoint_guess_config
    model_config = model_detection.model_config_from_unet(sd, "model.diffusion_model.")
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\_AI\ComfyUI\comfy\model_detection.py", line 191, in model_config_from_unet
    unet_config = detect_unet_config(state_dict, unet_key_prefix)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\_AI\ComfyUI\comfy\model_detection.py", line 77, in detect_unet_config
    model_channels = state_dict['{}input_blocks.0.0.weight'.format(key_prefix)].shape[0]
                     ~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
KeyError: 'model.diffusion_model.input_blocks.0.0.weight'
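A quick way to see why load_checkpoint_guess_config trips here (a sketch, assuming the failing file is a .safetensors download; the filename below is only an example): list the state-dict keys and check for the model.diffusion_model. prefix that detect_unet_config probes for. The standalone stage files from the official repository apparently don't store their weights under that prefix, which is exactly the KeyError above.

from safetensors import safe_open

path = "stage_b_lite_bf16.safetensors"  # example name, point this at the file that fails

with safe_open(path, framework="pt", device="cpu") as f:
    keys = list(f.keys())

print(len(keys), "keys")
print(sorted({k.split(".")[0] for k in keys}))                    # top-level prefixes actually present
print(any(k.startswith("model.diffusion_model.") for k in keys))  # False would explain the KeyError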

TurningTide · Feb 27, 2024

Webui Forge + https://github.com/benjamin-bertram/sdweb-easy-stablecascade-diffusers = works

image

So Stability-AI/StableCascade, lllyasviel/stable-diffusion-webui-forge, and ComfyUI were all launched without a venv, using the global Python install (3.11.8, Torch 2.1.2, CUDA 12.1). Only ComfyUI refuses to yield the expected results.
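For reference, this is roughly what the diffusers-based path (which the Forge extension wraps) boils down to; a minimal sketch assuming a diffusers release that ships the Stable Cascade pipelines, with an illustrative prompt and settings:

import torch
from diffusers import StableCascadePriorPipeline, StableCascadeDecoderPipeline

prior = StableCascadePriorPipeline.from_pretrained(
    "stabilityai/stable-cascade-prior", torch_dtype=torch.bfloat16
).to("cuda")
decoder = StableCascadeDecoderPipeline.from_pretrained(
    "stabilityai/stable-cascade", torch_dtype=torch.bfloat16
).to("cuda")

prompt = "evening sunset scenery, glass bottle with a galaxy inside it"
prior_out = prior(prompt=prompt, num_inference_steps=20, guidance_scale=4.0)  # stage C
image = decoder(
    image_embeddings=prior_out.image_embeddings,
    prompt=prompt,
    num_inference_steps=10,
    guidance_scale=0.0,
).images[0]                                                                   # stages B + A
image.save("cascade_test.png")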

TurningTide · Feb 27, 2024

I managed to run Cascade by loading the bf16 models through separate UNet, CLIP, and VAE loader nodes.

image

This way there are no warnings in the console:

got prompt
model_type STABLE_CASCADE
adm 0
Requested to load StableCascadeClipModel
Loading 1 new model
model_type STABLE_CASCADE
adm 0
Requested to load StableCascade_C
Loading 1 new model
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:02<00:00,  9.66it/s]
Requested to load StableCascade_B
Loading 1 new model
100%|██████████████████████████████████████████████████████████████████████████████████| 10/10 [00:02<00:00,  4.48it/s]
Requested to load StageA
Loading 1 new model
Prompt executed in 17.33 seconds

Is it possible that something is wrong with the special comfyui_checkpoints models?

TurningTide · Mar 1, 2024

This is straight from the example workflow:

image

No errors in the console, just a single warning: clip missing: ['clip_g.logit_scale'].

ComfyUI latest version. Windows 11, Torch 2.1.2, CUDA 12.1, RTX 40-series GPU.

Delete your negative prompt and give it a shot again.

frankchieng · Mar 4, 2024

Delete your negative prompt and give it a shot again.

Nope. As you can see in my previous comment, there is a negative prompt and the result is still great.

TurningTide · Mar 4, 2024

I also run into the model.diffusion_model.input_blocks.0.0.weight error on revision b7b55931 with the lite bf16 models. It happens when loading model B. Python 3.11.8 (from the .bat console output), NVIDIA GTX 960/2, driver 536.23. Any other info needed?

Arctomachine · Mar 15, 2024

Same error here!

JPGranizo · Mar 20, 2024

Same error here.

ttulttul · Apr 8, 2024

Same error here.

huanmengmie · Apr 9, 2024

Same error here.

winjvlee · Apr 12, 2024

Same error here! Error occurred when executing unCLIPCheckpointLoader: model.diffusion_model.input_blocks.0.0.weight

I'm using the Stable Cascade image-prompt workflow: https://github.com/ZHO-ZHO-ZHO/ComfyUI-Workflows-ZHO/blob/main/Stable%20Cascade%20ImagePrompt%20Standard%E3%80%90Zho%E3%80%91.json

h3clikejava · Apr 20, 2024