
SD1.5 Vpred checkpoints don't work

Open Vol4ikk opened this issue 1 year ago • 10 comments

Checklist

  • [X] The issue exists after disabling all extensions
  • [X] The issue exists on a clean installation of webui
  • [ ] The issue is caused by an extension, but I believe it is caused by a bug in the webui
  • [X] The issue exists in the current version of the webui
  • [X] The issue has not been reported before recently
  • [ ] The issue has been reported before but has not been fixed yet

What happened?

SD1.5 v-prediction models do not work properly; only noise is generated. Most likely the .yaml configuration file is not being taken into account.

Example of generation from logs: 00023-2961903940

Steps to reproduce the problem

  1. Download any v-pred model together with its .yaml file into the models folder
  2. Try to start generation
  3. Get noise, as if there were no yaml file
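For reference, an SD1.5 v-pred checkpoint's config in the CompVis/A1111 format typically differs from the stock v1-inference.yaml mainly by declaring v-parameterization; if this file is ignored, the model is sampled as eps-prediction and the output degenerates into noise. A minimal sketch (exact contents vary per checkpoint; the filename must match the checkpoint's base name):

```yaml
# e.g. indigoFurryMix_se02Vpred.yaml next to indigoFurryMix_se02Vpred.safetensors
model:
  target: ldm.models.diffusion.ddpm.LatentDiffusion
  params:
    parameterization: "v"   # the key line: v-prediction instead of eps
```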

What should have happened?

A proper image should be generated, not just noise.

What browsers do you use to access the UI ?

Google Chrome

Sysinfo

https://pastebin.com/Ne3X3mUj

Console logs

venv "C:\SD\stable-diffusion-webui-amdgpu-forge\venv\Scripts\Python.exe"
fatal: No names found, cannot describe anything.
Python 3.10.11 (tags/v3.10.11:7d4cc5a, Apr  5 2023, 00:38:17) [MSC v.1929 64 bit (AMD64)]
Version: f2.0.1v1.10.1-1.10.1
Commit hash: 50648cda2cc8ef0a63b7d727be86cdb0c6fca0ec
ROCm: agents=['gfx1103']
ROCm: version=6.1, using agent gfx1103
ZLUDA support: experimental
Using ZLUDA in C:\SD\stable-diffusion-webui-amdgpu-forge\.zluda
Launching Web UI with arguments: --ckpt-dir D:/Backup/var/Sd-qdiff/modeldir/models/Stable-diffusion --hypernetwork-dir D:/Backup/var/Sd-qdiff/modeldir/models/hypernetworks --embeddings-dir D:/Backup/var/Sd-qdiff/modeldir/embeddings --lora-dir D:/Backup/var/Sd-qdiff/modeldir/models/Lora --vae-dir D:/Backup/var/Sd-qdiff/modeldir/models/VAE
Total VRAM 14582 MB, total RAM 28477 MB
pytorch version: 2.3.1+cu118
Set vram state to: NORMAL_VRAM
Device: cuda:0 AMD Radeon 780M Graphics [ZLUDA] : native
VAE dtype preferences: [torch.bfloat16, torch.float32] -> torch.bfloat16
CUDA Using Stream: False
Using pytorch cross attention
Using pytorch attention for VAE
ONNX: version=1.19.2 provider=CPUExecutionProvider, available=['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']
ControlNet preprocessor location: C:\SD\stable-diffusion-webui-amdgpu-forge\models\ControlNetPreprocessor
Loading additional modules ... done.
2024-10-27 18:50:52,626 - ControlNet - INFO - ControlNet UI callback registered.
[ERROR]: Config states C:\SD\stable-diffusion-webui-amdgpu-forge\config_states\civitai_subfolders.json, "created_at" does not exist
Model selected: {'checkpoint_info': {'filename': 'D:\\Backup\\var\\Sd-qdiff\\modeldir\\models\\Stable-diffusion\\indigoFurryMix_se02Vpred.safetensors', 'hash': '3070c3c6'}, 'additional_modules': [], 'unet_storage_dtype': None}
Using online LoRAs in FP16: False
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Startup time: 15.2s (prepare environment: 1.3s, import torch: 6.1s, initialize shared: 1.0s, load scripts: 1.2s, initialize google blockly: 2.8s, create ui: 1.6s, gradio launch: 1.1s).
Environment vars changed: {'stream': False, 'inference_memory': 4550.0, 'pin_shared_memory': False}
[GPU Setting] You will use 68.80% GPU memory (10032.00 MB) to load weights, and use 31.20% GPU memory (4550.00 MB) to do matrix computation.
Environment vars changed: {'stream': False, 'inference_memory': 1024.0, 'pin_shared_memory': False}
[GPU Setting] You will use 92.98% GPU memory (13558.00 MB) to load weights, and use 7.02% GPU memory (1024.00 MB) to do matrix computation.
Environment vars changed: {'stream': False, 'inference_memory': 4550.0, 'pin_shared_memory': False}
[GPU Setting] You will use 68.80% GPU memory (10032.00 MB) to load weights, and use 31.20% GPU memory (4550.00 MB) to do matrix computation.
Loading Model: {'checkpoint_info': {'filename': 'D:\\Backup\\var\\Sd-qdiff\\modeldir\\models\\Stable-diffusion\\indigoFurryMix_se02Vpred.safetensors', 'hash': '3070c3c6'}, 'additional_modules': [], 'unet_storage_dtype': None}
[Unload] Trying to free all memory for cuda:0 with 0 models keep loaded ... Done.
StateDict Keys: {'unet': 686, 'vae': 248, 'text_encoder': 196, 'ignore': 0}
C:\SD\stable-diffusion-webui-amdgpu-forge\venv\lib\site-packages\transformers\tokenization_utils_base.py:1601: FutureWarning: `clean_up_tokenization_spaces` was not set. It will be set to `True` by default. This behavior will be depracted in transformers v4.45, and will be then set to `False` by default. For more details check this issue: https://github.com/huggingface/transformers/issues/31884
  warnings.warn(
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
K-Model Created: {'storage_dtype': torch.float16, 'computation_dtype': torch.float16}
Model loaded in 2.3s (unload existing model: 0.3s, forge model load: 2.0s).
[Unload] Trying to free 4855.14 MB for cuda:0 with 0 models keep loaded ... Done.
[Memory Management] Target: JointTextEncoder, Free GPU: 12788.24 MB, Model Require: 234.72 MB, Previously Loaded: 0.00 MB, Inference Require: 4550.00 MB, Remaining: 8003.52 MB, All loaded to GPU.
Moving model(s) has taken 0.46 seconds
[Unload] Trying to free 4550.00 MB for cuda:0 with 1 models keep loaded ... Current free memory is 12445.27 MB ... Done.
[Unload] Trying to free 6681.23 MB for cuda:0 with 0 models keep loaded ... Current free memory is 12444.98 MB ... Done.
[Memory Management] Target: KModel, Free GPU: 12444.98 MB, Model Require: 1639.41 MB, Previously Loaded: 0.00 MB, Inference Require: 4550.00 MB, Remaining: 6255.57 MB, All loaded to GPU.
Moving model(s) has taken 1.55 seconds
100%|##################################################################################| 20/20 [00:13<00:00,  1.45it/s]
[Unload] Trying to free 4757.42 MB for cuda:0 with 0 models keep loaded ... Current free memory is 10693.41 MB ... Done.
[Memory Management] Target: IntegratedAutoencoderKL, Free GPU: 10693.41 MB, Model Require: 159.56 MB, Previously Loaded: 0.00 MB, Inference Require: 4550.00 MB, Remaining: 5983.85 MB, All loaded to GPU.
Moving model(s) has taken 0.06 seconds
Total progress: 100%|██████████████████████████████████████████████████████████████████| 20/20 [00:14<00:00,  1.40it/s]
Total progress: 100%|██████████████████████████████████████████████████████████████████| 20/20 [00:14<00:00,  1.71it/s]

Additional information

I had to correct requirements_versions.txt; it was not possible to launch the WebUI without this:

python-multipart==0.0.10
optimum==1.22.0

This might be usable, but it did not help me: https://github.com/lllyasviel/stable-diffusion-webui-forge/issues/1109

I also get a "CUDA out of memory" error (https://pastebin.com/CWk8g1w4) every time I try to use any XL model, although everything works on the non-Forge version with the --disable-nan-check --medvram parameters. I would be very grateful for advice on how to fix this, because changing the "GPU Weights" parameter does not help, and I can only use old SD1.5 models without v-pred.
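For context, on the non-Forge build those flags go into webui-user.bat roughly like this (a sketch; Forge manages VRAM through its own GPU Weights setting, so --medvram-style flags may simply be ignored there):

```bat
rem webui-user.bat (non-Forge A1111 install)
set COMMANDLINE_ARGS=--disable-nan-check --medvram
call webui.bat
```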

Vol4ikk avatar Oct 27 '24 15:10 Vol4ikk

sgemm is already implemented. Please check whether ZLUDA BLAS is loaded and is being used. (cublas.dll on Windows, libcublas.so on Linux)

lshqqytiger avatar Aug 24 '24 03:08 lshqqytiger

If I understood correctly, I ran `tasklist /m cublas.dll` in cmd, and it says it cannot find it. Do I need to replace some DLLs in the Python libraries?

Captain-SeaL avatar Aug 24 '24 09:08 Captain-SeaL
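Note that `tasklist /m cublas.dll` only lists modules inside processes that are currently running, so it reports nothing unless the webui/Fooocus process is up when you run it. An alternative sanity check is whether the loader can resolve the DLL at all; a minimal sketch (`dll_loadable` is a made-up helper name, not part of any tool mentioned here):

```python
import ctypes

def dll_loadable(name: str) -> bool:
    """Return True if the OS loader can resolve and load the given library."""
    try:
        ctypes.CDLL(name)
        return True
    except OSError:
        return False

# Example: dll_loadable("cublas.dll") on Windows tells you whether the
# ZLUDA cublas.dll is on the DLL search path at all.
```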

What application are you trying?

lshqqytiger avatar Aug 24 '24 11:08 lshqqytiger

Fooocus

Captain-SeaL avatar Aug 24 '24 11:08 Captain-SeaL

Follow the second paragraph of the ZLUDA PyTorch instructions.

cublas64_*.dll
cusparse64_*.dll
nvrtc64_*_*.dll

lshqqytiger avatar Aug 24 '24 11:08 lshqqytiger
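Concretely, the ZLUDA-for-PyTorch step copies the ZLUDA builds over the matching CUDA DLLs in torch\lib. A hedged sketch of that copy, assuming the cu118 filenames implied by the log above (the exact `*_11*` names depend on your torch build, so check venv\Lib\site-packages\torch\lib first; `patch_torch_dlls` is a hypothetical helper, not part of ZLUDA):

```python
import shutil
from pathlib import Path

# Assumed mapping for torch 2.3.1+cu118; verify the target names in torch\lib.
REPLACEMENTS = {
    "cublas.dll": "cublas64_11.dll",
    "cusparse.dll": "cusparse64_11.dll",
    "nvrtc.dll": "nvrtc64_112_0.dll",
}

def patch_torch_dlls(zluda_dir: Path, torch_lib: Path) -> list[str]:
    """Copy each ZLUDA DLL over the CUDA DLL torch expects; return what was patched."""
    patched = []
    for src_name, dst_name in REPLACEMENTS.items():
        src = zluda_dir / src_name
        if src.exists():
            shutil.copyfile(src, torch_lib / dst_name)
            patched.append(dst_name)
    return patched
```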

Fooocus

Here you find a install Guide for Fooocus with ZLUDA. I tested it a few minutes ago. https://github.com/CS1o/Stable-Diffusion-Info/wiki/Installation-Guides

CS1o avatar Sep 14 '24 21:09 CS1o

Follow the second paragraph of the ZLUDA PyTorch instructions.

cublas64_*.dll
cusparse64_*.dll
nvrtc64_*_*.dll

I get "Error loading caffe2_nvrtc.dll or one of its dependencies" after replacing the three DLLs.

HysterLc avatar Nov 15 '24 10:11 HysterLc

Make sure that you have

  1. AMD GPU driver (amdhip64.dll, or amdhip64_6.dll with HIP SDK 6.1)
  2. HIP SDK (rocblas.dll, rocsolver.dll, rocsparse.dll, and hiprtc0601.dll (0507 in HIP SDK 5.7))
  3. Microsoft Visual C Runtime (vcruntime140.dll)

lshqqytiger avatar Nov 15 '24 10:11 lshqqytiger

I get "Error loading caffe2_nvrtc.dll or one of its dependencies" after replacing the three DLLs.

That error is caused by a Python version installed through the Microsoft Store. To fix it, uninstall all Python versions you have under Windows Settings/Apps. Then install Python 3.10.11 (64-bit) with its normal installer from here: https://www.python.org/downloads/release/python-31011/ Check "Add Python to PATH" and reboot the system.

Open up a cmd and type python --version and where python

Verify that the Path to Python 3.10.11 is at the top.

CS1o avatar Nov 15 '24 13:11 CS1o

I get "Error loading caffe2_nvrtc.dll or one of its dependencies" after replacing the three DLLs.

That error is caused by a Python version installed through the Microsoft Store. To fix it, uninstall all Python versions you have under Windows Settings/Apps. Then install Python 3.10.11 (64-bit) with its normal installer from here: https://www.python.org/downloads/release/python-31011/ Check "Add Python to PATH" and reboot the system.

Open up a cmd and type python --version and where python

Verify that the Path to Python 3.10.11 is at the top.

It's a little better, but a new error emerges: Error loading "C:\Users\M1175\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\lib\cusolver64_11.dll" or one of its dependencies.

HysterLc avatar Nov 18 '24 05:11 HysterLc