FLUX doesn't start
I've tried different releases of Forge (cu121 + torch21, cu121 + torch231, cu124 + torch24), but I get an error when loading the flux1-dev-fp8 model. I also tried changing GPU Weights and the swap location, but it doesn't change anything. Log while trying to generate "a dog":
Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug 1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
Version: f2.0.1v1.10.1-previous-323-g72ab92f8
Commit hash: 72ab92f83e5a9e193726313c6d88ab435a61fb59
Launching Web UI with arguments: --skip-torch-cuda-test --listen
Total VRAM 8192 MB, total RAM 32686 MB
pytorch version: 2.3.1+cu121
Set vram state to: NORMAL_VRAM
Device: cuda:0 NVIDIA GeForce GTX 1080 : native
VAE dtype preferences: [torch.float32] -> torch.float32
CUDA Using Stream: False
G:\FLUXAI121\system\python\lib\site-packages\transformers\utils\hub.py:127: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
warnings.warn(
Using pytorch cross attention
Using pytorch attention for VAE
ControlNet preprocessor location: G:\FLUXAI121\webui\models\ControlNetPreprocessor
2024-08-18 14:01:30,087 - ControlNet - INFO - ControlNet UI callback registered.
Checkpoint flux1-dev-fp82.safetensors not found; loading fallback flux1-dev-fp8.safetensors
Model selected: {'checkpoint_info': {'filename': 'G:\\FLUXAI121\\webui\\models\\Stable-diffusion\\flux1-dev-fp8.safetensors', 'hash': 'be9881f4'}, 'additional_modules': [], 'unet_storage_dtype': None}
Running on local URL: http://0.0.0.0:7860
To create a public link, set `share=True` in `launch()`.
Startup time: 22.4s (prepare environment: 1.0s, import torch: 7.7s, initialize shared: 0.3s, other imports: 1.2s, list SD models: 2.1s, load scripts: 2.5s, create ui: 2.4s, gradio launch: 5.2s).
Environment vars changed: {'stream': False, 'inference_memory': 1024.0, 'pin_shared_memory': False}
Model selected: {'checkpoint_info': {'filename': 'G:\\FLUXAI121\\webui\\models\\Stable-diffusion\\flux1-dev-fp8.safetensors', 'hash': 'be9881f4'}, 'additional_modules': [], 'unet_storage_dtype': None}
Environment vars changed: {'stream': False, 'inference_memory': 1024.0, 'pin_shared_memory': True}
Loading Model: {'checkpoint_info': {'filename': 'G:\\FLUXAI121\\webui\\models\\Stable-diffusion\\flux1-dev-fp8.safetensors', 'hash': 'be9881f4'}, 'additional_modules': [], 'unet_storage_dtype': None}
[Unload] Trying to free 953674316406250018963456.00 MB for cuda:0 with 0 models keep loaded ...
StateDict Keys: {'transformer': 780, 'vae': 244, 'text_encoder': 198, 'text_encoder_2': 220, 'ignore': 0}
Using Detected T5 Data Type: torch.float8_e4m3fn
Using Detected UNet Type: torch.float8_e4m3fn
Working with z of shape (1, 16, 32, 32) = 16384 dimensions.
K-Model Created: {'storage_dtype': torch.float8_e4m3fn, 'computation_dtype': torch.float16}
Calculating sha256 for G:\FLUXAI121\webui\models\Stable-diffusion\flux1-dev-fp8.safetensors: 275ef623d3bbccddb75b66fb549a7878da78e3a201374b73cee76981cb84551c
Model loaded in 0.7s (unload existing model: 0.1s, forge model load: 0.5s).
Skipping unconditional conditioning when CFG = 1. Negative Prompts are ignored.
To load target model JointTextEncoder
Begin to load 1 model
[Unload] Trying to free 7725.00 MB for cuda:0 with 0 models keep loaded ...
[Memory Management] Current Free GPU Memory: 7175.00 MB
[Memory Management] Required Model Memory: 5154.62 MB
[Memory Management] Required Inference Memory: 1024.00 MB
[Memory Management] Estimated Remaining GPU Memory: 996.38 MB
Moving model(s) has taken 1.84 seconds
Distilled CFG Scale: 3.5
To load target model KModel
Begin to load 1 model
[Unload] Trying to free 16045.33 MB for cuda:0 with 0 models keep loaded ...
[Unload] Current free memory is 1893.48 MB ...
[Unload] Unload model JointTextEncoder
Для продолжения нажмите любую клавишу . . . // Press any key to continue
Sometimes I also get this error:
"The instruction at 0x00007FF.... accessed the memory at 0x000000.... The memory cannot be read."
Any ideas how to solve this?
Same issue
I doubt you can load fp8 on 8 GB of VRAM. The message says it's trying to free 16 more GB after you had 1 GB left, which means that unless you have another ~16 GB of free RAM, it will never load.
For reference, with fp16 I use roughly 20 GB of VRAM and 40 GB of RAM to generate one image. I doubt fp8 can run on 8 GB.
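As a rough sanity check, raw weight storage scales with parameter count times bytes per parameter. A minimal back-of-envelope sketch (the ~12B transformer and ~4.7B T5-XXL parameter counts are approximations, not measured values):

```python
# Back-of-envelope estimate of FLUX.1-dev weight storage at different dtypes.
# Parameter counts are approximate: ~12B transformer, ~4.7B T5-XXL encoder.
BYTES_PER_PARAM = {"fp16": 2.0, "fp8": 1.0, "nf4": 0.5}

def weights_gb(params_billion: float, dtype: str) -> float:
    """Raw weight storage only; activations and overhead come on top."""
    return params_billion * 1e9 * BYTES_PER_PARAM[dtype] / 2**30

for dtype in ("fp16", "fp8", "nf4"):
    total = weights_gb(12, dtype) + weights_gb(4.7, dtype)
    print(f"{dtype}: ~{total:.1f} GB of weights")
# fp16: ~31.1 GB, fp8: ~15.6 GB, nf4: ~7.8 GB — so even the fp8 weights
# alone overflow an 8 GB card unless part of them is swapped to system RAM.
```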
My advice would be to try installing FP4 or NF4 if you can; those might work on your setup.
Facing the same issue with nf4.
Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug 1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
Version: f2.0.1v1.10.1-previous-323-g72ab92f8
Commit hash: 72ab92f83e5a9e193726313c6d88ab435a61fb59
Launching Web UI with arguments: --skip-torch-cuda-test --listen
Total VRAM 8192 MB, total RAM 32686 MB
pytorch version: 2.3.1+cu121
Set vram state to: NORMAL_VRAM
Device: cuda:0 NVIDIA GeForce GTX 1080 : native
VAE dtype preferences: [torch.float32] -> torch.float32
CUDA Using Stream: False
G:\FLUXAI121\system\python\lib\site-packages\transformers\utils\hub.py:127: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
warnings.warn(
Using pytorch cross attention
Using pytorch attention for VAE
ControlNet preprocessor location: G:\FLUXAI121\webui\models\ControlNetPreprocessor
2024-08-18 17:10:42,392 - ControlNet - INFO - ControlNet UI callback registered.
Model selected: {'checkpoint_info': {'filename': 'G:\\FLUXAI121\\webui\\models\\Stable-diffusion\\flux1-dev-fp8.safetensors', 'hash': 'be9881f4'}, 'additional_modules': [], 'unet_storage_dtype': None}
Running on local URL: http://0.0.0.0:7860
To create a public link, set `share=True` in `launch()`.
Startup time: 23.6s (prepare environment: 1.1s, import torch: 10.1s, initialize shared: 0.3s, other imports: 1.3s, list SD models: 0.4s, load scripts: 2.7s, create ui: 2.5s, gradio launch: 5.1s).
Environment vars changed: {'stream': False, 'inference_memory': 1024.0, 'pin_shared_memory': False}
Model selected: {'checkpoint_info': {'filename': 'G:\\FLUXAI121\\webui\\models\\Stable-diffusion\\flux1-dev-bnb-nf4.safetensors', 'hash': '0184473b'}, 'additional_modules': [], 'unet_storage_dtype': None}
Environment vars changed: {'stream': False, 'inference_memory': 1024.0, 'pin_shared_memory': True}
Loading Model: {'checkpoint_info': {'filename': 'G:\\FLUXAI121\\webui\\models\\Stable-diffusion\\flux1-dev-bnb-nf4.safetensors', 'hash': '0184473b'}, 'additional_modules': [], 'unet_storage_dtype': None}
[Unload] Trying to free 953674316406250018963456.00 MB for cuda:0 with 0 models keep loaded ...
StateDict Keys: {'transformer': 2350, 'vae': 244, 'text_encoder': 198, 'text_encoder_2': 220, 'ignore': 0}
Using Detected T5 Data Type: torch.float8_e4m3fn
Using Detected UNet Type: nf4
Using pre-quant state dict!
Working with z of shape (1, 16, 32, 32) = 16384 dimensions.
K-Model Created: {'storage_dtype': 'nf4', 'computation_dtype': torch.float16}
Model loaded in 20.4s (unload existing model: 0.2s, forge model load: 20.3s).
Skipping unconditional conditioning when CFG = 1. Negative Prompts are ignored.
To load target model JointTextEncoder
Begin to load 1 model
[Unload] Trying to free 7725.00 MB for cuda:0 with 0 models keep loaded ...
[Memory Management] Current Free GPU Memory: 7175.00 MB
[Memory Management] Required Model Memory: 5154.62 MB
[Memory Management] Required Inference Memory: 1024.00 MB
[Memory Management] Estimated Remaining GPU Memory: 996.38 MB
Moving model(s) has taken 27.99 seconds
Distilled CFG Scale: 3.5
To load target model KModel
Begin to load 1 model
[Unload] Trying to free 9411.13 MB for cuda:0 with 0 models keep loaded ...
[Unload] Current free memory is 1682.14 MB ...
[Unload] Unload model JointTextEncoder
Для продолжения нажмите любую клавишу . . . // Press any key to continue
Where can I download FP4? I tried searching but found nothing.
Try lowering GPU Weights to around 4000 MB, use the Shared swap location and the Queue swap method. If you have enough RAM, and you should, it should work well.
https://civitai.com/models/630820?modelVersionId=734260 — FP4 is there, and NF4 and others are also available. Try using the UNET-only models and loading clip/t5/ae separately.
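In Forge the separate clip/t5/ae files are selected in the UI next to the checkpoint. If you would rather script the same idea, a rough diffusers equivalent might look like the sketch below; the local path is a placeholder, and it assumes a diffusers version with Flux support (>= 0.30):

```python
# A sketch of loading only the Flux transformer from a single UNET file and
# letting diffusers fetch clip/t5/ae separately. Path below is a placeholder.
import torch
from diffusers import FluxPipeline, FluxTransformer2DModel

transformer = FluxTransformer2DModel.from_single_file(
    "models/unet/flux1-dev-fp4.safetensors",  # placeholder UNET-only file
    torch_dtype=torch.float16,
)

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",  # supplies clip, t5 and the VAE (ae)
    transformer=transformer,
    torch_dtype=torch.float16,
)
pipe.enable_model_cpu_offload()  # keep idle modules in system RAM on low-VRAM GPUs
```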
What is your GPU Weights setting in the UI?
Default settings, something around 7167 MB.
If you don't want to lower it and test, I can't help you further. The settings I recommended above should work. What is happening is that you are using all your VRAM to load the checkpoint, so there is none left for rendering, and it fails at the end when the system tries to reclaim the VRAM.
Also, that's not "default": my default was 23500 MB, and that setting wasn't in any way useful for generating images.
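You can verify the headroom yourself with a couple of PyTorch calls; a minimal sketch, assuming a CUDA build of PyTorch (the 1024 MB reserve mirrors the inference_memory value in the logs above):

```python
# Minimal sketch: derive an upper bound for the GPU Weights slider that still
# leaves headroom for inference. Assumes CUDA-enabled PyTorch on device 0.
import torch

free_b, total_b = torch.cuda.mem_get_info()  # bytes free / total on current GPU
free_mb = free_b / 2**20
inference_reserve_mb = 1024.0                # Forge's inference_memory default

print(f"free VRAM: {free_mb:.0f} MB")
print(f"GPU Weights upper bound: ~{free_mb - inference_reserve_mb:.0f} MB")
# The logs show ~7175 MB free, so a ~7167 MB weights setting leaves almost
# nothing for the sampling pass; dropping to ~4000 MB trades speed for headroom.
```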
I've tested your settings with nf4 and it works now. Is about 30-40 sec/it a normal speed for Flux?
Considering your GPU model, it's a worthy achievement imo. Glad you got it working!