
Both Flux Schnell and Flux Dev crash ComfyUI

Open aesxsc opened this issue 1 year ago • 41 comments

Expected Behavior

To not crash.

Actual Behavior

ComfyUI crashes 5-10 seconds after I click Queue Prompt while using Flux Schnell. This does not happen with the Flux Dev model.

Steps to Reproduce

Use Flux Schnell model. image

Debug Logs

Total VRAM 16376 MB, total RAM 32658 MB
pytorch version: 2.4.0+cu121
C:\Users\xscri\stable-diffusion-webui\venv\lib\site-packages\xformers\ops\fmha\flash.py:211: FutureWarning: `torch.library.impl_abstract` was renamed to `torch.library.register_fake`. Please use that instead; we will remove `torch.library.impl_abstract` in a future version of PyTorch.
  @torch.library.impl_abstract("xformers_flash::flash_fwd")
C:\Users\xscri\stable-diffusion-webui\venv\lib\site-packages\xformers\ops\fmha\flash.py:344: FutureWarning: `torch.library.impl_abstract` was renamed to `torch.library.register_fake`. Please use that instead; we will remove `torch.library.impl_abstract` in a future version of PyTorch.
  @torch.library.impl_abstract("xformers_flash::flash_bwd")
xformers version: 0.0.27.post2
Set vram state to: NORMAL_VRAM
Device: cuda:0 NVIDIA GeForce RTX 4070 Ti SUPER : cudaMallocAsync
Using xformers cross attention
[Prompt Server] web root: C:\Users\xscri\ComfyUI\web
Successfully imported spandrel_extra_arches: support for non commercial upscale models.
C:\Users\xscri\stable-diffusion-webui\venv\lib\site-packages\kornia\feature\lightglue.py:44: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
  @torch.cuda.amp.custom_fwd(cast_inputs=torch.float32)

Import times for custom nodes:
   0.0 seconds: C:\Users\xscri\ComfyUI\custom_nodes\websocket_image_save.py

Starting server

To see the GUI go to: http://127.0.0.1:8188
got prompt
*crash*

Other

I'm using the venv from Stable Diffusion WebUI.

Nvidia GRD 560.81

aesxsc avatar Aug 07 '24 12:08 aesxsc

Now Flux Dev doesn't work either. They both crash ComfyUI.

aesxsc avatar Aug 07 '24 15:08 aesxsc

They both crash for me. I'm also using the venv from Stable Diffusion WebUI...

image

GabrielLanghans avatar Aug 07 '24 17:08 GabrielLanghans

I've created a new venv and installed everything from scratch and got the same error.

GabrielLanghans avatar Aug 07 '24 17:08 GabrielLanghans

have you tried downgrading torch?

TingTingin avatar Aug 07 '24 17:08 TingTingin

What's the recommended version?

GabrielLanghans avatar Aug 07 '24 17:08 GabrielLanghans

They both crash for me. I'm also using the venv from Stable Diffusion WebUI...

image

We both have 4070 Ti SUPERs, maybe that's the problem? I also tried on Arch Linux. It crashes the whole DE.

aesxsc avatar Aug 07 '24 20:08 aesxsc

Possibly not enough RAM/swap/pagefile to load the model at the given precision? Usually when a process is "Killed" in Linux, it's to prevent an out of memory situation that would lock the system up. I'd suggest looking into creating or enlarging a swap file. https://wiki.archlinux.org/title/Swap#Swap_file_creation

Try 8GB and see if that's enough, then go up by 4GB until the python process isn't killed on loading.
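To see how much headroom there is before enlarging the swap file, the numbers the kernel reports can be read directly. This is a minimal sketch, assuming a Linux system with the standard /proc/meminfo format; the function names are illustrative and are not part of ComfyUI.

```python
import re


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of values in kB."""
    values = {}
    for line in text.splitlines():
        # Lines look like "MemAvailable:   16777216 kB"; anything else is skipped.
        match = re.match(r"^(\w+):\s+(\d+)", line)
        if match:
            values[match.group(1)] = int(match.group(2))
    return values


def available_gib(meminfo):
    """Return (available RAM, free swap) in GiB from a parsed meminfo dict."""
    kib_per_gib = 1024 * 1024
    ram = meminfo.get("MemAvailable", 0) / kib_per_gib
    swap = meminfo.get("SwapFree", 0) / kib_per_gib
    return ram, swap


def read_system_meminfo(path="/proc/meminfo"):
    # Linux only: this file does not exist on Windows.
    with open(path) as handle:
        return parse_meminfo(handle.read())
```

Calling `available_gib(read_system_meminfo())` right before queueing the prompt gives the two figures to compare against the size of the model file being loaded.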

rabidcopy avatar Aug 08 '24 00:08 rabidcopy

It used to work just fine a few days ago. Also, I have a 32GB swapfile + 32GB RAM, so I don't think that is the case. On Windows I have the pagefile set to Auto, which, again, I don't think matters.

aesxsc avatar Aug 08 '24 00:08 aesxsc

Then I can only suggest running git reflog and going back through commits until it works again. It should be fairly easy to determine at which commit the issues started.
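Stepping back one commit at a time can be slow; the same search can be done by halving the range each time, which is what git's built-in `git bisect` command automates. The sketch below only illustrates the idea, assuming a list of commits ordered oldest to newest and a hypothetical `is_bad` check (for example, "check out this commit, queue a Flux prompt, see whether it crashes").

```python
def first_bad_commit(commits, is_bad):
    """Binary-search an oldest-to-newest commit list for the first bad one.

    Assumes commits[0] is known good, commits[-1] is known bad, and that
    once the problem appears it is present in every later commit.
    """
    low, high = 0, len(commits) - 1  # low: known good, high: known bad
    while high - low > 1:
        mid = (low + high) // 2
        if is_bad(commits[mid]):
            high = mid  # the breakage is at mid or earlier
        else:
            low = mid   # the breakage is after mid
    return commits[high]
```

With a few dozen commits between a working and a broken state, this needs only five or six manual tests instead of one per commit.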

rabidcopy avatar Aug 08 '24 00:08 rabidcopy

Alternatively the problem may be occurring by being on a commit that came just before https://github.com/comfyanonymous/ComfyUI/commit/b334605a6631c12bbe7b3aff6d77526f47acdf42 as this commit addresses OOMs dealing with erroneous model loading.

rabidcopy avatar Aug 08 '24 00:08 rabidcopy

I pulled the latest commit, it still happens. Also, I couldn't exactly find which commit actually broke it.

aesxsc avatar Aug 08 '24 01:08 aesxsc

Portable version is broken too.

aesxsc avatar Aug 08 '24 10:08 aesxsc

Both FP16 and FP8? I have only FP16 downloaded. It does not even attempt to load the checkpoint into RAM. (--lowvram)
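For a rough sense of scale, the weight footprint at each precision can be estimated from the parameter count. The sketch below assumes roughly 12 billion parameters for the Flux transformer (the publicly stated figure, not something reported in this thread) and ignores the text encoders, the VAE, and activations, so real usage will be higher.

```python
def weights_size_gib(num_params, bytes_per_param):
    """Approximate size of the raw weights in GiB."""
    return num_params * bytes_per_param / 1024**3


# Assumption: about 12 billion parameters for the Flux transformer.
FLUX_PARAMS = 12_000_000_000

fp16_gib = weights_size_gib(FLUX_PARAMS, 2)  # 2 bytes per parameter
fp8_gib = weights_size_gib(FLUX_PARAMS, 1)   # 1 byte per parameter

print(f"FP16: {fp16_gib:.2f} GiB, FP8: {fp8_gib:.2f} GiB")
```

Under that assumption the FP16 weights come to about 22.35 GiB and the FP8 weights to about 11.18 GiB, so on the 12 GB and 16 GB cards mentioned in this thread the FP16 checkpoint cannot sit entirely in VRAM and part of it has to be held in system RAM or the pagefile.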

mcDandy avatar Aug 09 '24 08:08 mcDandy

I have only FP16 too. Haven't tried FP8.

aesxsc avatar Aug 09 '24 12:08 aesxsc

Other Stable Diffusion models don't crash Comfy, only Flux models crash it.

aesxsc avatar Aug 09 '24 12:08 aesxsc

Same problem. Disabling custom nodes does nothing, so I copied the output with the nodes active, which contains environment info.

[START] Security scan
[DONE] Security scan
## ComfyUI-Manager: installing dependencies done.
** ComfyUI startup time: 2024-08-09 17:42:52.116158
** Platform: Windows
** Python version: 3.10.11 (tags/v3.10.11:7d4cc5a, Apr  5 2023, 00:38:17) [MSC v.1929 64 bit (AMD64)]
** Python executable: F:\stability\Data\Packages\ComfyUI\venv\Scripts\python.exe
** ComfyUI Path: F:\stability\Data\Packages\ComfyUI
** Log path: F:\stability\Data\Packages\ComfyUI\comfyui.log

Prestartup times for custom nodes:
   4.0 seconds: F:\stability\Data\Packages\ComfyUI\custom_nodes\ComfyUI-Manager

Total VRAM 12282 MB, total RAM 32468 MB
pytorch version: 2.1.2+cu121
Set vram state to: LOW_VRAM
Device: cuda:0 NVIDIA GeForce RTX 4080 Laptop GPU : cudaMallocAsync
Using pytorch cross attention
[Prompt Server] web root: F:\stability\Data\Packages\ComfyUI\web
Adding extra search path checkpoints F:\stability\Data\Models\StableDiffusion
Adding extra search path vae F:\stability\Data\Models\VAE
Adding extra search path loras F:\stability\Data\Models\Lora
Adding extra search path loras F:\stability\Data\Models\LyCORIS
Adding extra search path upscale_models F:\stability\Data\Models\ESRGAN
Adding extra search path upscale_models F:\stability\Data\Models\RealESRGAN
Adding extra search path upscale_models F:\stability\Data\Models\SwinIR
Adding extra search path embeddings F:\stability\Data\Models\TextualInversion
Adding extra search path hypernetworks F:\stability\Data\Models\Hypernetwork
Adding extra search path controlnet F:\stability\Data\Models\ControlNet
Adding extra search path controlnet F:\stability\Data\Models\T2IAdapter
Adding extra search path clip F:\stability\Data\Models\CLIP
Adding extra search path clip_vision F:\stability\Data\Models\InvokeClipVision
Adding extra search path diffusers F:\stability\Data\Models\Diffusers
Adding extra search path gligen F:\stability\Data\Models\GLIGEN
Adding extra search path vae_approx F:\stability\Data\Models\ApproxVAE
Adding extra search path ipadapter F:\stability\Data\Models\IpAdapter
Adding extra search path ipadapter F:\stability\Data\Models\InvokeIpAdapters15
Adding extra search path ipadapter F:\stability\Data\Models\InvokeIpAdaptersXl
Adding extra search path prompt_expansion F:\stability\Data\Models\PromptExpansion
[Crystools INFO] Crystools version: 1.16.6
[Crystools INFO] CPU: 13th Gen Intel(R) Core(TM) i9-13950HX - Arch: AMD64 - OS: Windows 10
[Crystools INFO] Pynvml (Nvidia) initialized.
[Crystools INFO] GPU/s:
[Crystools INFO] 0) NVIDIA GeForce RTX 4080 Laptop GPU
[Crystools INFO] NVIDIA Driver: 560.81
[inference_core_nodes.controlnet_preprocessors] | INFO -> Using ckpts path: F:\stability\Data\Packages\ComfyUI\custom_nodes\ComfyUI-Inference-Core-Nodes\src\inference_core_nodes\controlnet_preprocessors\ckpts
[inference_core_nodes.controlnet_preprocessors] | INFO -> Using symlinks: False
[inference_core_nodes.controlnet_preprocessors] | INFO -> Using ort providers: ['CUDAExecutionProvider', 'DirectMLExecutionProvider', 'OpenVINOExecutionProvider', 'ROCMExecutionProvider', 'CPUExecutionProvider', 'CoreMLExecutionProvider']
DWPose: Onnxruntime with acceleration providers detected
F:\stability\Data\Packages\ComfyUI\venv\lib\site-packages\diffusers\models\transformers\transformer_2d.py:34: FutureWarning: `Transformer2DModelOutput` is deprecated and will be removed in version 1.0.0. Importing `Transformer2DModelOutput` from `diffusers.models.transformer_2d` is deprecated and this will be removed in a future version. Please use `from diffusers.models.modeling_outputs import Transformer2DModelOutput`, instead.
  deprecate("Transformer2DModelOutput", "1.0.0", deprecation_message)
### Loading: ComfyUI-Manager (V2.48.6)
### ComfyUI Revision: 2504 [55ad9d5f] | Released on '2024-08-09'
[ComfyUI-Manager] default cache updated: https://raw.githubusercontent.com/ltdrdata/ComfyUI-Manager/main/alter-list.json
[ComfyUI-Manager] default cache updated: https://raw.githubusercontent.com/ltdrdata/ComfyUI-Manager/main/model-list.json
Use STYLE(weight_interpretation, normalization) at the start of a prompt to use advanced encodings
Weight interpretations available: comfy,perp
Normalization types available: none
[ComfyUI-Manager] default cache updated: https://raw.githubusercontent.com/ltdrdata/ComfyUI-Manager/main/github-stats.json
[comfyui_controlnet_aux] | INFO -> Using ckpts path: F:\stability\Data\Packages\ComfyUI\custom_nodes\comfyui_controlnet_aux\ckpts
[comfyui_controlnet_aux] | INFO -> Using symlinks: False
[comfyui_controlnet_aux] | INFO -> Using ort providers: ['CUDAExecutionProvider', 'DirectMLExecutionProvider', 'OpenVINOExecutionProvider', 'ROCMExecutionProvider', 'CPUExecutionProvider', 'CoreMLExecutionProvider']
[ComfyUI-Manager] default cache updated: https://raw.githubusercontent.com/ltdrdata/ComfyUI-Manager/main/custom-node-list.json
[ComfyUI-Manager] default cache updated: https://raw.githubusercontent.com/ltdrdata/ComfyUI-Manager/main/extension-node-map.json

Import times for custom nodes:
   0.0 seconds: F:\stability\Data\Packages\ComfyUI\custom_nodes\websocket_image_save.py
   0.0 seconds: F:\stability\Data\Packages\ComfyUI\custom_nodes\sd-dynamic-thresholding
   0.0 seconds: F:\stability\Data\Packages\ComfyUI\custom_nodes\comfyui-inpaint-nodes
   0.0 seconds: F:\stability\Data\Packages\ComfyUI\custom_nodes\ComfyMath
   0.0 seconds: F:\stability\Data\Packages\ComfyUI\custom_nodes\comfyui-tooling-nodes
   0.0 seconds: F:\stability\Data\Packages\ComfyUI\custom_nodes\ComfyUI_IPAdapter_plus
   0.0 seconds: F:\stability\Data\Packages\ComfyUI\custom_nodes\ComfyUI_ExtraModels
   0.1 seconds: F:\stability\Data\Packages\ComfyUI\custom_nodes\comfyui_controlnet_aux
   0.1 seconds: F:\stability\Data\Packages\ComfyUI\custom_nodes\comfyui-prompt-control
   0.4 seconds: F:\stability\Data\Packages\ComfyUI\custom_nodes\ComfyUI_TensorRT
   0.5 seconds: F:\stability\Data\Packages\ComfyUI\custom_nodes\ComfyUI-Crystools
   1.7 seconds: F:\stability\Data\Packages\ComfyUI\custom_nodes\ComfyUI-Manager
   1.9 seconds: F:\stability\Data\Packages\ComfyUI\custom_nodes\ComfyUI-Inference-Core-Nodes

Starting server

To see the GUI go to: http://127.0.0.1:8188
FETCH DATA from: F:\stability\Data\Packages\ComfyUI\custom_nodes\ComfyUI-Manager\extension-node-map.json [DONE]
got prompt

It does not continue. Output from nvidia-smi:

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.81                 Driver Version: 560.81         CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                  Driver-Model | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4080 ...  WDDM  |   00000000:01:00.0  On |                  N/A |
| N/A   44C    P8              4W /  175W |    1031MiB /  12282MiB |      2%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      1892    C+G   C:\Windows\explorer.exe                     N/A      |
|    0   N/A  N/A      3588    C+G   ...n\126.0.2592.113\msedgewebview2.exe      N/A      |
|    0   N/A  N/A      8024    C+G   ...0.0_x64__cv1g1gvanyjgm\WhatsApp.exe      N/A      |
|    0   N/A  N/A      8332    C+G   ...ekyb3d8bbwe\PhoneExperienceHost.exe      N/A      |
|    0   N/A  N/A     10628    C+G   ...ft Office\root\Office16\OUTLOOK.EXE      N/A      |
|    0   N/A  N/A     10908    C+G   ...al\Discord\app-1.0.9157\Discord.exe      N/A      |
|    0   N/A  N/A     10912    C+G   ...CBS_cw5n1h2txyewy\TextInputHost.exe      N/A      |
|    0   N/A  N/A     11044    C+G   ....0_x64__kzh8wxbdkxb8p\DCv2\DCv2.exe      N/A      |
|    0   N/A  N/A     11528    C+G   ...5n1h2txyewy\ShellExperienceHost.exe      N/A      |
|    0   N/A  N/A     12700    C+G   ...nt.CBS_cw5n1h2txyewy\SearchHost.exe      N/A      |
|    0   N/A  N/A     12728    C+G   ...2txyewy\StartMenuExperienceHost.exe      N/A      |
|    0   N/A  N/A     14148    C+G   F:\stability\StabilityMatrix.exe            N/A      |
|    0   N/A  N/A     15608    C+G   ...n\126.0.2592.113\msedgewebview2.exe      N/A      |
|    0   N/A  N/A     15680    C+G   ...__8wekyb3d8bbwe\Notepad\Notepad.exe      N/A      |
|    0   N/A  N/A     16320    C+G   ...n\NVIDIA app\CEF\NVIDIA Overlay.exe      N/A      |
|    0   N/A  N/A     18692    C+G   ...ys\WinUI3Apps\PowerToys.Peek.UI.exe      N/A      |
|    0   N/A  N/A     18836    C+G   ...werToys\PowerToys.PowerLauncher.exe      N/A      |
|    0   N/A  N/A     19384    C+G   ...werToys\PowerToys.ColorPickerUI.exe      N/A      |
|    0   N/A  N/A     19800    C+G   ...__8wekyb3d8bbwe\WindowsTerminal.exe      N/A      |
|    0   N/A  N/A     22020    C+G   ...\cef\cef.win7x64\steamwebhelper.exe      N/A      |
|    0   N/A  N/A     22184    C+G   ...les\Microsoft OneDrive\OneDrive.exe      N/A      |
|    0   N/A  N/A     23308    C+G   ...m Files (x86)\Overwolf\Overwolf.exe      N/A      |
|    0   N/A  N/A     23604    C+G   ...rwolf\0.256.0.2\OverwolfBrowser.exe      N/A      |
|    0   N/A  N/A     24508    C+G   ...ress\CefSharp.BrowserSubprocess.exe      N/A      |
|    0   N/A  N/A     24684    C+G   ...les\Microsoft OneDrive\OneDrive.exe      N/A      |
|    0   N/A  N/A     25000    C+G   ...\cef\cef.win7x64\steamwebhelper.exe      N/A      |
|    0   N/A  N/A     26256    C+G   ...crosoft\Edge\Application\msedge.exe      N/A      |
|    0   N/A  N/A     30260    C+G   ...oogle\Chrome\Application\chrome.exe      N/A      |
+-----------------------------------------------------------------------------------------+
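Since several reports in this thread compare driver versions, it can help to capture just the GPU name, driver version, and total memory in a compact form. This is a minimal sketch using nvidia-smi's `--query-gpu` and `--format=csv,noheader` options; the helper names are illustrative, and `query_gpus` assumes nvidia-smi is on PATH.

```python
import subprocess


def parse_gpu_csv(output):
    """Parse 'name, driver_version, memory.total' CSV rows into dicts."""
    gpus = []
    for row in output.strip().splitlines():
        name, driver, memory = [field.strip() for field in row.split(",")]
        gpus.append({"name": name, "driver": driver, "memory": memory})
    return gpus


def query_gpus():
    # Requires nvidia-smi on PATH; raises CalledProcessError if the command fails.
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=name,driver_version,memory.total",
            "--format=csv,noheader",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_gpu_csv(result.stdout)
```

Pasting the resulting one-line summary alongside a crash report makes it easier to compare setups than the full process table.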

mcDandy avatar Aug 09 '24 16:08 mcDandy

duplicate of https://github.com/comfyanonymous/ComfyUI/issues/4198

geroldmeisinger avatar Aug 09 '24 16:08 geroldmeisinger

Also, rarely, not only ComfyUI but the whole GPU, Chrome, and possibly the CUDA runtime (nvidia-smi stops working) crash with it.

OS: Windows 11 22635

Happens with Arch too.

aesxsc avatar Aug 09 '24 21:08 aesxsc

Only ComfyUI crashes for me. Nothing else happens, not even elevated RAM or VRAM usage.

mcDandy avatar Aug 10 '24 07:08 mcDandy

Ran ComfyUI through Nsight Systems.

Logs: https://drive.google.com/file/d/1mEIIkvHAykUCHl_cFJt3AmMENlt_zWtQ/view?usp=sharing https://drive.google.com/file/d/10HXsy0A96zMALsPYRib59UrvSUSWkh_J/view?usp=drive_link

mcDandy avatar Aug 10 '24 12:08 mcDandy

The problem seems to be with the drivers. It does not work on 560.81; it worked on 560.70.

Edit: It is not ComfyUI or the Nvidia drivers. I downgraded both and it still does not work.

mcDandy avatar Aug 10 '24 12:08 mcDandy

So it was fixed for me by moving my pagefile from C: to F: (both are on the same SSD).

mcDandy avatar Aug 10 '24 15:08 mcDandy

I might have a solution for everyone (at least it worked out for me).

After struggling for weeks now, I tried out an even older driver version for my GPU, because I have seen that these kinds of problems mainly affect people with an "RTX 4070 Ti Super 16GB". I was even about to send it back because I tried EVERYTHING...

However, with version 551.23 for this GPU (see picture) my problems got solved!!!! Screenshot 2024-08-10 172750

I really hope this version also fixes your problems. :)

sabum6800 avatar Aug 10 '24 16:08 sabum6800

I might have a solution for everyone (at least it worked out for me).

After struggling for weeks now, I tried out an even older driver version for my GPU, because I have seen that these kinds of problems mainly affect people with an "RTX 4070 Ti Super 16GB". I was even about to send it back because I tried EVERYTHING...

However, with version 551.23 for this GPU (see picture) my problems got solved!!!! Screenshot 2024-08-10 172750

I really hope this version also fixes your problems. :)

I just got this card a week ago and I installed 560.70. Then after a few days 560.81 got released and it still worked. After 1-2 days it completely stopped working on Flux models. I will try to roll back to 560.70 and see if that was the problem.

aesxsc avatar Aug 10 '24 21:08 aesxsc

Switched back to 560.70, doesn't work. Now I'll try 551.23 as suggested by @sabum6800. Also, why this specific version? Did you try every other version after that?

aesxsc avatar Aug 10 '24 22:08 aesxsc

And... nope. Switched to 551.23, nothing really changed. Still crashes the same way.

aesxsc avatar Aug 10 '24 23:08 aesxsc

It interestingly works on Arch Linux right now, latest drivers, latest commit.

aesxsc avatar Aug 11 '24 12:08 aesxsc

I have a 2070 Super with 8GB VRAM and 32GB RAM, latest drivers (560.81). ComfyUI crashes after "got prompt" for Flux Dev, Schnell, and the FP8 versions. Other models work for me. I have no idea how to get the logs, though.

I tried the portable and manual installations of ComfyUI; both have the same issue.
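One way to capture logs when the console window disappears with the crash is to launch ComfyUI from a small wrapper that copies everything it prints into a file and reports the exit code. This is a minimal sketch, not an official ComfyUI tool; `run_and_log` and the log file name are illustrative, and the commented usage line assumes it is run from the ComfyUI folder, where `main.py` lives.

```python
import subprocess
import sys


def run_and_log(command, log_path):
    """Run a command, copy its combined stdout/stderr to a log file, return the exit code."""
    with open(log_path, "w", encoding="utf-8", errors="replace") as log:
        process = subprocess.Popen(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr into stdout
            text=True,
            errors="replace",
        )
        for line in process.stdout:
            sys.stdout.write(line)  # still show output in the console
            log.write(line)
            log.flush()             # keep the file current if the process dies
        return process.wait()


# Example (run from the ComfyUI folder):
# exit_code = run_and_log([sys.executable, "-X", "faulthandler", "main.py"], "comfyui_crash.log")
```

The `-X faulthandler` option asks Python to print a traceback on fatal errors such as segmentation faults, which may reveal where a silent crash happens; a non-zero exit code with no traceback at all would point more toward the process being killed from outside, for example by running out of memory.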

cherryboio avatar Aug 11 '24 15:08 cherryboio

Possibly not enough RAM/swap/pagefile to load the model at the given precision? Usually when a process is "Killed" in Linux, it's to prevent an out of memory situation that would lock the system up. I'd suggest looking into creating or enlarging a swap file. https://wiki.archlinux.org/title/Swap#Swap_file_creation

Try 8GB and see if that's enough, then go up by 4GB until the python process isn't killed on loading.

Thanks for mentioning this. I have a new Arch install myself and never allocated a swapfile. This fixed my issue, which appears similar to this one.

odysseyalive avatar Aug 11 '24 15:08 odysseyalive

I have a 2070 Super with 8GB VRAM and 32GB RAM, latest drivers (560.81). ComfyUI crashes after "got prompt" for Flux Dev, Schnell, and the FP8 versions. Other models work for me. I have no idea how to get the logs, though.

I tried the portable and manual installations of ComfyUI; both have the same issue.

Yeah, it still doesn't work in Windows.

aesxsc avatar Aug 11 '24 16:08 aesxsc