Flux - CPU stall, Kernel panic, Computer crash, ComfyUI killed
Expected Behavior
I'm sometimes able to generate images with flux-dev and flux-schnell, but usually the whole computer crashes or Comfy gets killed. I tried the flux-dev and flux-schnell default workflows from ComfyUI with t5 fp8 (instead of fp16), everything else on default. ComfyUI was started with --lowvram --disable-all-custom-nodes. The crash usually happens when ComfyUI visually executes ClipTextEncode, but when running that node on its own it doesn't seem to be the issue.
Actual Behavior
- ComfyUI executes ClipTextEncode, after a while the computer hangs for ~3 seconds, then automatically reboots (with --lowvram)
- ComfyUI terminal: Killed (with --normalvram)
- Computer stutters for a while but the image gets generated
Steps to Reproduce
Run the default flux-dev or flux-schnell workflow (t5 fp8) with python main.py --lowvram --disable-all-custom-nodes, as described above.
Guesses
- Out of memory
- Faulty hardware
- CPU or GPU Temperature
- Driver issue
- Comfy out of date
- Model files corrupt
Diagnostics
Kernel panic
This is the most useful log. I opened a root terminal on Ctrl+Alt+F1, a user terminal on Ctrl+Alt+F2, and the desktop on Ctrl+Alt+F7.
- On the user terminal, start ComfyUI
- On the desktop, queue the prompt, then close the browser
- On the root terminal, stop the desktop manager: systemctl stop lightdm (frees VRAM and system RAM; disable alone only takes effect on the next boot)
- Go back to the user terminal and watch ComfyUI
Unfortunately I wasn't able to take good screenshots, but here are still frames (sorry for the quality, I had to take a video with my smartphone):
I think this is the most relevant log
rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
rcu: 16-...0: (8 ticks this GP) idle=108c/1/0x40000000000000 softirq=4161/4165 fqs=2396
mce: CPUs not responding to MCE broadcast (may include false positives): 16
Kernel panic - not syncing: Timeout: Not all CPUs entered broadcast exception handler
...and then it all happened so fast!
Temperature
I tried sensors and i7z but the temperature is around 40-60°C.
Example (idle):
Cpu speed from cpuinfo 2111.00Mhz
cpuinfo might be wrong if cpufreq is enabled. To guess correctly try estimating via tsc
Linux's inbuilt cpu_khz code emulated now
True Frequency (without accounting Turbo) 2111 MHz
CPU Multiplier 21x || Bus clock frequency (BCLK) 100.52 MHz
Socket [0] - [physical cores=16, logical cores=24, max online cores ever=16]
TURBO ENABLED on 16 Cores, Hyper Threading ON
Max Frequency without considering Turbo 2211.52 MHz (100.52 x [22])
Max TURBO Multiplier (if Enabled) with 1/2/3/4/5/6 Cores is 52x/52x/51x/51x/51x/51x
Real Current Frequency 1654.28 MHz [100.52 x 16.46] (Max of below)
Core [core-id] :Actual Freq (Mult.) C0% Halt(C1)% C3 % C6 % Temp VCore
Core 1 [0]: 1208.93 (12.03x) 1 95.5 0 4.12 34 0.7916
Core 2 [2]: 1008.66 (10.03x) 1 100 0 0 30 0.7916
Core 3 [4]: 1062.96 (10.57x) 1 98.9 0 1 33 0.7916
Core 4 [6]: 1103.47 (10.98x) 1 99.9 0 0 34 0.7916
Core 5 [8]: 1315.81 (13.09x) 8.07 92.1 0 2.85 34 0.7866
Core 6 [10]: 1654.28 (16.46x) 8.72 84.5 0 8.67 32 0.7866
Core 7 [12]: 1151.81 (11.46x) 7.15 90.8 0 5.32 33 0.7863
Core 8 [14]: 1081.30 (10.76x) 1.38 97.3 0 2.02 34 0.7913
Core 9 [16]: 1049.93 (10.44x) 1 5.68 0 94.3 36 0.7863
Core 10 [17]: 1195.70 (11.89x) 1 1.75 0 98.1 36 0.7863
Core 11 [18]: 1080.99 (10.75x) 3.28 12.7 0 85.6 36 0.7863
Core 12 [19]: 1124.96 (11.19x) 1 6.19 0 93.8 36 0.7863
Core 13 [20]: 1148.47 (11.42x) 1 2.13 0 97.4 34 0.7863
Core 14 [21]: 1200.57 (11.94x) 2.89 3.21 0 95.1 34 0.7863
Core 15 [22]: 892.17 (8.88x) 1 2.27 0 97.7 34 0.7863
Core 16 [23]: 1101.13 (10.95x) 1 1.58 0 98.3 34 0.7863
the seconds before disaster strikes
(Please note that I took a screenshot every other second, so there is still the possibility of a huge spike just before the crash that I didn't catch. But only one core is utilized, so this is all unlikely; for the same reason I don't expect the power supply to be the issue.)
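A possible way to rule out a spike between screenshots would be to log the temperature to a file once per second and sync, so the tail of the file survives a hard reset (a sketch, assuming lm-sensors is installed and coretemp reports a "Package id 0" line):
while true; do
  echo "$(date +%T) $(sensors | grep -m1 'Package id 0')" >> ~/temps.log
  sync    # flush to disk so the last lines survive the reboot
  sleep 1
done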
Out of memory
Maybe the system runs into OOM. I tried increasing swap from 1GB to 32GB and 64GB but it didn't help:
sudo dd if=/dev/zero of=/swapfile bs=1M count=32768
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapoff /dev/dm-2
sudo swapon /swapfile
nano /etc/fstab
# add this line: /swapfile swap swap defaults 0 0
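To verify the new swap file is actually active:
swapon --show
free -h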
What I noticed, however, is that with --lowvram the VRAM doesn't get utilized at all (according to nvidia-smi) before the crash.
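One way to watch this live while the prompt runs (a sketch using nvidia-smi's query mode):
watch -n 1 nvidia-smi --query-gpu=memory.used,memory.total,temperature.gpu --format=csv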
Memtest86+
Maybe the system RAM is faulty? I ran memtest and it passed.
Maybe the disk is faulty? TODO
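A possible way to fill in this TODO (assumes smartmontools is installed; /dev/nvme0n1 and /dev/sda are placeholder device names):
sudo smartctl -H /dev/nvme0n1    # overall health of the NVMe system disk
sudo smartctl -t short /dev/sda  # run a short self-test on the SATA SSD
sudo smartctl -a /dev/sda        # read self-test results and attributes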
Dmesg and other logs
I looked in various logs but found no useful info, so they are omitted:
/var/log/syslog
/var/log/messages
/var/log/kern.log
$ journalctl -b -1
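One quick follow-up check would be to re-scan the kernel log of the previous boot for anything OOM/MCE/RCU-related (-k restricts journalctl to kernel messages):
journalctl -b -1 -k | grep -iE 'oom|mce|panic|rcu'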
Drivers
- Purge old driver packages: sudo apt-get remove --purge nvidia-* libnvidia-* libxnvctrl* cuda*
- Reboot
- Install cuda
- Install driver: apt-get install -y nvidia-open
- Reboot
nvidia-smi
Sun Aug 4 10:42:21 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.28.03 Driver Version: 560.28.03 CUDA Version: 12.6 |
...
ComfyUI
git pull
# f7a5107784cded39f92a4bb7553507575e78edbe
rm -fR venv
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
Model files
sha256sum ... are all correct. It also works sometimes, so the files should be fine.
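For reference, this is the kind of check used (the file names here are the common flux workflow model names, not necessarily the exact paths on this system):
sha256sum models/unet/flux1-dev.safetensors models/clip/t5xxl_fp8_e4m3fn.safetensors models/vae/ae.safetensors
# compare the output against the checksums published on the model pages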
ClipTextEncode
I tried running only ClipTextEncode, hooked up to ComfyUI-essentials' Debug Tensor, and this usually works(!), which suggests that the visual information "executing ClipTextEncode" from ComfyUI might be misleading (execution is actually already somewhere else), or that there is some interaction when flux and the clips are both loaded.
Debug Logs
python main.py --lowvram --disable-all-custom-nodes
Total VRAM 15971 MB, total RAM 31923 MB
pytorch version: 2.4.0+cu121
Set vram state to: LOW_VRAM
Device: cuda:0 NVIDIA GeForce RTX 4060 Ti : cudaMallocAsync
Using pytorch cross attention
[Prompt Server] web root: /home/meisi/dev/ComfyUI/web
Adding extra search path checkpoints ./models/Stable-diffusion
Adding extra search path configs ./models/Stable-diffusion
Adding extra search path vae ./models/VAE
Adding extra search path loras ./models/Lora
Adding extra search path loras ./models/LyCORIS
Adding extra search path upscale_models ./models/ESRGAN
Adding extra search path upscale_models ./models/RealESRGAN
Adding extra search path upscale_models ./models/SwinIR
Adding extra search path embeddings ./embeddings
Adding extra search path hypernetworks ./models/hypernetworks
Adding extra search path controlnet ./models/ControlNet
/home/meisi/dev/ComfyUI/venv/lib/python3.11/site-packages/kornia/feature/lightglue.py:44: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
@torch.cuda.amp.custom_fwd(cast_inputs=torch.float32)
Skipping loading of custom nodes
Starting server
To see the GUI go to: http://127.0.0.1:8188
got prompt
model_type FLOW
model weight dtype torch.bfloat16, manual cast: None
clip missing: ['text_projection.weight']
Requested to load FluxClipModel_
Loading 1 new model
Killed (or crash)
Other
System
nvidia-smi
cat /proc/version
cat /etc/*-release
System Info
OS: Debian Linux 12.2 (bookworm)
CPU: 13th Gen Intel Core i7-13700F x 16
GPU: NVIDIA GeForce RTX 4060 Ti
VRAM: 16GB
CUDA version: 12.6
Driver version: 560.28.03
System RAM: 32GB
(Other notable info: the system and Comfy run on an NVMe disk, whereas the models are symlinked to a SATA SSD.)
Additional guesses
I can somewhat consistently generate when I increase swap and run in normal vram mode, which suggests a certain system OOM condition:
- in lowvram mode, even if swap is increased to 64GB, the VRAM is not utilized at all, and maybe swap is too slow(?), so the whole system runs into a kernel panic
- in normal vram mode, if I keep swap at 1GB, the VRAM is utilized, but 1GB of swap might just be at the edge: if some other programs require more RAM, ComfyUI gets killed. Another thing I noticed is that the whole system begins to stutter when I reopen the browser, which suggests heavy swapping (see the check below). If I increase swap to just 4GB I can reopen the browser normally (this is also my solution for now)
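A simple way to confirm the heavy-swapping suspicion while reopening the browser is to watch the si/so columns (pages swapped in/out per second); sustained non-zero values there indicate thrashing:
vmstat 1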
have you tried running the model in fp8? at fp16 (the default) it's definitely too big for your VRAM, and 32GB of system RAM isn't gonna save you either, as it has to hold the text encoders, VAE and diffusion model. Using swap to extend your system RAM to extend your VRAM is not a good idea imho. If it runs in fp8 then it's likely just a matter of not enough VRAM/RAM, and you might need to upgrade to run in fp16. I doubt there's much to be done on the software side, aside from general memory-efficiency improvements.
Thanks for the suggestion. I tried --lowvram flux-dev fp8_e4m3fn and --lowvram flux-dev-fp8. Both crash(!), which invalidates the OOM hypothesis and the faulty-swap hypothesis. I'm on 0a6b0081176c6233015ec00d004c534c088ddcb0 now.
enough system RAM and VRAM left, seconds before the crash
Upgraded to 64GB system RAM now; same problem with --lowvram. But --normalvram seems to be more stable now, and Comfy does not get killed.
still happens on 413322645e713bdda69836620a97d4c9ca66b230 with --lowvram
I have the same issue. I have an AMD Radeon 7800 XT with 16GB VRAM. I have 32GB system RAM and it starts using some swap, but there's lots free.
I have tried flux-dev-fp8.
I often get a hard lock where I need to hold the power button to turn off; sometimes I can recover using AltGr+SysRq R-E-I (the magic SysRq sequence).
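For that sequence to work, the magic SysRq keys may need to be enabled first (a quick sketch; the value 1 enables all SysRq functions):
echo 1 | sudo tee /proc/sys/kernel/sysrq
echo "kernel.sysrq = 1" | sudo tee /etc/sysctl.d/90-sysrq.conf   # persist across reboots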
I found this to make some pytorch tracing. It should work on Nvidia as well. https://rocmdocs.amd.com/en/latest/how-to/tuning-guides/mi300x/workload.html#pytorch-profiler
I have dug through some error logs and it seems that zram was failing when I had low system RAM. Removing zram and just having a swap file solved some crashes for me.
The only issue left now is a soft lock with the kernel spamming this line:
[drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Not enough memory for command submission!
Then I found this page: https://bugzilla.kernel.org/show_bug.cgi?id=209163#c4
I guess a failing VRAM malloc call is not checked and handled correctly somewhere. How to fix that in Python is outside my code-fu :/
Edit 1: I think I solved it using this kernel parameter: https://www.phoronix.com/forums/forum/software/linux-gaming/1304169-dxvk-1-9-4-released-with-better-support-for-god-of-war?p=1304262#post1304262
Edit 2: Nope, it still happens, but I was able to do quite a lot of image generations before I got this error.
still happens on 2622c55aff9433d425a62e5f6c379cf22a42139e with --lowvram
Kernel panic means there's something wrong with your driver/OS config or hardware.
I always get OOMs (both VRAM and RAM), sometimes on purpose when testing various things, and have never had my kernel panic because of it.
It only happens in lowvram; without it, it works. With normalvram I too get OOMs all the time, without kernel panic.
I just tried reverting to CUDA 12.3.2 and driver version 545.23.08 (the oldest release which supports Debian 12). Same issue :/ I also checked the whole hardware for any issues (see first post), unless there is something I missed.
- Purge old driver packages: sudo apt-get remove --purge nvidia-* libnvidia-* libxnvctrl* cuda*
- Reboot
- Install cuda 12.3.2
- Install driver: sudo apt-get install -y cuda-drivers
- Reboot
nvidia-smi
Sun Aug 18 10:44:39 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.23.08 Driver Version: 545.23.08 CUDA Version: 12.3 |
git pull
rm -fR venv
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python main.py --lowvram --disable-all-custom-nodes
Intel 13th generation CPU bug
BIOS updated, Robeytech test (Windows) run: everything passed => same issue
Power consumption
I also ran Prime95+3DMark (Windows) for 5min => stable
did you find something? I use Forge, and flux crashes my PC or the video card driver every second generation. I also need to limit the VRAM weight to 20G so the first image can generate, otherwise Forge crashes itself. It also works fine if I use the fp8 model.
13900K (I tested everything and it seems fine), 4090, 64G
> did you find something? I use Forge, and flux crashes my PC or the video card driver every second generation. I also need to limit the VRAM weight to 20G so the first image can generate, otherwise Forge crashes itself. It also works fine if I use the fp8 model.
No, except that it works with --normalvram. For me fp8 doesn't work with --lowvram either, so there is something in the lowvram utilization of (any) flux model which causes the kernel to crash.
@rsl8 thanks for the hint btw, apparently I had my linux-image set to manual
Set up a fresh Debian 12.6 system
cat /proc/version
Linux version 6.1.0-23-amd64 ([email protected]) (gcc-12 (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #1 SMP PREEMPT_DYNAMIC Debian 6.1.99-1 (2024-07-15)
(this is the latest stable linux kernel in debian)
same issue
echo "deb http://deb.debian.org/debian bookworm-backports main contrib non-free non-free-firmware" | sudo tee -a /etc/apt/sources.list
apt update
apt install -t bookworm-backports linux-image-amd64
# reboot
cat /proc/version
# Linux version 6.9.7+bpo-amd64 ([email protected]) (x86_64-linux-gnu-gcc-12 (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #1 SMP PREEMPT_DYNAMIC Debian 6.9.7-1~bpo12+1 (2024-07-03)
same issue
I think I found the problem: I may be affected by the Intel Raptor Lake instability and degradation issue due to elevated operating voltage after all.
Before you do anything UPDATE YOUR BIOS OR YOU MAY DAMAGE YOUR CPU!
- I tried the Robeytech test 10 times => passed
- I tried Prime95 for 5min => passed
- Today I also tried the Intel Processor Diagnostic Tool => HANGS on CPULoad!
Update your BIOS before you do this, and make sure the update includes something like "Update microcode 0x129 to address sporadic Vcore elevation behavior", as announced by Intel.
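On Linux you can verify which microcode revision the CPU is actually running after the BIOS update (if the fix is applied, revision 0x129 or later should appear):
grep -m1 microcode /proc/cpuinfo
# or from the boot log:
sudo dmesg | grep -i microcode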
The following models are affected:
13th gen:
i9-13900KS
i9-13900K
i9-13900KF
i9-13900F
i9-13900
i7-13700K
i7-13700KF
i7-13790F
i7-13700F
i7-13700
i5-13600K
i5-13600KF
14th gen:
i9-14900KS
i9-14900K
i9-14900KF
i9-14900F
i9-14900
i7-14700K
i7-14700KF
i7-14790F
i7-14700F
i7-14700
i5-14600K
i5-14600KF
Solution
Load a low-voltage profile in UEFI (I never tried this before because I assumed the BIOS defaults are fine). If I use "e-core disable" I am able to run Flux with --lowvram. The profile may be called differently by your mainboard vendor.
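Untested thought: a software-side analogue to disabling the E-cores in UEFI might be to pin ComfyUI to the P-cores only. On this i7-13700F the P-core threads are logical CPUs 0-15 according to the i7z output above (adjust the range for other CPUs):
taskset -c 0-15 python main.py --lowvram --disable-all-custom-nodes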
@geroldmeisinger I can pass all of the tests
can you describe your symptoms in more detail please!
Recently, when upgrading my computer, I avoided Intel because of that issue. I see that's how the symptoms manifest... You must have gone through a lot of trouble to identify the cause.
> You must have gone through a lot of trouble to identify the cause.
Thanks for the empathy! Yes.
I tried multiple CPU benchmarks on Windows 10 with "E-Core disable" and "Spec Enhance" performance profiles:
GeekBench 6 errors and crashes on "Spec Enhance" (blue screen of death "Clock_Watchdog_Timeout"). Performance loss is about 20-25%(!), which is...
I called Intel support and they will exchange my processor.
So after 6 weeks I finally have my CPU replaced, and I can confirm flux-dev --lowvram works.
For anyone interested in the Intel Raptor Lake instability and degradation issue due to elevated operating voltage:
How long does it take for Intel (Germany) to replace a CPU:
- me → shop: sent 2024-09-10, arrived 2024-09-12
- shop → Intel: sent 2024-09-12 (assume 2 days)
- Intel notification: 2024-10-15
- Intel → shop: sent 2024-10-15, arrived 2024-10-18
- shop → me: sent 2024-10-18, arrived 2024-10-19
~8 days for delivery, 31 days for Intel
and there is variation in performance which I cannot explain:
In my case, ComfyUI gets killed even with the fp8 model, but it generates images normally when running ComfyUI with --lowvram. The computer has two 4090 cards.