[Bug]: Unexplained High CPU Usage Spike in Coolify Following Deployment Attempt

Open galacoder opened this issue 9 months ago • 109 comments

Description

I have been running Coolify for 8 days with various services, encountering no prior issues. However, on the night of April 30th EST, I experienced a significant CPU usage spike starting around 11 PM, shortly after an unsuccessful attempt to deploy a React application. It is unclear whether this issue was directly related to the deployment failure, a potential attack, or another problem.

Expected Behavior

CPU usage should remain stable, without significant spikes, particularly when no active deployments or heavy tasks are underway.

Actual Behavior

CPU usage unexpectedly spiked to over 300% and remained high throughout the night, which was unusual and concerning, given the context.

Environment

  • Coolify version: 4.0.0-beta.271
  • Docker version: Docker version 24.0.9, build 2936816
  • Operating System: Ubuntu 22.04.4 LTS
  • Hardware: VPS with 4 CPUs and 16GB RAM

Additional Context

The sudden surge in CPU usage occurred after the deployment failure, but it is uncertain whether the spike was a direct result of this event, a security issue, or another underlying problem. This incident warrants further investigation to prevent future occurrences.

I have already restarted my VPS twice, but the problem persisted.

I would appreciate any insights or troubleshooting steps you could recommend to help identify and resolve the root cause of this spike. Thank you for your assistance.

Minimal Reproduction (if possible, example repository)

Steps to Reproduce

  1. Set up and run Coolify with multiple services for over a week.
  2. Attempt to deploy a React app, which fails.
  3. Monitor CPU usage and observe an unexpected spike starting around 11 PM EST, continuing without additional user actions.

Exception or Error

Screenshots

Screenshot 2024-05-01 at 23 11 05 Screenshot 2024-05-01 at 23 11 33

Version

4.0.0-beta.271

galacoder avatar May 02 '24 03:05 galacoder

Hello, I don't have many details to share right now. We installed Coolify on some small Hetzner servers yesterday and it was working like a breeze, but today every deployment takes over all resources for several minutes, with CPU usage just above 200% and very high I/O on the hard drive.

I'm currently trying to figure out what might be causing this (whether a change in my Docker container is responsible), but I'm waiting for the server to become available again.

image Last 24 hours

image Last hour

The hetzner console tells me it's out of memory - the server is one of the smallest available with just 4 GB - but the same process worked fine yesterday

image

UPDATE: restarting the Hetzner server fixed the issue for me (for now). I hope it doesn't happen again.

marwie avatar May 09 '24 09:05 marwie

Second this. As of today, I have the same issue. Fresh new Coolify installation, fresh new Contabo server (4vCPU, 6GB RAM). Takes up 100% of CPU. A small nodejs backend, which takes 5s to build on my machine, has been building for 10 minutes and counting.

v4.0.0-beta.294

root@vmi1916516:~# mpstat -u | awk '/all/ {printf "%.2f%%\n", 100-$12}'
100.00%
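A note on the measurement above: as far as the `mpstat` man page goes, running it without an interval reports averages since boot, and `$12` is `%idle` only when the timestamp occupies a single field (a 12-hour clock with AM/PM shifts every column by one). A minimal sketch of an alternative, assuming a Linux host with `/proc/stat`: sample the aggregate `cpu` line twice and compute the busy share over that window. The function name `cpu_busy_pct` is made up for illustration.

```shell
#!/bin/sh
# cpu_busy_pct PREV CUR
# PREV and CUR are aggregate "cpu ..." lines from /proc/stat taken at two moments.
# Prints the non-idle share of CPU time between the two samples (iowait counts as busy,
# matching the "100 - %idle" idea of the one-liner).
cpu_busy_pct() {
  printf '%s\n%s\n' "$1" "$2" | awk '
    {
      total = 0
      # Fields 2..9 are user, nice, system, idle, iowait, irq, softirq, steal.
      # guest and guest_nice are skipped because they are already part of user and nice.
      for (i = 2; i <= 9 && i <= NF; i++) total += $i
      if (NR == 1) { t0 = total; i0 = $5 }
      else         { t1 = total; i1 = $5 }
    }
    END {
      dt = t1 - t0
      if (dt <= 0) { print "0.00"; exit }
      printf "%.2f\n", 100 * (dt - (i1 - i0)) / dt
    }'
}

# Usage on a live host (not run here):
#   prev=$(grep '^cpu ' /proc/stat); sleep 5; cur=$(grep '^cpu ' /proc/stat)
#   cpu_busy_pct "$prev" "$cur"
```

Sampling over a few seconds shows what the machine is doing now, which is the number that matters while a build is running.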

Don't know if it's normal, but php 8.2 and php-fpm often take up to 70% of my CPU when navigating Coolify. And not just for a split second, but steadily.

image


The usage jumps from one thing to the other, all while building a nodejs server with 200 lines of code...

image

image


I was planning to build my backend on Coolify. Do I just ditch it now or what?

AspireOne avatar Jun 05 '24 13:06 AspireOne

Similar issue: Coolify is taking up large amounts of CPU, and the usage fluctuates on the /init command.

Screenshot 2024-06-05 at 9 06 06 PM Screenshot 2024-06-05 at 10 12 29 PM

marke-dev avatar Jun 06 '24 02:06 marke-dev

same here.

BTW: it would be awesome if we could get a stats overview of the running containers directly in Coolify to see their CPU and memory usage.
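Until something like that exists in the UI, a rough per-container overview can be had from the Docker CLI on the host. A minimal sketch, assuming shell access to the server; `sort_by_cpu` is an illustrative helper name, while `docker stats --no-stream` and the `.Name`, `.CPUPerc` and `.MemUsage` format fields are standard Docker CLI features.

```shell
#!/bin/sh
# Sort "NAME CPU% MEM..." lines by the CPU column, highest first.
# sort -n reads the leading number and ignores the trailing "%".
sort_by_cpu() {
  sort -k2,2nr
}

# Usage on a Docker host (not run here):
#   docker stats --no-stream --format '{{.Name}} {{.CPUPerc}} {{.MemUsage}}' | sort_by_cpu
```

The one-shot snapshot makes it easier to tell whether the load comes from the coolify container itself or from one of the deployed services.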

swissbyte avatar Jun 12 '24 14:06 swissbyte

Seeing a similar issue on v4.0.0-beta.297.

jamesryancooper avatar Jun 15 '24 00:06 jamesryancooper

Same issue. Worked fine for a few days and now started to hang unresponsive on NextJS deployment.

Coolify: v4.0.0-beta.297
Hetzner: CPX11 | x86 | 40 GB | us-west

image

mpanibrat avatar Jun 15 '24 07:06 mpanibrat

Sometimes it helps to restart Coolify or the host server. Then it's OK again for about 1-2 hours.

swissbyte avatar Jun 15 '24 10:06 swissbyte

Check if you have enough ram. In my case swapd (which is responsible for using swap memory) would take all the CPU and adding more ram fixed it
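This check can be sketched as follows, assuming a Linux host whose `/proc/meminfo` exposes `MemAvailable`; `mem_summary` is an illustrative name. Little available memory together with growing swap use would point at memory pressure (the kernel thread doing the reclaim is normally listed as `kswapd0`, which is presumably the process meant here).

```shell
#!/bin/sh
# mem_summary: reads /proc/meminfo-style text on stdin (values in kB)
# and prints available memory and used swap in MiB.
mem_summary() {
  awk '
    $1 == "MemAvailable:" { avail  = $2 }
    $1 == "SwapTotal:"    { stotal = $2 }
    $1 == "SwapFree:"     { sfree  = $2 }
    END {
      printf "available_mib=%d swap_used_mib=%d\n", avail / 1024, (stotal - sfree) / 1024
    }'
}

# Usage on a live host (not run here):
#   mem_summary < /proc/meminfo
```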

Nedi11 avatar Jun 15 '24 10:06 Nedi11

I do have enough free RAM. It's also not the swapd process that eats up all the CPU. Thanks for the suggestion.

swissbyte avatar Jun 15 '24 10:06 swissbyte

I am having the same issue. I have three servers running, a Coolify server, a build server and a server just to run the containers (2 nextjs apps). It's always the server running the containers that goes down.

image

This is the chart from the most recent crash, it's particularly weird because no deployments were happening at the time and I can't see any traffic spikes either, just seemed to be random.

ck-euan avatar Jun 18 '24 08:06 ck-euan

@andrasbacsai is it safe to roll back from 297 to 4.0.0-beta.296, for example? The high CPU is making my prod environment nearly unusable... Yes, I know, I learned my lesson the hard way: never enable auto updates on prod systems...

swissbyte avatar Jun 19 '24 17:06 swissbyte

same issue!!

CleanShot-Server-Nutzung  Hostinger-Google Chrome-2024-06-21 at 00 50 18@2x

atilladeniz avatar Jun 20 '24 22:06 atilladeniz

My coolify Docker container also shows as „unhealthy“

swissbyte avatar Jun 21 '24 05:06 swissbyte

Same here v4.0.0-beta.297 image

image

last night i had a failed nextjs deployment but the high cpu only started like 10 hours later

Edit: Tried to bash into the coolify container: i can bash but once i'm in, any command hangs forever. even pwd. same happens with the coolify-db container

I restarted the coolify container.. let's see if the problem appears again.

johnpccd avatar Jun 23 '24 10:06 johnpccd

Deployment of a service or also redeployment takes around 10-15 minutes. The same service was redeployed within 15-50 seconds before…

swissbyte avatar Jun 23 '24 10:06 swissbyte

I reinstalled it on Ubuntu 20.04 and now it works fine. Ubuntu 22.04 and 24.04 did not work for me! On another server I use Coolify with Debian 11 and it's better!

atilladeniz avatar Jun 23 '24 10:06 atilladeniz

Interesting. Have you tried 22.04 and seen high CPU, and then 24.04 as well?

Or in other words… is it reproducible?

swissbyte avatar Jun 23 '24 11:06 swissbyte

Yes, I tried both 22.04 and 24.04 and both show the same 100% CPU usage issue! Only Ubuntu 20.04 and Debian 11 are good!

atilladeniz avatar Jun 23 '24 12:06 atilladeniz

I have Ubuntu 20.04 and still have this issue. Every week coolify will fail and I have to restart the server. It's just coolify.

marke-dev avatar Jun 23 '24 14:06 marke-dev

What happens if we limit the cpu usage of the coolify container?
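One way to try this is Docker's own CPU quota. A minimal sketch, assuming the container is named `coolify` and that a fractional core count is acceptable; `limit_cpus` and the `DOCKER` override are illustrative, while `docker update --cpus` is a standard Docker CLI command. A cap like this would only contain the symptom, not explain the spike.

```shell
#!/bin/sh
# limit_cpus CONTAINER CPUS
# Caps a running container at CPUS cores via `docker update --cpus`.
# DOCKER can be overridden (e.g. DOCKER=echo) to print the command instead of running it.
limit_cpus() {
  container=$1
  cpus=$2
  case $cpus in
    '' | *[!0-9.]*) echo "cpus must be a number such as 1.5" >&2; return 1 ;;
  esac
  ${DOCKER:-docker} update --cpus "$cpus" "$container"
}

# Usage, assuming the container is named "coolify" (not run here):
#   limit_cpus coolify 1.5
```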

swissbyte avatar Jun 23 '24 14:06 swissbyte

Hey Guys.... v. 298 is out now :) https://github.com/coollabsio/coolify/releases/tag/v4.0.0-beta.298 At least on my side, it seems to not really change the CPU behaviour dramatically... How about you?

swissbyte avatar Jun 24 '24 13:06 swissbyte

For all of you with the 100% CPU issue: is anybody using Supabase? I installed everything again except Supabase and have no issues! When I install Supabase, the 100% usage happens again.

atilladeniz avatar Jun 24 '24 21:06 atilladeniz

I have had nothing installed or running at one point, other than coolify and still had spikes

marke-dev avatar Jun 24 '24 21:06 marke-dev

Which VPS provider do you use?

atilladeniz avatar Jun 24 '24 21:06 atilladeniz

For all of you with the 100% CPU issue: is anybody using Supabase? I installed everything again except Supabase and have no issues! When I install Supabase, the 100% usage happens again.

i don't have supabase, and i saw it once

johnpccd avatar Jun 24 '24 21:06 johnpccd

Not using Supabase, and hosting on Hetzner, @atilladeniz

marwie avatar Jun 24 '24 21:06 marwie

Very strange! This problem is spooky! I haven't slept well for a few days because I'm afraid that at any second the server can go to 100%, slowing my connections and increasing latency on the VPS.

cpulimit doesn't work for me! The usage always goes back up!

atilladeniz avatar Jun 24 '24 21:06 atilladeniz

@marke-dev can you send the logs of the coolify Docker container? I want to look through them. Maybe we can find the fix together!

atilladeniz avatar Jun 24 '24 21:06 atilladeniz

@atilladeniz can you check if the cpu spike coincided with a database backup?

johnpccd avatar Jun 24 '24 22:06 johnpccd

Which VPS provider do you use?

I'm not on a VPS but a dedicated server with IONOS

marke-dev avatar Jun 24 '24 22:06 marke-dev