CPU pegged at about 85% all the time

Open gtmadev opened this issue 5 years ago • 8 comments

Platform/Version: Debian, ZLB CE version 5.9.1 Single core, 2GB RAM

I noticed that the CPU utilization on Zevenet stays pegged right around 85%. This is even with little or no traffic/activity. Is this something to be expected?

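[image: W50MGVtuga] https://user-images.githubusercontent.com/30833539/60720881-b42b7580-9f5e-11e9-9970-445acd8c1dec.png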

Granted, it's running with only a single core system, so maybe it's just not enough "juice" to really run ZLB. There are about a dozen farms on it, each with maybe 4-6 backends (on avg).

CPU speed is about 2.5(+/-)GHz. I wonder if moving to a faster processor like a 3.7(+/-)GHz would allow me to stick with single core instances.

Thx

gtmadev avatar Jul 05 '19 11:07 gtmadev

You can analyze CPU usage on the system with top or ps; that may reveal what is causing the high CPU usage.

Also, if you are running in a virtual machine, make sure the virtualization guest tools (additions) are properly installed.

A dozen farms is not a large number, but the load depends on the number of concurrent connections the load balancer is handling. Take into account that the web GUI makes HTTPS calls to the Zevenet API, which can cause high CPU usage at the moment the GUI requests data.

Also, if you are handling HTTPS traffic and your load balancer's CPU does not provide hardware AES support, CPU usage will be significantly affected too.
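One way to check the AES point is to look for the `aes` feature flag in `/proc/cpuinfo`. The following is a minimal sketch, assuming a Linux x86 guest where the kernel reports CPU features on a `flags` line; the function names are illustrative and not part of ZLB.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of feature flags from /proc/cpuinfo-style text.

    Assumption: x86 Linux, where features are listed on a "flags" line.
    Only the first "flags" line is used (one per logical CPU is typical).
    """
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def has_hardware_aes(cpuinfo_text):
    """True if the CPU advertises the `aes` (AES-NI) feature flag."""
    return "aes" in cpu_flags(cpuinfo_text)


def read_cpuinfo(path="/proc/cpuinfo"):
    """Read the cpuinfo file; the path only exists on Linux."""
    with open(path) as handle:
        return handle.read()
```

On the load balancer itself, `has_hardware_aes(read_cpuinfo())` returning `False` would suggest that the hypervisor does not expose AES acceleration to the guest.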

As you can see, there are many possible explanations for the high CPU usage, but in every case a closer inspection is needed.

-- Emilio Campos, Development and Support Department, www.zevenet.com

DISCLAIMER: This message contains confidential information and is intended only for the individual named. If you are not the named addressee please notify the sender immediately by email if you have received it by mistake and delete it from your system, you should not disseminate, distribute or copy this email in whole or in part.

emiliocampos-zevenet avatar Jul 05 '19 12:07 emiliocampos-zevenet

Thanks for that. I will check the processes and take a deeper look. It is a virtualized system, so I will also check if the virtual tools are there. I don't recall if that was added. Is there a separate repo for that???

And yes, I noticed on reload of the gui, the cpu can spike higher and sometimes refreshing once or twice more will give a more accurate number. But, I also use API calls to get the performance numbers and then those get passed to our monitoring systems. Every now and then, one of the two LBs goes into alarm state though because it hits +90%. I think on one occasion only it actually caused an anomaly with the load balancer and a reboot caused the secondary to take over. After a reboot it was fine.

Edit: Ahh, you are referring to virtual tools.. like for VMWare? It's not a VMWare box. It's on Vultr. So, I think it's KVM, right?

gtmadev avatar Jul 05 '19 12:07 gtmadev

Please review the log at /var/log/syslog, and check whether CPU usage drops when you stop the monitoring tasks. If it does, you should identify which API call is causing the CPU usage. If you let us know, we can try to provide a solution or work on reducing the CPU usage of that API call.

If you need to run additional API calls for monitoring purposes, you may need to increase the number of CPUs.

Thanks!

emiliocampos-zevenet avatar Jul 05 '19 12:07 emiliocampos-zevenet

I checked cpu. Didn't see anything odd though. In fact, almost no usage. However, the web gui shows CPU at 85%. So.. maybe it really is just showing the "current" number just as the page loads and TLS is used? But still.. seems like a very high number. Keep in mind though.. load shows very low.. maybe just 5-6%.

So maybe I am just reading this wrong. I am not sure what it really means when it says 85% cpu. It definitely doesn't seem like it's using 85% of the cpu.

The CPU graph from Vultr seems correct. If I run top like this...

top -bn1 | grep "Cpu(s)" | sed "s/.*, *\([0-9.]*\)%* id.*/\1/" | awk '{print 100 - $1"%"}'

The output is about 6.5%. And that seems to match what I see on the Vultr charts.
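The same measurement can be taken without parsing `top` output by comparing two samples of the aggregate `cpu` line in `/proc/stat`. This is a sketch under the assumption of a Linux system; the helper names are made up for illustration, and it counts `iowait` as idle time, which is a common convention but not the only one.

```python
def parse_cpu_line(line):
    """Return (idle, total) jiffies from the aggregate "cpu" line of /proc/stat.

    Fields after the label: user nice system idle iowait irq softirq steal.
    Guest fields are skipped because they are already included in user/nice.
    """
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in parts[1:9]]
    idle = values[3] + (values[4] if len(values) > 4 else 0)
    return idle, sum(values)


def cpu_usage_percent(before, after):
    """Percentage of non-idle time between two samples of the "cpu" line."""
    idle_before, total_before = parse_cpu_line(before)
    idle_after, total_after = parse_cpu_line(after)
    total_delta = total_after - total_before
    if total_delta <= 0:
        return 0.0
    busy_delta = total_delta - (idle_after - idle_before)
    return 100.0 * busy_delta / total_delta


def read_cpu_line(path="/proc/stat"):
    """Read the first line of /proc/stat (Linux only)."""
    with open(path) as handle:
        return handle.readline()
```

Taking `read_cpu_line()` twice, a few seconds apart, and passing both lines to `cpu_usage_percent` averages the load over that interval, so a short spike such as a dashboard page load has less influence on the result.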

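[image: dNExM7ZP7w] https://user-images.githubusercontent.com/30833539/60723140-7251fd80-9f65-11e9-889b-5f1db05e9059.png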

gtmadev avatar Jul 05 '19 12:07 gtmadev

OK, understood. The web GUI sends many API calls in the dashboard section, so a high reading in that graph is expected: the calls are needed to load all the information.

After that, the CPU usage drops, but the web GUI cannot refresh the CPU value again, so you will not see the updated value through the web.

The real CPU usage needs to be confirmed with ps or top. Looking at the attached graphs, the CPU usage is low and stable. The rrd graphs give a more accurate picture of CPU usage.

Regards

emiliocampos-zevenet avatar Jul 05 '19 12:07 emiliocampos-zevenet

Okay, thanks for that. I am going to also query those values again using the API (manually). An automation does this every minute or so to feed into our monitoring system. But I'd like to take a look at the raw numbers and just confirm what I am expecting to see. Thx again.

gtmadev avatar Jul 05 '19 13:07 gtmadev

Okay.. raw numbers pulled from the stats API look good. I think it's fine. I will have to really rely on those numbers rather than what I am seeing in the GUI.

{
  "description": "System stats",
  "params": {
    "cpu": {
      "cores": 1,
      "idle": 98,
      "iowait": 0,
      "irq": 0,
      "nice": 0,
      "softirq": 1,
      "sys": 0,
      "usage": 2,
      "user": 1
    },
    "date": "Fri Jul 5 06:36:58 2019",
    "hostname": "lb1",
    "load": {
      "last_1": 0.32,
      "last_15": 0.03,
      "last_5": 0.1
    },
    "memory": {
      "Buffers": 258.34,
      "Cached": 220.5,
      "MemFree": 1102.93,
      "MemTotal": 1995.31,
      "MemUsed": 892.39,
      "usage": 44.7,
      "SwapCached": 0,
      "SwapFree": 2045,
      "SwapTotal": 2045,
      "SwapUsed": 0
    },
    "network": {
      "eth0 in in": 114.53,
      "eth0 out out": 182.59,
      "eth1 in in": 9597.01,
      "eth1 out out": 8467.15
    }
  }
}
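For the monitoring side, a payload shaped like the one above can be reduced to a single alarm decision. This is only a sketch: it assumes the response text has already been fetched from the stats API, the function names are hypothetical, and the 90% threshold is borrowed from the alarm level mentioned earlier in the thread.

```python
import json


def cpu_usage_from_stats(payload_text):
    """Extract params.cpu.usage from a "System stats" JSON payload."""
    stats = json.loads(payload_text)
    return stats["params"]["cpu"]["usage"]


def is_cpu_alarm(payload_text, threshold=90):
    """True when the reported CPU usage is at or above the threshold.

    The default of 90 is an assumption taken from the thread, not a ZLB value.
    """
    return cpu_usage_from_stats(payload_text) >= threshold
```

With the payload shown above, `cpu_usage_from_stats` would return 2, well below the alarm level. Alerting only after several consecutive samples exceed the threshold may also help ignore the short spikes caused by GUI page loads.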

Thanks again for your assistance and feedback.

gtmadev avatar Jul 05 '19 13:07 gtmadev

Thanks for your confirmation, we will evaluate how to improve the GUI CPU stats for a future version.

cano-devel avatar Jul 12 '19 10:07 cano-devel