(plugin) network::f5::bigip::snmp::plugin - cpu-load mode - Fixes cen…

Open moix opened this issue 1 year ago • 4 comments

Description

As commented in https://github.com/centreon/centreon-plugins/issues/4699, BIG-IP F5 separates the TMM (data plane) from the management/control plane, and each CPU is dedicated to one or the other. The criterion, as documented in https://my.f5.com/manage/s/article/K92615205, is as follows (note that the F5 documentation is wrong and states the opposite; this was confirmed with the BIG-IP F5 support team):

  • odd-numbered index cpus (1,3,5,7...) are dedicated to control plane usage
  • even-numbered index cpus (2,4,6,8,...) are dedicated to data plane usage
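The odd/even rule above can be sketched in a few lines of Python. This is only an illustration, not the plugin's Perl code: the function name `cpu_plane` is hypothetical, and it assumes the CPU indexes are numbered as in the list above (starting at 1), which may differ from the indexes the plugin reports.

```python
def cpu_plane(index: int) -> str:
    """Map a CPU index to its plane using the odd/even rule stated above.

    Assumption: odd index -> control plane, even index -> data plane.
    """
    return "control" if index % 2 == 1 else "data"


# Classify CPUs 1..8 as in the list above.
planes = {index: cpu_plane(index) for index in range(1, 9)}
print(planes)
```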

The tmm-usage mode already extracts the usage of the CPUs dedicated to the data plane, along with other statistics (client/server connections...), but the control plane CPUs are discarded. This mode aims to extract both CPU types so that management plane CPU usage can be monitored as well.

Fixes #4699

Type of change

  • [x] New functionality (non-breaking change)

Target serie

  • [x] (master)

How this pull request can be tested ?

Tested locally against a BIG-IP F5 instance:

$ perl centreon_plugins.pl --plugin network::f5::bigip::snmp::plugin --mode cpu-usage --hostname bigip-1.acme.com --snmp-version 2c --snmp-timeout 3 --snmp-retries 2 --snmp-community mycommkey --snmp-autoreduce --statefile-dir /tmp/
OK: All CPU are ok | 'usage_5s_0'=8%;;;0;100 'usage_1m_0'=7%;;;0;100 'usage_5m_0'=7%;;;0;100 'user_5s_0'=6%;;;0;100 'user_1m_0'=6%;;;0;100 'user_5m_0'=5%;;;0;100 'iowait_5s_0'=0%;;;0;100 'iowait_1m_0'=0%;;;0;100 'iowait_5m_0'=0%;;;0;100 'system_5s_0'=1%;;;0;100 'system_1m_0'=1%;;;0;100 'system_5m_0'=1%;;;0;100 'idle_5s_0'=86%;;;0;100 'idle_1m_0'=86%;;;0;100 'idle_5m_0'=87%;;;0;100 'usage_5s_1'=7%;;;0;100 'usage_1m_1'=7%;;;0;100 'usage_5m_1'=7%;;;0;100 'user_5s_1'=5%;;;0;100 'user_1m_1'=5%;;;0;100 'user_5m_1'=5%;;;0;100 'iowait_5s_1'=0%;;;0;100 'iowait_1m_1'=0%;;;0;100 'iowait_5m_1'=0%;;;0;100 'system_5s_1'=1%;;;0;100 'system_1m_1'=1%;;;0;100 'system_5m_1'=1%;;;0;100 'idle_5s_1'=86%;;;0;100 'idle_1m_1'=87%;;;0;100 'idle_5m_1'=87%;;;0;100 'usage_5s_2'=2%;;;0;100 'usage_1m_2'=3%;;;0;100 'usage_5m_2'=2%;;;0;100 'user_5s_2'=1%;;;0;100 'user_1m_2'=2%;;;0;100 'user_5m_2'=2%;;;0;100 'iowait_5s_2'=0%;;;0;100 'iowait_1m_2'=0%;;;0;100 'iowait_5m_2'=0%;;;0;100 'system_5s_2'=1%;;;0;100 'system_1m_2'=1%;;;0;100 'system_5m_2'=0%;;;0;100 'idle_5s_2'=98%;;;0;100 'idle_1m_2'=97%;;;0;100 'idle_5m_2'=97%;;;0;100 'usage_5s_3'=1%;;;0;100 'usage_1m_3'=2%;;;0;100 'usage_5m_3'=2%;;;0;100 'user_5s_3'=0%;;;0;100 'user_1m_3'=2%;;;0;100 'user_5m_3'=1%;;;0;100 'iowait_5s_3'=0%;;;0;100 'iowait_1m_3'=0%;;;0;100 'iowait_5m_3'=0%;;;0;100 'system_5s_3'=0%;;;0;100 'system_1m_3'=0%;;;0;100 'system_5m_3'=0%;;;0;100 'idle_5s_3'=99%;;;0;100 'idle_1m_3'=97%;;;0;100 'idle_5m_3'=98%;;;0;100 'usage_5s_4'=3%;;;0;100 'usage_1m_4'=3%;;;0;100 'usage_5m_4'=2%;;;0;100 'user_5s_4'=2%;;;0;100 'user_1m_4'=2%;;;0;100 'user_5m_4'=2%;;;0;100 'iowait_5s_4'=0%;;;0;100 'iowait_1m_4'=0%;;;0;100 'iowait_5m_4'=0%;;;0;100 'system_5s_4'=0%;;;0;100 'system_1m_4'=0%;;;0;100 'system_5m_4'=0%;;;0;100 'idle_5s_4'=97%;;;0;100 'idle_1m_4'=97%;;;0;100 'idle_5m_4'=98%;;;0;100 'usage_5s_5'=4%;;;0;100 'usage_1m_5'=3%;;;0;100 'usage_5m_5'=2%;;;0;100 'user_5s_5'=1%;;;0;100 'user_1m_5'=2%;;;0;100 'user_5m_5'=1%;;;0;100 'iowait_5s_5'=0%;;;0;100 'iowait_1m_5'=0%;;;0;100 'iowait_5m_5'=0%;;;0;100 'system_5s_5'=1%;;;0;100 'system_1m_5'=0%;;;0;100 'system_5m_5'=0%;;;0;100 'idle_5s_5'=96%;;;0;100 'idle_1m_5'=97%;;;0;100 'idle_5m_5'=97%;;;0;100 'usage_5s_6'=1%;;;0;100 'usage_1m_6'=3%;;;0;100 'usage_5m_6'=2%;;;0;100 'user_5s_6'=1%;;;0;100 'user_1m_6'=2%;;;0;100 'user_5m_6'=2%;;;0;100 'iowait_5s_6'=0%;;;0;100 'iowait_1m_6'=0%;;;0;100 'iowait_5m_6'=0%;;;0;100 'system_5s_6'=0%;;;0;100 'system_1m_6'=0%;;;0;100 'system_5m_6'=0%;;;0;100 'idle_5s_6'=99%;;;0;100 'idle_1m_6'=97%;;;0;100 'idle_5m_6'=98%;;;0;100 'usage_5s_7'=1%;;;0;100 'usage_1m_7'=2%;;;0;100 'usage_5m_7'=2%;;;0;100 'user_5s_7'=0%;;;0;100 'user_1m_7'=1%;;;0;100 'user_5m_7'=1%;;;0;100 'iowait_5s_7'=0%;;;0;100 'iowait_1m_7'=0%;;;0;100 'iowait_5m_7'=0%;;;0;100 'system_5s_7'=0%;;;0;100 'system_1m_7'=0%;;;0;100 'system_5m_7'=0%;;;0;100 'idle_5s_7'=99%;;;0;100 'idle_1m_7'=98%;;;0;100 'idle_5m_7'=98%;;;0;100
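For anyone who wants to post-process this output, here is a minimal Python sketch that groups the performance data by CPU. It is not part of the plugin: the helper name `parse_cpu_perfdata` and the regular expression are assumptions based only on the label pattern visible above (`<metric>_<window>_<cpu>`), and the sample string is abbreviated from that output.

```python
import re
from collections import defaultdict

# Matches labels such as 'usage_5s_0'=8% and captures metric, window, CPU, value.
PERF_RE = re.compile(
    r"'(?P<metric>[a-z]+)_(?P<window>5s|1m|5m)_(?P<cpu>\d+)'=(?P<value>\d+(?:\.\d+)?)%"
)


def parse_cpu_perfdata(output: str) -> dict:
    """Return {cpu_index: {(metric, window): value}} from a plugin output line."""
    # Performance data follows the first "|" in the output.
    _, _, perfdata = output.partition("|")
    cpus = defaultdict(dict)
    for match in PERF_RE.finditer(perfdata):
        key = (match["metric"], match["window"])
        cpus[int(match["cpu"])][key] = float(match["value"])
    return dict(cpus)


# Abbreviated sample taken from the output above.
sample = (
    "OK: All CPU are ok | 'usage_5s_0'=8%;;;0;100 "
    "'usage_1m_0'=7%;;;0;100 'idle_5m_7'=98%;;;0;100"
)
parsed = parse_cpu_perfdata(sample)
print(parsed[0][("usage", "5s")])  # 8.0
```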

Checklist

Community contributors & Centreon team

  • [x] I have followed the coding style guidelines provided by Centreon
  • [x] I have commented my code, especially new classes, functions or any legacy code modified. (docblock)
  • [x] I have commented my code, especially hard-to-understand areas of the PR.
  • [x] I have rebased my development branch on the base branch (master, maintenance).

moix avatar Oct 16 '23 14:10 moix

Hi @moix,

Thanks for the contribution 💯 (with my apologies for the delay).

Your contribution will be discussed during the team's next weekly refinement session. Can you provide either the result of snmpwalk -ObentU or the output of the plugin in debug mode, so that we can set up automated tests?

Thanks 👍

Regards,

omercier avatar Jan 23 '24 08:01 omercier

Hi @omercier, please find attached the output in debug mode. Thanks! bigip-cpu-usage.debug.txt

moix avatar Feb 13 '24 23:02 moix

Hi @moix, I think this mode lacks averaging. You say you want to monitor control plane CPU usage, but what I see in your attached file is a list of 56 numbered CPUs with 5 x 3 metrics per CPU, so it will graph 840 curves. I don't think that really shows me the control plane CPU usage, nor the data plane one. I'm expecting only 2 sets of metrics: 1 for the control plane, 1 for the data plane. And for each: either just the usage or, if it is judged necessary, the user, system, IO wait and idle metrics. Not all of them at the same time, as usage = user + system + IO wait. And finally, just choose one sampling period; I don't think anyone needs 3 sampling periods for those metrics. In the end I should have either 2 or 8 metrics that I can graph and humanly understand. The code is neat though 👍
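To illustrate the kind of averaging meant here, a rough Python sketch follows (the plugin itself is Perl, so this is not a proposed implementation). The function name `average_by_plane` is hypothetical, the odd = control / even = data mapping is taken from the PR description and applied directly to the index in the perfdata labels (an assumption), and the values are the 1-minute `usage` figures from the sample output in the description.

```python
from statistics import mean

# Metric count of the per-CPU approach: 56 CPUs x 5 metrics x 3 sampling periods.
print(56 * 5 * 3)  # 840


def average_by_plane(usage_by_cpu: dict) -> dict:
    """Average one metric over the CPUs of each plane.

    Assumption: odd index -> control plane, even index -> data plane.
    """
    planes = {"control": [], "data": []}
    for cpu, value in usage_by_cpu.items():
        planes["control" if cpu % 2 == 1 else "data"].append(value)
    return {plane: mean(values) for plane, values in planes.items() if values}


# 'usage_1m_<n>' values from the sample output in the PR description.
usage_1m = {0: 7.0, 1: 7.0, 2: 3.0, 3: 2.0, 4: 3.0, 5: 3.0, 6: 3.0, 7: 2.0}
averages = average_by_plane(usage_1m)
print(averages)  # {'control': 3.5, 'data': 4.0}
```

With one metric and one sampling period this yields the 2 values mentioned above; repeating it for user, system, IO wait and idle would yield 8.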

cgagnaire avatar Mar 07 '24 14:03 cgagnaire

Hi @cgagnaire thanks for your comment.

Yes, an average would probably give a single, simpler metric to visualize, but in my case I was actually looking for detailed usage per CPU. The reason is that we were affected by a bug in F5 where load was not distributed equally across all CPUs, leaving a certain number of CPUs completely unused. Here is the reference: https://cdn.f5.com/product/bugtracker/ID923221.html

With this mode we can "easily" visualize it (I know it is a graph of 28 lines, since odd and even CPU numbers are assigned to different purposes).
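As a rough sketch of the check the per-CPU detail makes possible, the Python below flags CPUs that sit idle while other CPUs of the same plane are busy. Everything in it is illustrative: the function name `unused_cpus`, the 1% threshold, and the input values are hypothetical (they are not taken from an affected device), and the odd = control / even = data mapping is the one from the PR description.

```python
def unused_cpus(usage_by_cpu: dict, threshold: float = 1.0) -> dict:
    """Return, per plane, the CPUs at or below `threshold` percent usage
    while at least one CPU of the same plane is above it."""
    planes = {"control": {}, "data": {}}
    for cpu, value in usage_by_cpu.items():
        # Assumption: odd index -> control plane, even index -> data plane.
        planes["control" if cpu % 2 == 1 else "data"][cpu] = value

    flagged = {}
    for plane, members in planes.items():
        if any(value > threshold for value in members.values()):
            idle = sorted(cpu for cpu, value in members.items() if value <= threshold)
            if idle:
                flagged[plane] = idle
    return flagged


# Hypothetical values: data plane CPUs 2 and 6 receive no load at all.
hypothetical = {0: 40.0, 1: 5.0, 2: 0.0, 3: 4.0, 4: 38.0, 5: 6.0, 6: 0.0, 7: 5.0}
print(unused_cpus(hypothetical))  # {'data': [2, 6]}
```

A per-plane average would hide this pattern, since the data plane mean in the example is still well above zero.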

moix avatar Mar 12 '24 11:03 moix