
Prometheus exporter enhancement

Open soreana opened this issue 4 years ago • 98 comments

Description

This pull request adds new functionality to the CloudStack Prometheus exporter. The testing section below shows the differences in the exporter output.

Types of changes

  • [ ] Breaking change (fix or feature that would cause existing functionality to change)
  • [x] New feature (non-breaking change which adds functionality)
  • [ ] Bug fix (non-breaking change which fixes an issue)
  • [ ] Enhancement (improves an existing feature and functionality)
  • [ ] Cleanup (Code refactoring and cleanup, that may add test cases)

How Has This Been Tested?

This pull request contains seven commits. All of them except dfb35e5224 add new functionality to the Prometheus exporter. The following sections describe what each commit does. I tested them in my test environment, which has three management servers, one DB node (MySQL), and two KVM hypervisors.

1. Export count of total/up/down hosts by tags 0dbe9e78a3660bef73451c6d56f4826509833f2b

  1. Enable Prometheus.
  2. Add a tag to the host.
  3. Run curl http://127.0.0.1:9595/metrics | grep cloudstack_hosts_total

Output Before Changes:

cloudstack_hosts_total{zone="mgt122-60",filter="online"} 2
cloudstack_hosts_total{zone="mgt122-60",filter="offline"} 0
cloudstack_hosts_total{zone="mgt122-60",filter="total"} 2

Output After Changes:

cloudstack_hosts_total{zone="mgt122-60",filter="online"} 2
cloudstack_hosts_total{zone="mgt122-60",filter="offline"} 0
cloudstack_hosts_total{zone="mgt122-60",filter="total"} 2
cloudstack_hosts_total{zone="mgt122-60",filter="total",tags="tage1"} 1
cloudstack_hosts_total{zone="mgt122-60",filter="online",tags="tage1"} 1
cloudstack_hosts_total{zone="mgt122-60",filter="offline",tags="tage1"} 0 
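
To check the tagged samples without reading the raw output by eye, the lines above can be parsed with a few lines of Python. This is only a minimal sketch: the helper names `parse_sample` and `hosts_by_tag` are made up for illustration and are not part of the exporter, and the regular expressions cover only the simple label syntax shown in this output, not the full Prometheus text format.

```python
import re

# One sample line of the form: name{label="value",...} number
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>.*)\}\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_sample(line):
    """Return (metric_name, labels_dict, value), or None for any other line."""
    match = SAMPLE_RE.match(line.strip())
    if match is None:
        return None
    labels = dict(LABEL_RE.findall(match.group("labels")))
    return match.group("name"), labels, float(match.group("value"))


def hosts_by_tag(text):
    """Collect cloudstack_hosts_total samples that carry a tags label."""
    counts = {}
    for line in text.splitlines():
        sample = parse_sample(line)
        if sample is None:
            continue
        name, labels, value = sample
        if name == "cloudstack_hosts_total" and "tags" in labels:
            counts[(labels["tags"], labels["filter"])] = value
    return counts


# Sample text copied from the "Output After Changes" listing above.
EXAMPLE = '''\
cloudstack_hosts_total{zone="mgt122-60",filter="online"} 2
cloudstack_hosts_total{zone="mgt122-60",filter="offline"} 0
cloudstack_hosts_total{zone="mgt122-60",filter="total"} 2
cloudstack_hosts_total{zone="mgt122-60",filter="total",tags="tage1"} 1
cloudstack_hosts_total{zone="mgt122-60",filter="online",tags="tage1"} 1
cloudstack_hosts_total{zone="mgt122-60",filter="offline",tags="tage1"} 0
'''

if __name__ == "__main__":
    for (tag, state), count in sorted(hosts_by_tag(EXAMPLE).items()):
        print(tag, state, count)
```

In a real check, the text would come from the `curl` command in step 3 instead of the embedded `EXAMPLE` string.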

2. Export count of vms by state and host tag e6a81d16d9f11db6bb4fd2b0ab38194961ce516b

  1. Enable Prometheus.
  2. Add a tag to the host.
  3. Run curl http://127.0.0.1:9595/metrics | grep cloudstack_vms_total_by_tag

After the changes, the following lines are added to the Prometheus output:

cloudstack_vms_total_by_tag{zone="mgt122-60",filter="starting",tags="tage1"} 0
cloudstack_vms_total_by_tag{zone="mgt122-60",filter="running",tags="tage1"} 0
cloudstack_vms_total_by_tag{zone="mgt122-60",filter="stopping",tags="tage1"} 0
cloudstack_vms_total_by_tag{zone="mgt122-60",filter="stopped",tags="tage1"} 0
cloudstack_vms_total_by_tag{zone="mgt122-60",filter="destroyed",tags="tage1"} 0
cloudstack_vms_total_by_tag{zone="mgt122-60",filter="expunging",tags="tage1"} 0
cloudstack_vms_total_by_tag{zone="mgt122-60",filter="migrating",tags="tage1"} 0
cloudstack_vms_total_by_tag{zone="mgt122-60",filter="error",tags="tage1"} 0
cloudstack_vms_total_by_tag{zone="mgt122-60",filter="unknown",tags="tage1"} 0
cloudstack_vms_total_by_tag{zone="mgt122-60",filter="shutdown",tags="tage1"} 0

3. Add host tags to host cpu/cores/memory usage in Prometheus exporter eefd9f197352653f74aff73ccfffc4dd86d56b0d

  1. Enable Prometheus.
  2. Add a tag to the host.
  3. Run the following command and compare the output with the expected results: curl http://127.0.0.1:9595/metrics | grep cloudstack_host_vms_cores_total
  4. Repeat step 3 for cloudstack_host_cpu_usage_mhz_total and cloudstack_host_memory_usage_mibs_total.

Output Before Changes:

cloudstack_host_vms_cores_total{zone="mgt122-60",hostname="node75",ip="10.135.122.75",filter="used",dedicated="0"} 2
cloudstack_host_vms_cores_total{zone="mgt122-60",hostname="node75",ip="10.135.122.75",filter="total",dedicated="0"} 4
cloudstack_host_vms_cores_total{zone="mgt122-60",hostname="node74",ip="10.135.122.74",filter="used",dedicated="0"} 2
cloudstack_host_vms_cores_total{zone="mgt122-60",hostname="node74",ip="10.135.122.74",filter="total",dedicated="0"} 4
cloudstack_host_vms_cores_total{zone="mgt122-60",filter="allocated"} 4
cloudstack_host_vms_cores_total_by_tag{zone="mgt122-60",filter="allocated",tags="tage1"} 0

Output After Changes:

cloudstack_host_vms_cores_total{zone="mgt122-60",hostname="node75",ip="10.135.122.75",filter="used",dedicated="0",tags="tage1"} 2
cloudstack_host_vms_cores_total{zone="mgt122-60",hostname="node75",ip="10.135.122.75",filter="total",dedicated="0",tags="tage1"} 4
cloudstack_host_vms_cores_total{zone="mgt122-60",hostname="node74",ip="10.135.122.74",filter="used",dedicated="0",tags=""} 2
cloudstack_host_vms_cores_total{zone="mgt122-60",hostname="node74",ip="10.135.122.74",filter="total",dedicated="0",tags=""} 4
cloudstack_host_vms_cores_total{zone="mgt122-60",filter="allocated"} 4
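
One way the new `tags` label might be consumed is to sum the per-host samples by tag. The sketch below does that for the output above. The function name `cores_by_tag` is hypothetical, the parsing handles only the simple label syntax shown here, and the zone-wide `allocated` sample is skipped because it has no `hostname` label.

```python
import re
from collections import defaultdict

LINE_RE = re.compile(r'^(\w+)\{(.*)\}\s+(\S+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def cores_by_tag(text, metric="cloudstack_host_vms_cores_total"):
    """Sum per-host core samples into {tags: {filter: cores}}."""
    totals = defaultdict(lambda: defaultdict(float))
    for raw in text.splitlines():
        match = LINE_RE.match(raw.strip())
        if match is None or match.group(1) != metric:
            continue
        labels = dict(LABEL_RE.findall(match.group(2)))
        if "hostname" not in labels:
            continue  # zone-wide sample, not tied to a single host
        totals[labels.get("tags", "")][labels["filter"]] += float(match.group(3))
    return {tag: dict(by_filter) for tag, by_filter in totals.items()}


# Sample text copied from the "Output After Changes" listing above.
AFTER = '''\
cloudstack_host_vms_cores_total{zone="mgt122-60",hostname="node75",ip="10.135.122.75",filter="used",dedicated="0",tags="tage1"} 2
cloudstack_host_vms_cores_total{zone="mgt122-60",hostname="node75",ip="10.135.122.75",filter="total",dedicated="0",tags="tage1"} 4
cloudstack_host_vms_cores_total{zone="mgt122-60",hostname="node74",ip="10.135.122.74",filter="used",dedicated="0",tags=""} 2
cloudstack_host_vms_cores_total{zone="mgt122-60",hostname="node74",ip="10.135.122.74",filter="total",dedicated="0",tags=""} 4
cloudstack_host_vms_cores_total{zone="mgt122-60",filter="allocated"} 4
'''

if __name__ == "__main__":
    for tag, by_filter in cores_by_tag(AFTER).items():
        print(repr(tag), by_filter)
```

Hosts without a tag end up under the empty string, which matches the `tags=""` label the exporter emits for node74.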

4. Cloudstack Prometheus exporter: Add allocated capacity group by host tag. a489e3c6b269279df5fbff32a708d9ed0296a40e

  1. Enable Prometheus.
  2. Add a tag to the host.
  3. Run curl http://127.0.0.1:9595/metrics | grep cloudstack_host_vms_cores_total

Output Before Changes:

cloudstack_host_vms_cores_total{zone="mgt122-60",hostname="node75",ip="10.135.122.75",filter="used",dedicated="0"} 2
cloudstack_host_vms_cores_total{zone="mgt122-60",hostname="node75",ip="10.135.122.75",filter="total",dedicated="0"} 4
cloudstack_host_vms_cores_total{zone="mgt122-60",hostname="node74",ip="10.135.122.74",filter="used",dedicated="0"} 2
cloudstack_host_vms_cores_total{zone="mgt122-60",hostname="node74",ip="10.135.122.74",filter="total",dedicated="0"} 4
cloudstack_host_vms_cores_total{zone="mgt122-60",filter="allocated"} 4
cloudstack_host_vms_cores_total_by_tag{zone="mgt122-60",filter="allocated",tags="tage1"} 0

Output After Changes:

cloudstack_host_vms_cores_total{zone="mgt122-60",hostname="node75",ip="10.135.122.75",filter="used",dedicated="0",tags="tage1"} 2
cloudstack_host_vms_cores_total{zone="mgt122-60",hostname="node75",ip="10.135.122.75",filter="total",dedicated="0",tags="tage1"} 4
cloudstack_host_vms_cores_total{zone="mgt122-60",hostname="node74",ip="10.135.122.74",filter="used",dedicated="0",tags=""} 2
cloudstack_host_vms_cores_total{zone="mgt122-60",hostname="node74",ip="10.135.122.74",filter="total",dedicated="0",tags=""} 4
cloudstack_host_vms_cores_total{zone="mgt122-60",filter="allocated"} 4
cloudstack_host_vms_cores_total_by_tag{zone="mgt122-60",filter="allocated",tags="tage1"} 0

5. Show count of Active domains on grafana de08479da13b7b3f3eb467fc3798c6734f0e6fb7

============== Scenario One ==============

  1. Enable Prometheus.
  2. Run curl http://127.0.0.1:9595/metrics | grep cloudstack_active_domains_total. Output is:
cloudstack_active_domains_total{zone="mgt122-60"} 1
  3. Create a new domain.
  4. Repeat step two. The output will not change.
  5. Add a new account to the domain created in step three.
  6. Repeat step two. The output will change to:
cloudstack_active_domains_total{zone="mgt122-60"} 2

============== Scenario Two ==============

  1. Use previous environment
  2. Disable all accounts in the domain created in step three of Scenario One.
  3. Repeat step two of Scenario One. The output will change to:
cloudstack_active_domains_total{zone="mgt122-60"} 1
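
The two scenarios suggest that a domain counts as active only while it has at least one enabled account. The sketch below models that rule as inferred from the observed outputs. It is not the exporter's actual Java code, and the `Account` class, the account and domain names, and `count_active_domains` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Account:
    name: str
    domain: str
    enabled: bool = True


def count_active_domains(accounts):
    """Count domains that have at least one enabled account (assumed rule)."""
    return len({account.domain for account in accounts if account.enabled})


# Scenario One: a single domain with one account gives 1.
accounts = [Account("admin", "ROOT")]
print(count_active_domains(accounts))  # 1

# Creating an empty domain adds no accounts, so the count is unchanged.
# Adding an account to the new domain raises the count to 2.
accounts.append(Account("user1", "new-domain"))
print(count_active_domains(accounts))  # 2

# Scenario Two: disabling every account in the new domain drops it back to 1.
accounts[1].enabled = False
print(count_active_domains(accounts))  # 1
```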

6. Show count of Active accounts and vms by size on grafana d7aa19f0f850dfd5eea5c4f51a6529d39c2daf88

============== Scenario One ==============

  1. Enable Prometheus.
  2. Run curl http://127.0.0.1:9595/metrics | grep cloudstack_active_accounts_total. The output is:
cloudstack_active_accounts_total{zone="mgt122-60"} 1
  3. Create a new account.
  4. Repeat step two. The output will change to:
cloudstack_active_accounts_total{zone="mgt122-60"} 2

============== Scenario Two ==============

  1. Enable Prometheus.
  2. Run curl http://127.0.0.1:9595/metrics | grep cloudstack_vms_total_by_size. The output is:
cloudstack_vms_total_by_size{zone="mgt122-60",cpu="1",memory="512"} 2
  3. Add a new instance with a different offering.
  4. Repeat step two. The output will change to:
cloudstack_vms_total_by_size{zone="mgt122-60",cpu="1",memory="512"} 2
cloudstack_vms_total_by_size{zone="mgt122-60",cpu="1",memory="1024"} 1
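
The grouping behind `cloudstack_vms_total_by_size` can be pictured as counting instances per (cpu, memory) pair. The following is a minimal sketch of that idea, not the exporter's implementation. The dictionary layout of a VM record and the helper names `vms_by_size` and `render` are assumptions made for illustration.

```python
from collections import Counter


def vms_by_size(vms):
    """Count VMs per (cpu, memory) pair taken from their offering."""
    return Counter((vm["cpu"], vm["memory"]) for vm in vms)


def render(zone, counts):
    """Format the counts as cloudstack_vms_total_by_size sample lines."""
    return [
        f'cloudstack_vms_total_by_size{{zone="{zone}",cpu="{cpu}",memory="{memory}"}} {count}'
        for (cpu, memory), count in sorted(counts.items())
    ]


# Two instances that share one offering, as in step two.
vms = [{"cpu": 1, "memory": 512}, {"cpu": 1, "memory": 512}]
print("\n".join(render("mgt122-60", vms_by_size(vms))))

# A third instance with a different offering adds a second sample line.
vms.append({"cpu": 1, "memory": 1024})
print("\n".join(render("mgt122-60", vms_by_size(vms))))
```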

soreana avatar Oct 30 '20 10:10 soreana

Hi @soreana is this PR ready for review? @blueorangutan package

nvazquez avatar Jul 01 '21 05:07 nvazquez

@nvazquez a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

blueorangutan avatar Jul 01 '21 05:07 blueorangutan

Packaging result: :heavy_multiplication_x: centos7 :heavy_check_mark: centos8 :heavy_multiplication_x: debian. SL-JID 444

blueorangutan avatar Jul 01 '21 06:07 blueorangutan

Hey @nvazquez Yes, it is ready for review.

soreana avatar Jul 01 '21 08:07 soreana

@blueorangutan package

nvazquez avatar Jul 01 '21 15:07 nvazquez

@nvazquez a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

blueorangutan avatar Jul 01 '21 15:07 blueorangutan

Packaging result: :heavy_check_mark: centos7 :heavy_check_mark: centos8 :heavy_check_mark: debian. SL-JID 452

blueorangutan avatar Jul 01 '21 16:07 blueorangutan

@blueorangutan test

nvazquez avatar Jul 01 '21 16:07 nvazquez

@nvazquez a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

blueorangutan avatar Jul 01 '21 16:07 blueorangutan

Trillian test result (tid-1191)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 38732 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr4438-t1191-kvm-centos7.zip
Intermittent failure detected: /marvin/tests/smoke/test_vpc_redundant.py
Smoke tests completed. 88 look OK, 0 have error(s)
Only failed tests results shown below:

Test Result Time (s) Test File

blueorangutan avatar Jul 02 '21 03:07 blueorangutan

@blueorangutan package

nvazquez avatar Aug 09 '21 17:08 nvazquez

@nvazquez a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

blueorangutan avatar Aug 09 '21 17:08 blueorangutan

Packaging result: :heavy_check_mark: el7 :heavy_check_mark: el8 :heavy_check_mark: debian. SL-JID 814

blueorangutan avatar Aug 09 '21 18:08 blueorangutan

@blueorangutan test

nvazquez avatar Aug 09 '21 19:08 nvazquez

@nvazquez a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

blueorangutan avatar Aug 09 '21 19:08 blueorangutan

@blueorangutan package

rohityadavcloud avatar Sep 08 '21 05:09 rohityadavcloud

@rhtyd a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

blueorangutan avatar Sep 08 '21 05:09 blueorangutan

Packaging result: :heavy_multiplication_x: el7 :heavy_check_mark: el8 :heavy_multiplication_x: debian :heavy_check_mark: suse15. SL-JID 1165

blueorangutan avatar Sep 08 '21 06:09 blueorangutan

@blueorangutan package

rohityadavcloud avatar Sep 14 '21 06:09 rohityadavcloud

@rhtyd a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

blueorangutan avatar Sep 14 '21 06:09 blueorangutan

Packaging result: :heavy_check_mark: el7 :heavy_check_mark: el8 :heavy_check_mark: debian :heavy_check_mark: suse15. SL-JID 1237

blueorangutan avatar Sep 14 '21 10:09 blueorangutan

@blueorangutan test

sureshanaparti avatar Sep 15 '21 06:09 sureshanaparti

@sureshanaparti a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

blueorangutan avatar Sep 15 '21 06:09 blueorangutan

@blueorangutan test

nvazquez avatar Sep 16 '21 03:09 nvazquez

@nvazquez a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

blueorangutan avatar Sep 16 '21 03:09 blueorangutan

Trillian test result (tid-2077)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 41441 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr4438-t2077-kvm-centos7.zip
Smoke tests completed. 85 look OK, 4 have errors
Only failed tests results shown below:

Test Result Time (s) Test File
test_01_add_primary_storage_disabled_host Error 1.22 test_primary_storage.py
test_01_primary_storage_nfs Error 0.13 test_primary_storage.py
ContextSuite context=TestStorageTags>:setup Error 0.23 test_primary_storage.py
test_02_list_snapshots_with_removed_data_store Error 1.31 test_snapshots.py
test_01_secure_vm_migration Error 164.57 test_vm_life_cycle.py
test_02_unsecure_vm_migration Error 276.33 test_vm_life_cycle.py
test_03_secured_to_nonsecured_vm_migration Error 148.04 test_vm_life_cycle.py
test_08_migrate_vm Error 44.80 test_vm_life_cycle.py
test_hostha_enable_ha_when_host_in_maintenance Error 307.19 test_hostha_kvm.py

blueorangutan avatar Sep 16 '21 15:09 blueorangutan

@blueorangutan test

nvazquez avatar Sep 20 '21 19:09 nvazquez

@nvazquez a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

blueorangutan avatar Sep 20 '21 19:09 blueorangutan

Trillian test result (tid-2135)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 50191 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr4438-t2135-kvm-centos7.zip
Smoke tests completed. 90 look OK, 3 have errors
Only failed tests results shown below:

Test Result Time (s) Test File
test_deploy_vm_start_failure Error 61.27 test_deploy_vm.py
test_deploy_vm_volume_creation_failure Error 61.36 test_deploy_vm.py
test_vm_ha Error 59.33 test_vm_ha.py
test_vm_sync Error 129.03 test_vm_sync.py

blueorangutan avatar Sep 21 '21 09:09 blueorangutan

@blueorangutan package

nvazquez avatar Sep 21 '21 11:09 nvazquez