add system-reserved-compressible test

Open qiliRedHat opened this issue 3 weeks ago • 5 comments

https://issues.redhat.com/browse/OCPNODE-3939

dynamic-system-reserved-cpu.sh calculates the SYSTEM_RESERVED_CPU value used by the AutoSizingReserved test:

./dynamic-system-reserved-cpu.sh 96
SYSTEM_RESERVED_CPU=1.20
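
A test consuming this helper only needs to parse the single KEY=VALUE line it prints. A minimal sketch (the literal string stands in for the real script invocation, since the script itself lives in the repo):

```shell
#!/bin/sh
# Sketch: parse the helper's output into a bare CPU value.
# The literal below stands in for: out=$(./dynamic-system-reserved-cpu.sh 96)
out="SYSTEM_RESERVED_CPU=1.20"
cpu="${out#SYSTEM_RESERVED_CPU=}"
echo "system-reserved cpu: ${cpu}"
```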

stress-slice-single-node.sh stresses the given slices on a given node with a specified number of CPU cores for a period of time:

./stress-slice-single-node.sh ip-xx-xx-xx-xx.us-east-2.compute.internal 600 system.slice:4,kubepods.slice:7                                    
[INFO] =========================================
[INFO] Systemd Slice CPU Stress Test (Host-level)
[INFO] =========================================
[INFO] Target Node: ip-xx-xx-xx-xx.us-east-2.compute.internal
[INFO] Duration: 600s
[INFO] Target Slices:
[INFO]   - system.slice: 4 cores
[INFO]   - kubepods.slice: 7 cores
[INFO] =========================================
[INFO] Gathering node information...
[INFO] Node CPU Info:
  Total Capacity: 4 cores
  Allocatable: 3500m cores
  System Reserved: 0.50 cores (500m)
[INFO] Starting stress test on node: ip-xx-xx-xx-xx.us-east-2.compute.internal
[WARN] Using 'oc debug node' to run systemd-run commands directly on the host
=========================================
Starting multi-slice stress test

Launching stress processes...

Starting 4 processes in system.slice...
  Started process 1 in system.slice
  Started process 2 in system.slice
  Started process 3 in system.slice
  Started process 4 in system.slice

Starting 7 processes in kubepods.slice...
  Started process 1 in kubepods.slice
  Started process 2 in kubepods.slice
  Started process 3 in kubepods.slice
  Started process 4 in kubepods.slice
  Started process 5 in kubepods.slice
  Started process 6 in kubepods.slice
  Started process 7 in kubepods.slice

Removing debug pod ...
Running as unit: stress-test-kubepods-slice-1.service
  stress-test-kubepods-slice-1.service loaded active running CPU Stress Test for kubepods.slice Process 1
  stress-test-kubepods-slice-2.service loaded active running CPU Stress Test for kubepods.slice Process 2
  stress-test-kubepods-slice-3.service loaded active running CPU Stress Test for kubepods.slice Process 3
  stress-test-kubepods-slice-4.service loaded active running CPU Stress Test for kubepods.slice Process 4
  stress-test-kubepods-slice-5.service loaded active running CPU Stress Test for kubepods.slice Process 5
  stress-test-kubepods-slice-6.service loaded active running CPU Stress Test for kubepods.slice Process 6
  stress-test-kubepods-slice-7.service loaded active running CPU Stress Test for kubepods.slice Process 7
  stress-test-system-slice-1.service   loaded active running CPU Stress Test for system.slice Process 1
  stress-test-system-slice-2.service   loaded active running CPU Stress Test for system.slice Process 2
  stress-test-system-slice-3.service   loaded active running CPU Stress Test for system.slice Process 3
  stress-test-system-slice-4.service   loaded active running CPU Stress Test for system.slice Process 4

CPU-intensive processes running. Will run for 600 seconds...

[INFO] 
[INFO] =========================================
[INFO] Stress test running for 600 seconds
[INFO] Monitor the output above for node behavior
[INFO] Stress processes are running in the following slices on the HOST:
[INFO]   - system.slice: 4 cores
[INFO]   - kubepods.slice: 7 cores
[INFO] =========================================
[INFO] 
[INFO] Starting background monitoring...
[INFO] Monitoring output will be saved to: stress_test_log_ip-xx-xx-xx-xx.us-east-2.compute.internal_20251209_140745.log
[Time remaining]: 600s

==================== Tue Dec  9 14:07:45 CST 2025 ====================

--- Node Resource Usage ---
NAME                                        CPU(cores)   CPU(%)   MEMORY(bytes)   MEMORY(%)   
ip-xx-xx-xx-xx.us-east-2.compute.internal   3993m        114%     1948Mi          15%         
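
(A note on the CPU(%) column above: the value is consistent with usage measured against the node's allocatable CPU, 4000m capacity minus 500m system-reserved = 3500m, rather than total capacity, which is why it exceeds 100%:)

```shell
# CPU(%) above matches usage relative to allocatable (3500m), not
# total capacity (4000m): 3993m / 3500m rounds to 114%.
usage_m=3993
allocatable_m=3500
awk -v u="$usage_m" -v a="$allocatable_m" 'BEGIN { printf "%.0f%%\n", u / a * 100 }'
```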

--- Pods on Node (Non-Running, excluding Completed) ---
All pods running or none found

--- Stress test processes are running ---
[Time remaining]: 590s

  stress-test-kubepods-slice-1.service loaded active running CPU Stress Test for kubepods.slice Process 1
  stress-test-kubepods-slice-2.service loaded active running CPU Stress Test for kubepods.slice Process 2
  stress-test-kubepods-slice-3.service loaded active running CPU Stress Test for kubepods.slice Process 3
  stress-test-kubepods-slice-4.service loaded active running CPU Stress Test for kubepods.slice Process 4
  stress-test-kubepods-slice-5.service loaded active running CPU Stress Test for kubepods.slice Process 5
  stress-test-kubepods-slice-6.service loaded active running CPU Stress Test for kubepods.slice Process 6
  stress-test-kubepods-slice-7.service loaded active running CPU Stress Test for kubepods.slice Process 7
  stress-test-system-slice-1.service   loaded active running CPU Stress Test for system.slice Process 1
  stress-test-system-slice-2.service   loaded active running CPU Stress Test for system.slice Process 2
  stress-test-system-slice-3.service   loaded active running CPU Stress Test for system.slice Process 3
  stress-test-system-slice-4.service   loaded active running CPU Stress Test for system.slice Process 4

[Time remaining]: 580s

==================== Tue Dec  9 14:08:09 CST 2025 ====================
....
=========================================
Stress test completed!
=========================================
[INFO] 
[INFO] =========================================
[INFO] Stress test completed!
[INFO] =========================================
[INFO] Checking final node status...
NAME                                        STATUS   ROLES    AGE     VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                                                KERNEL-VERSION                 CONTAINER-RUNTIME
ip-xx-xx-xx-xx.us-east-2.compute.internal  Ready    worker   4h45m   v1.34.2   10.0.64.194   <none>        Red Hat Enterprise Linux CoreOS 9.6.20251205-0 (Plow)   5.14.0-570.73.1.el9_6.x86_64   cri-o://1.34.2-2.rhaos4.21.gitc8e8b46.el9
[INFO] 
[INFO] Checking for any evicted or failed pods...
[INFO] No evicted/failed pods found
[INFO] 
[INFO] Recent events on node:
....
[INFO] 
[INFO] =========================================
[INFO] Test Summary
[INFO] =========================================
[INFO] Node: ip-xx-xx-xx-xx.us-east-2.compute.internal
[INFO] Stress Duration: 600s
[INFO] Slices stressed:
[INFO]   - system.slice: 4 cores
[INFO]   - kubepods.slice: 7 cores
[INFO] Total CPU cores stressed: 11
[INFO] Log file: stress_test_log_ip-xx-xx-xx-xx.us-east-2.compute.internal_*.log
[INFO] 
[INFO]   chroot /host journalctl -u stress-test-* --since '10 minutes ago'
[INFO] 
[INFO] Prometheus queries to check (example for first slice):
[INFO]   rate(container_cpu_usage_seconds_total{id="/system.slice", node="ip-xx-xx-xx-xx.us-east-2.compute.internal"}[1m]) * 1000
[INFO] =========================================
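
Based on the unit names in the output above, the per-process launch the script performs on the host is presumably something like the following dry-run builder. All names and flags here are inferred from the log, not taken from the script itself:

```shell
#!/bin/sh
# Hypothetical dry-run: print the systemd-run command the script appears to
# issue on the host (via 'oc debug node' + 'chroot /host', per the [WARN]
# line in the log) for one stress process. Unit and description strings are
# inferred from the 'stress-test-*-slice-N.service' entries above.
build_stress_cmd() {
  slice=$1; idx=$2; duration=$3
  unit="stress-test-${slice%%.*}-slice-${idx}.service"
  printf "systemd-run --unit=%s --slice=%s --description='CPU Stress Test for %s Process %s' /bin/sh -c 'timeout %ss sh -c \"while :; do :; done\"'\n" \
    "$unit" "$slice" "$slice" "$idx" "$duration"
}
build_stress_cmd system.slice 1 600
build_stress_cmd kubepods.slice 7 600
```

Using systemd-run with --slice places each busy-loop process into the target cgroup slice, which is what lets the test pressure system.slice and kubepods.slice independently of any pod workload.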

— qiliRedHat, Dec 09 '25 07:12

Pipeline controller notification

This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after the lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: automatic mode

— openshift-ci-robot, Dec 09 '25 07:12

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: qiliRedHat. Once this PR has been reviewed and has the lgtm label, please assign stbenjam for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment. Approvers can cancel approval by writing /approve cancel in a comment.

— openshift-ci[bot], Dec 09 '25 07:12

Scheduling required tests:
/test e2e-aws-csi
/test e2e-aws-ovn-fips
/test e2e-aws-ovn-microshift
/test e2e-aws-ovn-microshift-serial
/test e2e-aws-ovn-serial-1of2
/test e2e-aws-ovn-serial-2of2
/test e2e-gcp-csi
/test e2e-gcp-ovn
/test e2e-gcp-ovn-upgrade
/test e2e-metal-ipi-ovn-ipv6
/test e2e-vsphere-ovn
/test e2e-vsphere-ovn-upi

— openshift-ci-robot, Dec 09 '25 07:12

/test e2e-gcp-ovn

— qiliRedHat, Dec 09 '25 12:12

@qiliRedHat: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

— openshift-ci[bot], Dec 09 '25 15:12