operator-controller
:seedling: add benchmark pipeline
Description
See more: https://github.com/operator-framework/operator-controller/issues/920
- Serial run on AWS, Cluster version is 4.18.0-0.nightly-2025-01-25-163410
jiazha-mac:e2e jiazha$ go test -run=^$ -bench=. -count=10 -memprofile=mem.out -cpuprofile=cpu.out
goos: darwin
goarch: arm64
pkg: github.com/operator-framework/operator-controller/test/e2e
cpu: Apple M1 Pro
BenchmarkCreateClusterCatalog-10 1 1444716583 ns/op
BenchmarkCreateClusterCatalog-10 2 619736229 ns/op
BenchmarkCreateClusterCatalog-10 2 594375416 ns/op
BenchmarkCreateClusterCatalog-10 2 599388104 ns/op
BenchmarkCreateClusterCatalog-10 2 578713375 ns/op
BenchmarkCreateClusterCatalog-10 2 604820354 ns/op
BenchmarkCreateClusterCatalog-10 2 614665062 ns/op
BenchmarkCreateClusterCatalog-10 2 613025938 ns/op
BenchmarkCreateClusterCatalog-10 2 622365104 ns/op
BenchmarkCreateClusterCatalog-10 2 591780896 ns/op
PASS
ok github.com/operator-framework/operator-controller/test/e2e 18.360s
jiazha-mac:e2e jiazha$ go tool pprof mem.out
File: e2e.test
Type: alloc_space
Time: Jan 27, 2025 at 2:00pm (CST)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 6893.09kB, 72.92% of 9453.29kB total
Showing top 10 nodes out of 93
flat flat% sum% cum cum%
1762.94kB 18.65% 18.65% 1762.94kB 18.65% runtime/pprof.StartCPUProfile
902.59kB 9.55% 28.20% 1485.59kB 15.72% compress/flate.NewWriter
583.01kB 6.17% 34.36% 583.01kB 6.17% compress/flate.newDeflateFast (inline)
548.84kB 5.81% 40.17% 1573.29kB 16.64% k8s.io/apimachinery/pkg/runtime.(*Scheme).AddKnownTypeWithName
532.26kB 5.63% 45.80% 532.26kB 5.63% github.com/gogo/protobuf/proto.RegisterType
513.50kB 5.43% 51.23% 513.50kB 5.43% k8s.io/apimachinery/pkg/conversion.ConversionFuncs.AddUntyped
512.75kB 5.42% 56.66% 512.75kB 5.42% vendor/golang.org/x/crypto/cryptobyte.(*Builder).add
512.62kB 5.42% 62.08% 512.62kB 5.42% k8s.io/api/apps/v1beta2.addKnownTypes
512.44kB 5.42% 67.50% 512.44kB 5.42% sync.(*Map).dirtyLocked
512.14kB 5.42% 72.92% 512.14kB 5.42% k8s.io/api/resource/v1alpha3.init
(pprof) exit
jiazha-mac:e2e jiazha$ go tool pprof cpu.out
File: e2e.test
Type: cpu
Time: Jan 27, 2025 at 2:00pm (CST)
Duration: 17.83s, Total samples = 160ms ( 0.9%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 160ms, 100% of 160ms total
Showing top 10 nodes out of 93
flat flat% sum% cum cum%
40ms 25.00% 25.00% 40ms 25.00% runtime.pthread_cond_signal
40ms 25.00% 50.00% 40ms 25.00% runtime.scanobject
40ms 25.00% 75.00% 40ms 25.00% syscall.syscall
10ms 6.25% 81.25% 10ms 6.25% crypto/internal/edwards25519/field.addMul64 (inline)
10ms 6.25% 87.50% 10ms 6.25% k8s.io/apimachinery/pkg/runtime.(*clientNegotiator).Decoder
10ms 6.25% 93.75% 10ms 6.25% runtime.pthread_kill
10ms 6.25% 100% 10ms 6.25% runtime.usleep
0 0% 100% 30ms 18.75% bufio.(*Writer).Flush
0 0% 100% 10ms 6.25% crypto/ecdh.(*PrivateKey).PublicKey
0 0% 100% 10ms 6.25% crypto/ecdh.(*PrivateKey).PublicKey.func1
(pprof) exit
- Parallel run on AWS, Cluster version is 4.18.0-0.nightly-2025-01-25-163410
jiazha-mac:e2e jiazha$ go test -run=^$ -bench=. -count=10 -memprofile=mem.out -cpuprofile=cpu.out
goos: darwin
goarch: arm64
pkg: github.com/operator-framework/operator-controller/test/e2e
cpu: Apple M1 Pro
BenchmarkCreateClusterCatalog-10 1 1559796167 ns/op
BenchmarkCreateClusterCatalog-10 12 105038868 ns/op
BenchmarkCreateClusterCatalog-10 13 88542141 ns/op
BenchmarkCreateClusterCatalog-10 13 94152035 ns/op
BenchmarkCreateClusterCatalog-10 13 93457205 ns/op
BenchmarkCreateClusterCatalog-10 13 94673955 ns/op
BenchmarkCreateClusterCatalog-10 20 62803019 ns/op
BenchmarkCreateClusterCatalog-10 13 87578115 ns/op
BenchmarkCreateClusterCatalog-10 12 107728125 ns/op
BenchmarkCreateClusterCatalog-10 12 98580924 ns/op
PASS
ok github.com/operator-framework/operator-controller/test/e2e 38.984s
jiazha-mac:e2e jiazha$ go tool pprof cpu.out
File: e2e.test
Type: cpu
Time: Jan 27, 2025 at 2:09pm (CST)
Duration: 38.06s, Total samples = 570ms ( 1.50%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 410ms, 71.93% of 570ms total
Showing top 10 nodes out of 201
flat flat% sum% cum cum%
70ms 12.28% 12.28% 70ms 12.28% runtime.kevent
70ms 12.28% 24.56% 70ms 12.28% runtime.pthread_cond_signal
60ms 10.53% 35.09% 60ms 10.53% runtime.pthread_cond_wait
60ms 10.53% 45.61% 60ms 10.53% syscall.syscall
50ms 8.77% 54.39% 50ms 8.77% runtime.pthread_kill
30ms 5.26% 59.65% 30ms 5.26% runtime.madvise
30ms 5.26% 64.91% 30ms 5.26% runtime.pthread_cond_timedwait_relative_np
20ms 3.51% 68.42% 20ms 3.51% runtime.(*mspan).writeHeapBitsSmall
10ms 1.75% 70.18% 10ms 1.75% crypto/internal/bigmod.(*Nat).reset
10ms 1.75% 71.93% 10ms 1.75% k8s.io/apimachinery/pkg/runtime.setTargetKind
(pprof) exit
jiazha-mac:e2e jiazha$ go tool pprof mem.out
File: e2e.test
Type: alloc_space
Time: Jan 27, 2025 at 2:09pm (CST)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 10718.54kB, 61.67% of 17380.55kB total
Showing top 10 nodes out of 166
flat flat% sum% cum cum%
2048.12kB 11.78% 11.78% 2048.12kB 11.78% path.Join
1762.94kB 10.14% 21.93% 1762.94kB 10.14% runtime/pprof.StartCPUProfile
1536.56kB 8.84% 30.77% 1536.56kB 8.84% golang.org/x/net/http2.(*ClientConn).roundTrip
1065.48kB 6.13% 36.90% 2101.58kB 12.09% k8s.io/apimachinery/pkg/runtime.(*Scheme).AddKnownTypeWithName
1025.38kB 5.90% 42.80% 1025.38kB 5.90% sync.(*Pool).pinSlow
1024.14kB 5.89% 48.69% 1536.17kB 8.84% k8s.io/client-go/rest.(*Request).URL
650.62kB 3.74% 52.43% 650.62kB 3.74% compress/flate.(*compressor).init
553.04kB 3.18% 55.62% 553.04kB 3.18% github.com/gogo/protobuf/proto.RegisterType
528.17kB 3.04% 58.65% 528.17kB 3.04% regexp.(*bitState).reset
524.09kB 3.02% 61.67% 524.09kB 3.02% k8s.io/apimachinery/pkg/conversion.ConversionFuncs.AddUntyped
(pprof) exit
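For readers comparing the serial and parallel numbers above, here is a minimal sketch of how the two modes are usually written with Go's `testing` package. It assumes the benchmark body is a single create call. `createCatalogStub` is a hypothetical stand-in that only sleeps; the real `BenchmarkCreateClusterCatalog` talks to a cluster and is not reproduced here.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"testing"
	"time"
)

// calls counts how often the stub ran, across all benchmark rounds.
var calls atomic.Int64

// createCatalogStub is a hypothetical stand-in for creating a ClusterCatalog.
// It only sleeps to imitate a latency-bound API call.
func createCatalogStub() {
	time.Sleep(time.Millisecond)
	calls.Add(1)
}

// runSerial executes the stub b.N times on a single goroutine.
func runSerial() testing.BenchmarkResult {
	return testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			createCatalogStub()
		}
	})
}

// runParallel spreads the b.N iterations over GOMAXPROCS goroutines.
func runParallel() testing.BenchmarkResult {
	return testing.Benchmark(func(b *testing.B) {
		b.RunParallel(func(pb *testing.PB) {
			for pb.Next() {
				createCatalogStub()
			}
		})
	})
}

func main() {
	fmt.Println("serial:  ", runSerial())
	fmt.Println("parallel:", runParallel())
}
```

With a latency-bound body like this stub, the parallel variant tends to report a lower ns/op because the wall time is divided over overlapping iterations. That is one plausible reading of the roughly 600 ms versus 90 ms figures in the two runs above, not a measured conclusion.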
Reviewer Checklist
- [ ] API Go Documentation
- [x] Tests: Unit Tests (and E2E Tests, if appropriate)
- [ ] Comprehensive Commit Messages
- [x] Links to related GitHub Issue(s)
Deploy Preview for olmv1 ready!
| Name | Link |
|---|---|
| Latest commit | 17c9d519bc7616c3ccf5c34b1967f7dba6c551ae |
| Latest deploy log | https://app.netlify.com/sites/olmv1/deploys/67adb658357e960008fa1987 |
| Deploy Preview | https://deploy-preview-1651--olmv1.netlify.app |
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 67.45%. Comparing base (becde51) to head (17c9d51). Report is 143 commits behind head on main.
Additional details and impacted files
@@ Coverage Diff @@
## main #1651 +/- ##
=======================================
Coverage 67.45% 67.45%
=======================================
Files 61 61
Lines 5245 5245
=======================================
Hits 3538 3538
Misses 1446 1446
Partials 261 261
| Flag | Coverage Δ | |
|---|---|---|
| e2e | 52.07% <ø> (-0.08%) | :arrow_down: |
| unit | 55.00% <ø> (ø) | |
- Parallel run on IBMCloud, Cluster version is 4.18.0-0.nightly-2025-01-25-163410
jiazha-mac:e2e jiazha$ go test -run=^$ -bench=. -count=10 -memprofile=mem.out -cpuprofile=cpu.out
goos: darwin
goarch: arm64
pkg: github.com/operator-framework/operator-controller/test/e2e
cpu: Apple M1 Pro
BenchmarkCreateClusterCatalog-10 1 2093042334 ns/op
BenchmarkCreateClusterCatalog-10 4 611432146 ns/op
BenchmarkCreateClusterCatalog-10 10 224809304 ns/op
BenchmarkCreateClusterCatalog-10 13 92630189 ns/op
BenchmarkCreateClusterCatalog-10 6 174559444 ns/op
BenchmarkCreateClusterCatalog-10 12 107088038 ns/op
BenchmarkCreateClusterCatalog-10 1 1003581583 ns/op
BenchmarkCreateClusterCatalog-10 3 379469264 ns/op
BenchmarkCreateClusterCatalog-10 2 606936271 ns/op
BenchmarkCreateClusterCatalog-10 1 2773485917 ns/op
PASS
ok github.com/operator-framework/operator-controller/test/e2e 34.115s
jiazha-mac:e2e jiazha$ go tool pprof mem.out
File: e2e.test
Type: alloc_space
Time: Jan 27, 2025 at 5:22pm (CST)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 8433.33kB, 62.21% of 13555.35kB total
Showing top 10 nodes out of 126
flat flat% sum% cum cum%
1536.16kB 11.33% 11.33% 1536.16kB 11.33% golang.org/x/net/http2.(*ClientConn).roundTrip
1536.09kB 11.33% 22.66% 1536.09kB 11.33% path.Join
1184.27kB 8.74% 31.40% 1184.27kB 8.74% runtime/pprof.StartCPUProfile
902.59kB 6.66% 38.06% 1553.21kB 11.46% compress/flate.NewWriter
650.62kB 4.80% 42.86% 650.62kB 4.80% compress/flate.(*compressor).init
553.04kB 4.08% 46.94% 553.04kB 4.08% github.com/gogo/protobuf/proto.RegisterType
528.17kB 3.90% 50.84% 528.17kB 3.90% regexp.(*bitState).reset
516.01kB 3.81% 54.64% 516.01kB 3.81% google.golang.org/protobuf/internal/filedesc.(*File).initDecls
513.69kB 3.79% 58.43% 513.69kB 3.79% regexp.mergeRuneSets.func2
512.69kB 3.78% 62.21% 512.69kB 3.78% regexp/syntax.(*compiler).inst
(pprof) exit
jiazha-mac:e2e jiazha$ go tool pprof cpu.out
File: e2e.test
Type: cpu
Time: Jan 27, 2025 at 5:22pm (CST)
Duration: 33.19s, Total samples = 380ms ( 1.14%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 310ms, 81.58% of 380ms total
Showing top 10 nodes out of 154
flat flat% sum% cum cum%
90ms 23.68% 23.68% 90ms 23.68% runtime.kevent
50ms 13.16% 36.84% 50ms 13.16% runtime.pthread_cond_signal
50ms 13.16% 50.00% 50ms 13.16% syscall.syscall
40ms 10.53% 60.53% 40ms 10.53% runtime.pthread_cond_wait
30ms 7.89% 68.42% 60ms 15.79% runtime.scanobject
10ms 2.63% 71.05% 20ms 5.26% k8s.io/client-go/rest.(*Request).tryThrottleWithInfo
10ms 2.63% 73.68% 10ms 2.63% runtime.(*itabTableType).find
10ms 2.63% 76.32% 10ms 2.63% runtime.(*mheap).allocSpan
10ms 2.63% 78.95% 10ms 2.63% runtime.(*mspan).heapBitsSmallForAddr
10ms 2.63% 81.58% 10ms 2.63% runtime.(*unwinder).resolveInternal
(pprof) exit
Test on 4.19.0-0.nightly-2025-02-04-230011, GCP.
jiazha-mac:e2e jiazha$ export CATALOG_IMG=registry.redhat.io/redhat/redhat-operator-index:v4.18
jiazha-mac:e2e jiazha$ go test -run=^$ -bench=. -count=10 -memprofile=mem.out -cpuprofile=cpu.out
goos: darwin
goarch: arm64
pkg: github.com/operator-framework/operator-controller/test/e2e
cpu: Apple M1 Pro
BenchmarkCreateClusterCatalog-10 1 1274109792 ns/op
BenchmarkCreateClusterCatalog-10 18 85367590 ns/op
BenchmarkCreateClusterCatalog-10 21 98587806 ns/op
BenchmarkCreateClusterCatalog-10 20 80611660 ns/op
BenchmarkCreateClusterCatalog-10 15 95789072 ns/op
BenchmarkCreateClusterCatalog-10 15 87705625 ns/op
BenchmarkCreateClusterCatalog-10 14 186946500 ns/op
BenchmarkCreateClusterCatalog-10 12 166680833 ns/op
BenchmarkCreateClusterCatalog-10 9 177902880 ns/op
BenchmarkCreateClusterCatalog-10 4 447856864 ns/op
PASS
ok github.com/operator-framework/operator-controller/test/e2e 38.205s
I really think we need to get a Brief and/or RFC written up to get agreement and consensus on the approaches before we merge stuff. But I also think the kind of prototyping and brainstorming that you're doing here and that @OchiengEd did will be necessary to get to the point that we are confident in our approach.
> I really think we need to get a Brief and/or RFC written up to get agreement and consensus on the approaches before we merge stuff.

A Brief is being drafted here: [WIP]Brief: Benchmarking test OLMv1
The benchmark test passes: https://github.com/operator-framework/operator-controller/actions/runs/13214623057/job/36892420269?pr=1651. The downloaded artifacts:
jiazha-mac:~ jiazha$ tree Downloads/benchmark-artifacts/
Downloads/benchmark-artifacts/
├── new.txt
└── output
1 directory, 2 files
jiazha-mac:~ jiazha$ cat Downloads/benchmark-artifacts/new.txt
goos: linux
goarch: amd64
pkg: github.com/operator-framework/operator-controller/test/e2e
cpu: AMD EPYC 7763 64-Core Processor
BenchmarkCreateClusterCatalog
BenchmarkCreateClusterCatalog-4 81 82695425 ns/op 36570 B/op 397 allocs/op
BenchmarkCreateClusterCatalog-4 12 99872913 ns/op 37266 B/op 404 allocs/op
BenchmarkCreateClusterCatalog-4 12 99852972 ns/op 37327 B/op 402 allocs/op
BenchmarkCreateClusterCatalog-4 12 99882173 ns/op 37409 B/op 405 allocs/op
BenchmarkCreateClusterCatalog-4 12 100024048 ns/op 37350 B/op 405 allocs/op
BenchmarkCreateClusterCatalog-4 12 100098746 ns/op 37568 B/op 406 allocs/op
BenchmarkCreateClusterCatalog-4 12 100037742 ns/op 38134 B/op 405 allocs/op
BenchmarkCreateClusterCatalog-4 12 99984867 ns/op 37121 B/op 403 allocs/op
BenchmarkCreateClusterCatalog-4 12 99855796 ns/op 38886 B/op 406 allocs/op
BenchmarkCreateClusterCatalog-4 12 99946190 ns/op 38851 B/op 404 allocs/op
PASS
ok github.com/operator-framework/operator-controller/test/e2e 20.427s
jiazha-mac:~ jiazha$ cat Downloads/benchmark-artifacts/output
goos: darwin
goarch: arm64
pkg: github.com/operator-framework/operator-controller/test/e2e
cpu: Apple M1 Pro
│ benchmarks/baseline.txt │
│ sec/op │
CreateClusterCatalog-10 85.11m ± 16%
│ benchmarks/baseline.txt │
│ B/op │
CreateClusterCatalog-10 36.21Ki ± 6%
│ benchmarks/baseline.txt │
│ allocs/op │
CreateClusterCatalog-10 394.5 ± 1%
goos: linux
goarch: amd64
cpu: AMD EPYC 7763 64-Core Processor
│ /tmp/artifacts/new.txt │
│ sec/op │
CreateClusterCatalog-4 99.91m ± 0%
│ /tmp/artifacts/new.txt │
│ B/op │
CreateClusterCatalog-4 36.50Ki ± 4%
│ /tmp/artifacts/new.txt │
│ allocs/op │
CreateClusterCatalog-4 404.5 ± 1%
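As a sanity check on the summary above, the central values in the `/tmp/artifacts/new.txt` tables can be reproduced by taking the median of the ten raw samples in `new.txt`. The sketch below only does that; it does not attempt the `±` interval, and the helper name `median` is ours, not part of benchstat.

```go
package main

import (
	"fmt"
	"sort"
)

// median returns the middle value of xs, or the mean of the two middle
// values when len(xs) is even. It returns 0 for an empty slice.
func median(xs []float64) float64 {
	s := append([]float64(nil), xs...) // copy so the caller's slice keeps its order
	sort.Float64s(s)
	n := len(s)
	if n == 0 {
		return 0
	}
	if n%2 == 1 {
		return s[n/2]
	}
	return (s[n/2-1] + s[n/2]) / 2
}

// Samples copied from the new.txt listing above.
var (
	nsPerOp = []float64{82695425, 99872913, 99852972, 99882173, 100024048,
		100098746, 100037742, 99984867, 99855796, 99946190}
	bytesPerOp = []float64{36570, 37266, 37327, 37409, 37350,
		37568, 38134, 37121, 38886, 38851}
	allocsPerOp = []float64{397, 404, 402, 405, 405, 406, 405, 403, 406, 404}
)

func main() {
	fmt.Printf("sec/op:    %.2fm\n", median(nsPerOp)/1e6)     // ns -> ms
	fmt.Printf("B/op:      %.2fKi\n", median(bytesPerOp)/1024) // bytes -> KiB
	fmt.Printf("allocs/op: %.1f\n", median(allocsPerOp))
}
```

This prints 99.91m, 36.50Ki and 404.5, matching the three `new.txt` rows in the benchstat output.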
The benchmark pipeline works as follows:
- run-benchmark, here is the running log: https://github.com/operator-framework/operator-controller/actions/runs/13304141348/job/37151204944?pr=1651
- Run the benchmark test cases with `go test -v -run=^$$ -bench=. -benchmem -count=10 -v ./test/e2e/...`
- Convert the test results to Prometheus metrics
- Upload the metrics file as a workflow artifact
- run-prometheus, here is the running log: https://github.com/operator-framework/operator-controller/actions/runs/13304141348/job/37151497828?pr=1651
- Download the Prometheus DB uploaded by the previous job. (ToDo: make this work across repos by using the GitHub REST API)
- Extract and restore the Prometheus snapshot
- Set up the Prometheus config to scrape `$HOST_IP:9000`.
cat << EOF > prometheus.yml
global:
scrape_interval: 5s
scrape_configs:
- job_name: 'benchmark_metrics'
static_configs:
- targets: ['$HOST_IP:9000']
EOF
- Run Prometheus in container
docker run -d --name prometheus -p 9090:9090 \
--user=root \
-v ${{ github.workspace }}/prometheus.yml:/etc/prometheus/prometheus.yml \
-v ${{ github.workspace }}/prometheus-data:/prometheus \
prom/prometheus --config.file=/etc/prometheus/prometheus.yml \
--storage.tsdb.path=/prometheus \
--storage.tsdb.retention.time=1h \
--web.enable-admin-api
- Start an HTTP server to expose the metrics (Prometheus scrapes the metrics and stores them in its TSDB).
- Check the benchmark metrics against the thresholds.
For the thresholds, we can update them to appropriate values after the pipeline has been running for some days. For the queried metrics, we can add more as the number of benchmark test cases grows.
MAX_TIME_NS=1200000000 # 1.2s
MAX_ALLOCS=4000
MAX_MEM_BYTES=450000
# Query Prometheus Metrics, get the max value
time_ns=$(curl -s "http://localhost:9090/api/v1/query?query=max(benchmark_createclustercatalog_ns)" | jq -r '.data.result[0].value[1]')
allocs=$(curl -s "http://localhost:9090/api/v1/query?query=max(benchmark_createclustercatalog_allocs)" | jq -r '.data.result[0].value[1]')
mem_bytes=$(curl -s "http://localhost:9090/api/v1/query?query=max(benchmark_createclustercatalog_mem_bytes)" | jq -r '.data.result[0].value[1]')
- Find and Upload Prometheus Snapshot
- Stop Prometheus
- Upload Prometheus Snapshot
- Done
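The "Convert the test results to Prometheus metrics" step in the list above is not shown in this thread, so the following is only a sketch of one way it could look. The metric names follow the threshold queries (`benchmark_createclustercatalog_ns`, `_allocs`, `_mem_bytes`). The function name `benchToMetrics` and the choice to keep the maximum across the `-count=10` repetitions are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// unitSuffix maps `go test -bench` units to the metric suffixes used by the
// threshold queries in the pipeline.
var unitSuffix = map[string]string{
	"ns/op":     "ns",
	"B/op":      "mem_bytes",
	"allocs/op": "allocs",
}

// benchToMetrics converts `go test -bench` output into Prometheus text-format
// samples. Repeated runs collapse to the maximum per metric (an assumption).
func benchToMetrics(output string) []string {
	maxByName := map[string]float64{}
	for _, line := range strings.Split(output, "\n") {
		f := strings.Fields(line)
		if len(f) < 4 || !strings.HasPrefix(f[0], "Benchmark") {
			continue // goos/pkg headers, PASS/ok lines, or a bare benchmark name
		}
		// "BenchmarkCreateClusterCatalog-4" -> "createclustercatalog"
		name := strings.TrimPrefix(f[0], "Benchmark")
		if i := strings.LastIndex(name, "-"); i >= 0 {
			name = name[:i]
		}
		name = strings.ToLower(name)
		// After the iteration count, fields come in (value, unit) pairs.
		for i := 2; i+1 < len(f); i += 2 {
			suffix, ok := unitSuffix[f[i+1]]
			if !ok {
				continue
			}
			v, err := strconv.ParseFloat(f[i], 64)
			if err != nil {
				continue
			}
			key := "benchmark_" + name + "_" + suffix
			if old, seen := maxByName[key]; !seen || v > old {
				maxByName[key] = v
			}
		}
	}
	keys := make([]string, 0, len(maxByName))
	for k := range maxByName {
		keys = append(keys, k)
	}
	sort.Strings(keys) // stable output order
	out := make([]string, 0, len(keys))
	for _, k := range keys {
		out = append(out, k+" "+strconv.FormatFloat(maxByName[k], 'f', -1, 64))
	}
	return out
}

func main() {
	// Two lines copied from the new.txt artifact earlier in this thread.
	sample := "BenchmarkCreateClusterCatalog-4 81 82695425 ns/op 36570 B/op 397 allocs/op\n" +
		"BenchmarkCreateClusterCatalog-4 12 99872913 ns/op 37266 B/op 404 allocs/op\n"
	for _, m := range benchToMetrics(sample) {
		fmt.Println(m)
	}
}
```

Writing these lines to a file and serving it on port 9000 would give Prometheus something to scrape with the config shown in the list. Sub-benchmark names containing `/` would need extra sanitizing, which this sketch does not do.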
Hi @joelanford, I have implemented the logic you suggested above. Could you help review it when you get a chance? Thanks!
I downloaded the snapshot from https://github.com/operator-framework/operator-controller/actions/runs/13304141348 and checked it in Prometheus. It works as expected, as follows:
jiazha-mac:prometheus-3.1.0.darwin-arm64 jiazha$ tree data
data
├── 01JHSRNJT32YH9858HS49WKGNS
│   ├── chunks
│   │   └── 000001
│   ├── index
│   ├── meta.json
│   └── tombstones
├── 01JKZ7W6XKK9WQ4B9JM5GCNZMW
│   ├── chunks
│   │   └── 000001
│   ├── index
│   ├── meta.json
│   ├── tmp_dbro_sandbox3927239696
│   └── tombstones
├── 01JKZ9ZF06MSA9VCPFBRSRWB4J
│   ├── chunks
│   │   └── 000001
│   ├── index
│   ├── meta.json
│   └── tombstones
├── chunks_head
├── prometheus.tar
├── queries.active
└── wal
    ├── 00000000
    ├── 00000001
    └── 00000002
10 directories, 17 files
jiazha-mac:prometheus-3.1.0.darwin-arm64 jiazha$ ./prometheus
time=2025-02-13T09:38:09.214Z level=INFO source=main.go:636 msg="No time or size retention was set so using the default time retention" duration=15d
time=2025-02-13T09:38:09.214Z level=INFO source=main.go:683 msg="Starting Prometheus Server" mode=server version="(version=3.1.0, branch=HEAD, revision=7086161a93b262aa0949dbf2aba15a5a7b13e0a3)"
...
http://localhost:9090/query?g0.expr=benchmark_createclustercatalog_allocs&g0.show_tree=0&g0.tab=graph&g0.range_input=1h&g0.res_type=auto&g0.res_density=medium&g0.display_mode=lines&g0.show_exemplars=0
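The same series can also be read programmatically through the `/api/v1/query` endpoint that the pipeline's threshold check calls with curl and jq. Below is a rough Go sketch of the parsing half of that check. It works on an embedded response body rather than a live server, and the sample value in it is illustrative, not taken from the snapshot. `firstValue` and `queryResponse` are names invented for this example; `maxAllocs` mirrors `MAX_ALLOCS` from the pipeline script.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strconv"
)

// queryResponse mirrors the part of an instant-query response that the jq
// path .data.result[0].value[1] reads.
type queryResponse struct {
	Status string `json:"status"`
	Data   struct {
		Result []struct {
			Value [2]any `json:"value"` // [unix timestamp, "sample value as string"]
		} `json:"result"`
	} `json:"data"`
}

// firstValue extracts the first sample value from an instant-query response.
func firstValue(body []byte) (float64, error) {
	var r queryResponse
	if err := json.Unmarshal(body, &r); err != nil {
		return 0, err
	}
	if r.Status != "success" || len(r.Data.Result) == 0 {
		return 0, errors.New("query returned no samples")
	}
	s, ok := r.Data.Result[0].Value[1].(string)
	if !ok {
		return 0, errors.New("sample value is not a string")
	}
	return strconv.ParseFloat(s, 64)
}

const maxAllocs = 4000 // MAX_ALLOCS from the pipeline script

func main() {
	// Hypothetical response body; the timestamp and value are illustrative only.
	body := []byte(`{"status":"success","data":{"resultType":"vector",` +
		`"result":[{"metric":{},"value":[1739439489.214,"404"]}]}}`)
	v, err := firstValue(body)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("allocs=%v within threshold: %v\n", v, v <= maxAllocs)
}
```

Treating an empty result as an error is a deliberate choice here: with the jq expression, a missing series yields `null`, which is easy to compare against a threshold by accident.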
PR needs rebase.
Closing as stale. Please re-open if it's still relevant =D