perf: eliminate excessive heap allocations
This PR reduces the number of heap allocations made in the histogram.Add and histogram.trim functions.
In this library maxBins is a small constant, so it is better to spend a little extra compute than to allocate on the heap.
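To illustrate the trade-off, here is a minimal sketch of the general idea, not the code in this PR. It assumes a bin with a value and a count, a hypothetical `maxBins` of 4, and a slice whose capacity is reserved once up front; the names `newHistogram`, `add`, and `trim` are illustrative only. Inserting shifts the tail with `copy`, and trimming merges the two closest neighbours in place, so neither step needs a temporary slice.

```go
package main

import "fmt"

// maxBins is a hypothetical small bound; the real constant may differ.
const maxBins = 4

// bin is an assumed bucket shape: a centroid value and a weight.
type bin struct {
	value float64
	count float64
}

// histogram keeps bins sorted by value in a slice whose capacity is
// reserved once, so later inserts never have to grow it.
type histogram struct {
	bins []bin
}

func newHistogram() *histogram {
	// One spare slot lets add insert before trim merges back down to maxBins.
	return &histogram{bins: make([]bin, 0, maxBins+1)}
}

// add inserts n at its sorted position by shifting the tail in place.
func (h *histogram) add(n float64) {
	i := 0
	for i < len(h.bins) && h.bins[i].value <= n {
		i++
	}
	h.bins = append(h.bins, bin{}) // stays within the reserved capacity
	copy(h.bins[i+1:], h.bins[i:]) // shift the tail right by one slot
	h.bins[i] = bin{value: n, count: 1}
	h.trim()
}

// trim merges the two closest neighbours until at most maxBins remain.
func (h *histogram) trim() {
	for len(h.bins) > maxBins {
		// Linear scan for the adjacent pair with the smallest gap.
		best := 1
		for j := 2; j < len(h.bins); j++ {
			if h.bins[j].value-h.bins[j-1].value < h.bins[best].value-h.bins[best-1].value {
				best = j
			}
		}
		a, b := h.bins[best-1], h.bins[best]
		total := a.count + b.count
		// Store the weighted average in the left slot of the pair.
		h.bins[best-1] = bin{value: (a.value*a.count + b.value*b.count) / total, count: total}
		// Close the gap in place; the backing array is reused.
		h.bins = append(h.bins[:best], h.bins[best+1:]...)
	}
}

func main() {
	h := newHistogram()
	for _, v := range []float64{5, 1, 9, 3, 4} {
		h.add(v)
	}
	fmt.Println(len(h.bins), cap(h.bins))
}
```

The linear scans cost more CPU than a cleverer structure would, but with a small bin limit they stay cheap, and the backing array is allocated only once.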
I ran the following test:
go test -bench="BenchmarkMetrics/timeline/histogram" -benchmem -cpuprofile prof.cpu -memprofile prof.mem -blockprofile prof.block -benchtime 5s
Original (bench):
goos: linux
goarch: amd64
pkg: github.com/zserge/metric
cpu: AMD Ryzen 9 7900X 12-Core Processor
BenchmarkMetrics/timeline/histogram-12 6446358 932.0 ns/op 1702 B/op 1 allocs/op
PASS
ok github.com/zserge/metric 7.124s
Original (heap):
go tool pprof -top prof.mem
File: metric.test
Build ID: 852f07a4a74371cd52c1b5b7abf96a86eaa46d4c
Type: alloc_space
Time: 2025-11-11 17:23:48 UTC
Showing nodes accounting for 11.77GB, 100% of 11.77GB total
Dropped 28 nodes (cum <= 0.06GB)
flat flat% sum% cum cum%
11.77GB 100% 100% 11.77GB 100% github.com/zserge/metric.(*histogram).Add
0 0% 100% 11.76GB 99.89% github.com/zserge/metric.(*timeseries).Add
0 0% 100% 11.76GB 99.89% github.com/zserge/metric.BenchmarkMetrics.func6
0 0% 100% 11.76GB 99.89% testing.(*B).launch
0 0% 100% 11.76GB 99.89% testing.(*B).runN
Improved (bench):
goos: linux
goarch: amd64
pkg: github.com/zserge/metric
cpu: AMD Ryzen 9 7900X 12-Core Processor
BenchmarkMetrics/timeline/histogram-12 12500581 488.9 ns/op 0 B/op 0 allocs/op
PASS
ok github.com/zserge/metric 6.722s
Improved (heap):
go tool pprof -top prof.mem
File: metric.test
Build ID: 4163c2636b8c890969f4262c30deec95abf6f7e1
Type: alloc_space
Time: 2025-11-11 17:27:18 UTC
Showing nodes accounting for 6198.37kB, 100% of 6198.37kB total
flat flat% sum% cum cum%
1539kB 24.83% 24.83% 1539kB 24.83% runtime.allocm
1184.27kB 19.11% 43.94% 1184.27kB 19.11% runtime/pprof.StartCPUProfile
902.59kB 14.56% 58.50% 902.59kB 14.56% compress/flate.NewWriter (inline)
521.37kB 8.41% 66.91% 521.37kB 8.41% runtime/pprof.(*profileBuilder).emitLocation
513.50kB 8.28% 75.19% 513.50kB 8.28% runtime/pprof.(*protobuf).varint (inline)
512.88kB 8.27% 83.47% 512.88kB 8.27% sync.(*Pool).pinSlow
512.69kB 8.27% 91.74% 512.69kB 8.27% regexp/syntax.(*compiler).inst (inline)
512.08kB 8.26% 100% 512.08kB 8.26% compress/gzip.NewWriterLevel
0 0% 100% 902.59kB 14.56% compress/gzip.(*Writer).Write
0 0% 100% 512.88kB 8.27% fmt.Fprintf
0 0% 100% 512.88kB 8.27% fmt.newPrinter
0 0% 100% 512.88kB 8.27% github.com/zserge/metric.BenchmarkMetrics
0 0% 100% 1696.96kB 27.38% main.main
0 0% 100% 512.69kB 8.27% regexp.Compile (inline)
0 0% 100% 512.69kB 8.27% regexp.compile
0 0% 100% 512.69kB 8.27% regexp/syntax.Compile
0 0% 100% 1696.96kB 27.38% runtime.main
0 0% 100% 1026kB 16.55% runtime.mcall
0 0% 100% 513kB 8.28% runtime.mstart
0 0% 100% 513kB 8.28% runtime.mstart0
0 0% 100% 513kB 8.28% runtime.mstart1
0 0% 100% 1539kB 24.83% runtime.newm
0 0% 100% 1026kB 16.55% runtime.park_m
0 0% 100% 1539kB 24.83% runtime.resetspinning
0 0% 100% 1539kB 24.83% runtime.schedule
0 0% 100% 1539kB 24.83% runtime.startm
0 0% 100% 1539kB 24.83% runtime.wakep
0 0% 100% 521.37kB 8.41% runtime/pprof.(*profileBuilder).appendLocsForStack
0 0% 100% 1937.46kB 31.26% runtime/pprof.(*profileBuilder).build
0 0% 100% 902.59kB 14.56% runtime/pprof.(*profileBuilder).flush
0 0% 100% 1416.09kB 22.85% runtime/pprof.(*profileBuilder).pbSample
0 0% 100% 513.50kB 8.28% runtime/pprof.(*protobuf).int64 (inline)
0 0% 100% 513.50kB 8.28% runtime/pprof.(*protobuf).int64s
0 0% 100% 513.50kB 8.28% runtime/pprof.(*protobuf).uint64 (inline)
0 0% 100% 512.08kB 8.26% runtime/pprof.newProfileBuilder
0 0% 100% 2449.53kB 39.52% runtime/pprof.profileWriter
0 0% 100% 512.88kB 8.27% sync.(*Pool).Get
0 0% 100% 512.88kB 8.27% sync.(*Pool).pin
0 0% 100% 512.88kB 8.27% testing.(*B).Run
0 0% 100% 512.88kB 8.27% testing.(*B).run
0 0% 100% 512.88kB 8.27% testing.(*B).run1.func1
0 0% 100% 512.88kB 8.27% testing.(*B).runN
0 0% 100% 1696.96kB 27.38% testing.(*M).Run
0 0% 100% 1184.27kB 19.11% testing.(*M).before
0 0% 100% 512.88kB 8.27% testing.(*benchState).processBench
0 0% 100% 512.69kB 8.27% testing.(*matcher).fullName
0 0% 100% 512.88kB 8.27% testing.BenchmarkResult.String
0 0% 100% 512.69kB 8.27% testing.runBenchmarks
0 0% 100% 512.69kB 8.27% testing.simpleMatch.matches
0 0% 100% 512.69kB 8.27% testing/internal/testdeps.TestDeps.MatchString
0 0% 100% 1184.27kB 19.11% testing/internal/testdeps.TestDeps.StartCPUProfile
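One way to keep the 0 allocs/op result from regressing is an allocation check built on `testing.AllocsPerRun` from the standard library. The sketch below is an assumption about how such a check could look; `insertSorted` and `allocsPerInsert` are hypothetical stand-ins rather than functions from this library.

```go
package main

import (
	"fmt"
	"testing"
)

// insertSorted is a hypothetical stand-in for the hot path, not the
// library's code. It inserts v into a sorted slice with spare capacity.
func insertSorted(s []float64, v float64) []float64 {
	i := 0
	for i < len(s) && s[i] <= v {
		i++
	}
	s = append(s, 0)     // no growth while len(s) < cap(s)
	copy(s[i+1:], s[i:]) // shift the tail right by one slot
	s[i] = v
	return s
}

// allocsPerInsert reports the average number of heap allocations per run
// of three inserts into a buffer that is allocated once, outside the
// measured function.
func allocsPerInsert() float64 {
	buf := make([]float64, 0, 8)
	return testing.AllocsPerRun(1000, func() {
		buf = insertSorted(buf[:0], 3)
		buf = insertSorted(buf, 1)
		buf = insertSorted(buf, 2)
	})
}

func main() {
	fmt.Println(allocsPerInsert())
}
```

In a real test file the same call would sit inside a `func TestXxx(t *testing.T)` and fail with `t.Fatalf` when the result is non-zero.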