perf: eliminate excessive heap allocations

Open encodeous opened this issue 1 month ago • 0 comments

This PR reduces the number of heap allocations made in the histogram.Add and histogram.trim functions.

This library uses a small constant for maxBins, so it is better to spend a little extra compute than to make heap allocations.
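As a rough illustration of that trade-off (not the code in this PR), the sketch below keeps bins in a slice whose backing array is allocated once with capacity maxBins+1, inserts by shifting elements in place, and merges the two closest bins when the limit is exceeded. The names sketchHistogram, bin and newSketchHistogram, and the value 4 for maxBins, are assumptions made for the example.

```go
package main

import "math"

// maxBins mirrors the small constant mentioned above; the value 4 is assumed.
const maxBins = 4

type bin struct {
	value float64
	count float64
}

// sketchHistogram is a hypothetical stand-in, not the library's histogram type.
type sketchHistogram struct {
	bins []bin // backing array is allocated once with capacity maxBins+1
}

func newSketchHistogram() *sketchHistogram {
	return &sketchHistogram{bins: make([]bin, 0, maxBins+1)}
}

// Add inserts v in sorted order by shifting elements in place
// (a linear scan, which is cheap when maxBins is small).
func (h *sketchHistogram) Add(v float64) {
	i := 0
	for i < len(h.bins) && h.bins[i].value < v {
		i++
	}
	if i < len(h.bins) && h.bins[i].value == v {
		h.bins[i].count++
		return
	}
	h.bins = h.bins[:len(h.bins)+1] // stays within capacity: no allocation
	copy(h.bins[i+1:], h.bins[i:])  // shift the tail right by one slot
	h.bins[i] = bin{value: v, count: 1}
	h.trim()
}

// trim merges the two closest neighbouring bins until at most maxBins remain.
func (h *sketchHistogram) trim() {
	for len(h.bins) > maxBins {
		best, minDelta := 0, math.Inf(1)
		for i := 0; i+1 < len(h.bins); i++ {
			if d := h.bins[i+1].value - h.bins[i].value; d < minDelta {
				best, minDelta = i, d
			}
		}
		a, b := h.bins[best], h.bins[best+1]
		total := a.count + b.count
		h.bins[best] = bin{
			value: (a.value*a.count + b.value*b.count) / total, // weighted mean
			count: total,
		}
		copy(h.bins[best+1:], h.bins[best+2:]) // close the gap in place
		h.bins = h.bins[:len(h.bins)-1]
	}
}
```

Because the slice never grows past its initial capacity, neither Add nor trim in this sketch needs the allocator after construction; the cost is an O(maxBins) shift per insert.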

I ran the following benchmark:

go test -bench="BenchmarkMetrics/timeline/histogram" -benchmem -cpuprofile prof.cpu -memprofile prof.mem -blockprofile prof.block -benchtime 5s

Original (bench):

goos: linux
goarch: amd64
pkg: github.com/zserge/metric
cpu: AMD Ryzen 9 7900X 12-Core Processor
BenchmarkMetrics/timeline/histogram-12           6446358               932.0 ns/op          1702 B/op          1 allocs/op
PASS
ok      github.com/zserge/metric        7.124s

Original (heap):

go tool pprof -top prof.mem
File: metric.test
Build ID: 852f07a4a74371cd52c1b5b7abf96a86eaa46d4c
Type: alloc_space
Time: 2025-11-11 17:23:48 UTC
Showing nodes accounting for 11.77GB, 100% of 11.77GB total
Dropped 28 nodes (cum <= 0.06GB)
      flat  flat%   sum%        cum   cum%
   11.77GB   100%   100%    11.77GB   100%  github.com/zserge/metric.(*histogram).Add
         0     0%   100%    11.76GB 99.89%  github.com/zserge/metric.(*timeseries).Add
         0     0%   100%    11.76GB 99.89%  github.com/zserge/metric.BenchmarkMetrics.func6
         0     0%   100%    11.76GB 99.89%  testing.(*B).launch
         0     0%   100%    11.76GB 99.89%  testing.(*B).runN

Improved (bench):

goos: linux
goarch: amd64
pkg: github.com/zserge/metric
cpu: AMD Ryzen 9 7900X 12-Core Processor            
BenchmarkMetrics/timeline/histogram-12          12500581               488.9 ns/op             0 B/op          0 allocs/op
PASS
ok      github.com/zserge/metric        6.722s

Improved (heap):

go tool pprof -top prof.mem
File: metric.test
Build ID: 4163c2636b8c890969f4262c30deec95abf6f7e1
Type: alloc_space
Time: 2025-11-11 17:27:18 UTC
Showing nodes accounting for 6198.37kB, 100% of 6198.37kB total
      flat  flat%   sum%        cum   cum%
    1539kB 24.83% 24.83%     1539kB 24.83%  runtime.allocm
 1184.27kB 19.11% 43.94%  1184.27kB 19.11%  runtime/pprof.StartCPUProfile
  902.59kB 14.56% 58.50%   902.59kB 14.56%  compress/flate.NewWriter (inline)
  521.37kB  8.41% 66.91%   521.37kB  8.41%  runtime/pprof.(*profileBuilder).emitLocation
  513.50kB  8.28% 75.19%   513.50kB  8.28%  runtime/pprof.(*protobuf).varint (inline)
  512.88kB  8.27% 83.47%   512.88kB  8.27%  sync.(*Pool).pinSlow
  512.69kB  8.27% 91.74%   512.69kB  8.27%  regexp/syntax.(*compiler).inst (inline)
  512.08kB  8.26%   100%   512.08kB  8.26%  compress/gzip.NewWriterLevel
         0     0%   100%   902.59kB 14.56%  compress/gzip.(*Writer).Write
         0     0%   100%   512.88kB  8.27%  fmt.Fprintf
         0     0%   100%   512.88kB  8.27%  fmt.newPrinter
         0     0%   100%   512.88kB  8.27%  github.com/zserge/metric.BenchmarkMetrics
         0     0%   100%  1696.96kB 27.38%  main.main
         0     0%   100%   512.69kB  8.27%  regexp.Compile (inline)
         0     0%   100%   512.69kB  8.27%  regexp.compile
         0     0%   100%   512.69kB  8.27%  regexp/syntax.Compile
         0     0%   100%  1696.96kB 27.38%  runtime.main
         0     0%   100%     1026kB 16.55%  runtime.mcall
         0     0%   100%      513kB  8.28%  runtime.mstart
         0     0%   100%      513kB  8.28%  runtime.mstart0
         0     0%   100%      513kB  8.28%  runtime.mstart1
         0     0%   100%     1539kB 24.83%  runtime.newm
         0     0%   100%     1026kB 16.55%  runtime.park_m
         0     0%   100%     1539kB 24.83%  runtime.resetspinning
         0     0%   100%     1539kB 24.83%  runtime.schedule
         0     0%   100%     1539kB 24.83%  runtime.startm
         0     0%   100%     1539kB 24.83%  runtime.wakep
         0     0%   100%   521.37kB  8.41%  runtime/pprof.(*profileBuilder).appendLocsForStack
         0     0%   100%  1937.46kB 31.26%  runtime/pprof.(*profileBuilder).build
         0     0%   100%   902.59kB 14.56%  runtime/pprof.(*profileBuilder).flush
         0     0%   100%  1416.09kB 22.85%  runtime/pprof.(*profileBuilder).pbSample
         0     0%   100%   513.50kB  8.28%  runtime/pprof.(*protobuf).int64 (inline)
         0     0%   100%   513.50kB  8.28%  runtime/pprof.(*protobuf).int64s
         0     0%   100%   513.50kB  8.28%  runtime/pprof.(*protobuf).uint64 (inline)
         0     0%   100%   512.08kB  8.26%  runtime/pprof.newProfileBuilder
         0     0%   100%  2449.53kB 39.52%  runtime/pprof.profileWriter
         0     0%   100%   512.88kB  8.27%  sync.(*Pool).Get
         0     0%   100%   512.88kB  8.27%  sync.(*Pool).pin
         0     0%   100%   512.88kB  8.27%  testing.(*B).Run
         0     0%   100%   512.88kB  8.27%  testing.(*B).run
         0     0%   100%   512.88kB  8.27%  testing.(*B).run1.func1
         0     0%   100%   512.88kB  8.27%  testing.(*B).runN
         0     0%   100%  1696.96kB 27.38%  testing.(*M).Run
         0     0%   100%  1184.27kB 19.11%  testing.(*M).before
         0     0%   100%   512.88kB  8.27%  testing.(*benchState).processBench
         0     0%   100%   512.69kB  8.27%  testing.(*matcher).fullName
         0     0%   100%   512.88kB  8.27%  testing.BenchmarkResult.String
         0     0%   100%   512.69kB  8.27%  testing.runBenchmarks
         0     0%   100%   512.69kB  8.27%  testing.simpleMatch.matches
         0     0%   100%   512.69kB  8.27%  testing/internal/testdeps.TestDeps.MatchString
         0     0%   100%  1184.27kB 19.11%  testing/internal/testdeps.TestDeps.StartCPUProfile
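One possible way to guard a 0 allocs/op result against regressions is the standard library's testing.AllocsPerRun, which reports the average number of heap allocations per call. The sketch below is only an assumption about how such a check might look; fixedBins and allocsPerAdd are hypothetical names and are not part of this library.

```go
package main

import "testing"

// fixedBins is a hypothetical fixed-capacity buffer used only for this sketch.
type fixedBins struct {
	vals [8]float64
	n    int
}

// add appends v, dropping the oldest value when the buffer is full.
// Everything happens inside the embedded array, so nothing is allocated.
func (f *fixedBins) add(v float64) {
	if f.n == len(f.vals) {
		copy(f.vals[:], f.vals[1:]) // shift left by one, discarding vals[0]
		f.n--
	}
	f.vals[f.n] = v
	f.n++
}

// allocsPerAdd reports the average number of heap allocations per add call.
func allocsPerAdd() float64 {
	f := &fixedBins{}
	return testing.AllocsPerRun(1000, func() { f.add(1.5) })
}
```

A test could then fail whenever allocsPerAdd returns a non-zero value, which catches an allocation creeping back in without having to read a heap profile.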

encodeous, Nov 11 '25 17:11