Performance regression and memory leak in SampleBuilder@v4
Your environment.
- Version:
v4.0.0-beta.14and later
What did you do?
Updated pion/webrtc from v4.0.0-beta.13 to later version.
What did you expect?
SampleBuilder CPU/memory usage doesn't significantly increased.
What happened?
CPU usage is doubled in the existing benchmark, and input RTP packets aren't unreferenced after the corresponding samples are popped.
CPU
| Bench name | v4.0.0-beta.13 CPU | v4.0.0-beta.14 CPU | CPU usage increase |
|---|---|---|---|
| BenchmarkSampleBuilderSequential-32 | 311.4 ns/op | 609.6 ns/op | x1.96 |
| BenchmarkSampleBuilderLoss-32 | 288.7 ns/op | 720.5 ns/op | x2.50 |
| BenchmarkSampleBuilderReordered-32 | 320.9 ns/op | 621.9 ns/op | x1.93 |
| BenchmarkSampleBuilderFragmented-32 | 272.8 ns/op | 540.5 ns/op | x1.98 |
| BenchmarkSampleBuilderFragmentedLoss-32 | 246.8 ns/op | 668.2 ns/op | x2.71 |
Full benchmark outputs
v4.0.0-beta.13
$ go test -bench . -benchmem
goos: linux
goarch: amd64
pkg: github.com/pion/webrtc/v4/pkg/media/samplebuilder
cpu: AMD Ryzen 9 3950X 16-Core Processor
BenchmarkSampleBuilderSequential-32 3831741 311.4 ns/op 388 B/op 5 allocs/op
BenchmarkSampleBuilderLoss-32 3969078 288.7 ns/op 344 B/op 4 allocs/op
BenchmarkSampleBuilderReordered-32 3697982 320.9 ns/op 388 B/op 4 allocs/op
BenchmarkSampleBuilderFragmented-32 4363172 272.8 ns/op 362 B/op 4 allocs/op
BenchmarkSampleBuilderFragmentedLoss-32 4807190 246.8 ns/op 317 B/op 3 allocs/op
PASS
ok github.com/pion/webrtc/v4/pkg/media/samplebuilder 7.556s
v4.0.0-beta.14
$ go test -bench . -benchmem
goos: linux
goarch: amd64
pkg: github.com/pion/webrtc/v4/pkg/media/samplebuilder
cpu: AMD Ryzen 9 3950X 16-Core Processor
BenchmarkSampleBuilderSequential-32 1979661 609.6 ns/op 420 B/op 6 allocs/op
BenchmarkSampleBuilderLoss-32 1647154 720.5 ns/op 373 B/op 5 allocs/op
BenchmarkSampleBuilderReordered-32 1881794 621.9 ns/op 420 B/op 6 allocs/op
BenchmarkSampleBuilderFragmented-32 2221912 540.5 ns/op 394 B/op 5 allocs/op
BenchmarkSampleBuilderFragmentedLoss-32 1816394 668.2 ns/op 346 B/op 4 allocs/op
PASS
ok github.com/pion/webrtc/v4/pkg/media/samplebuilder 9.402s
v4.0.0-beta.19
$ go test -bench . -benchmem
goos: linux
goarch: amd64
pkg: github.com/pion/webrtc/v4/pkg/media/samplebuilder
cpu: AMD Ryzen 9 3950X 16-Core Processor
BenchmarkSampleBuilderSequential-32 1978306 602.4 ns/op 420 B/op 6 allocs/op
BenchmarkSampleBuilderLoss-32 1609996 740.5 ns/op 373 B/op 5 allocs/op
BenchmarkSampleBuilderReordered-32 1897911 628.1 ns/op 420 B/op 6 allocs/op
BenchmarkSampleBuilderFragmented-32 2179366 538.9 ns/op 394 B/op 5 allocs/op
BenchmarkSampleBuilderFragmentedLoss-32 1744249 684.3 ns/op 346 B/op 4 allocs/op
PASS
ok github.com/pion/webrtc/v4/pkg/media/samplebuilder 9.426s
Memory
I created a test to check that the input RTP packets are unreferenced after all samples are popped.
- https://github.com/pion/webrtc/pull/2781
On v4.0.0-beta.13,
=== RUN TestSampleBuilderPacketUnreference
--- PASS: TestSampleBuilderPacketUnreference (0.05s)
all popped packets are unreferenced.
On v4.0.0-beta.14,
=== RUN TestSampleBuilderPacketUnreference
samplebuilder_test.go:570:
Error Trace: /home/at-wat/go/src/github.com/pion/webrtc/pkg/media/samplebuilder/samplebuilder_test.go:570
Error: Not equal:
expected: 1
actual : 65536
Test: TestSampleBuilderPacketUnreference
--- FAIL: TestSampleBuilderPacketUnreference (0.08s)
65536 packets aren't unreferenced.
Note
CPU and memory usage of my product software using SampleBuilder significantly increased after updating v4.0.0-beta.13 to v4.0.0-beta.19.
~I'll dig into the memory usage increase which doesn't appeared in the benchmark.~ Found the problem on input data unreference and updated above sections.
Refs
- https://github.com/pion/webrtc/pull/2679
@thatsnotright @Sean-Der FYI
Also, the new SampleBuilder doesn't unreference the input RTP packets and causes kind of memory leak. Created a PR to add a test to check that the input RTP packets are unreferenced after all samples are popped. https://github.com/pion/webrtc/pull/2781
@thatsnotright could you take a deeper look?
kindly ping @thatsnotright @Sean-Der
@at-wat gave a shot at investigating this one and I think I found at least one issue! See https://github.com/pion/interceptor/pull/264
Memory unreference problem seems be fixed, but CPU usage is still twice expensive then v4.0.0-beta.13
goos: linux
goarch: amd64
pkg: github.com/pion/webrtc/v4/pkg/media/samplebuilder
cpu: AMD Ryzen 9 3950X 16-Core Processor
BenchmarkSampleBuilderSequential-32 2007697 599.3 ns/op 420 B/op 6 allocs/op
BenchmarkSampleBuilderLoss-32 1627015 730.4 ns/op 373 B/op 5 allocs/op
BenchmarkSampleBuilderReordered-32 1930000 626.0 ns/op 420 B/op 6 allocs/op
BenchmarkSampleBuilderFragmented-32 2244157 527.6 ns/op 394 B/op 5 allocs/op
BenchmarkSampleBuilderFragmentedLoss-32 1774668 680.3 ns/op 346 B/op 4 allocs/op
PASS
ok github.com/pion/webrtc/v4/pkg/media/samplebuilder 9.425s
Memory unreference problem seems be fixed, but CPU usage is still twice expensive then
v4.0.0-beta.13goos: linux goarch: amd64 pkg: github.com/pion/webrtc/v4/pkg/media/samplebuilder cpu: AMD Ryzen 9 3950X 16-Core Processor BenchmarkSampleBuilderSequential-32 2007697 599.3 ns/op 420 B/op 6 allocs/op BenchmarkSampleBuilderLoss-32 1627015 730.4 ns/op 373 B/op 5 allocs/op BenchmarkSampleBuilderReordered-32 1930000 626.0 ns/op 420 B/op 6 allocs/op BenchmarkSampleBuilderFragmented-32 2244157 527.6 ns/op 394 B/op 5 allocs/op BenchmarkSampleBuilderFragmentedLoss-32 1774668 680.3 ns/op 346 B/op 4 allocs/op PASS ok github.com/pion/webrtc/v4/pkg/media/samplebuilder 9.425s
Whoops sorry! Referencing the issue and merging auto closed this
I tested v4.0.0-beta.27 in my product app again.
Memory usage is fixed, but CPU usage is bumped from 10% to 100% (as same as I tried v4.0.0-beta.14), so it seems still unusable for our case.
Should we revert that code for now?
Can you give me a week I can look into perf regression.
I don’t think we will ever fix it otherwise. No one is actively maintaining/improving this part
Sure!