webrtc icon indicating copy to clipboard operation
webrtc copied to clipboard

Performance regression and memory leak in SampleBuilder@v4

Open at-wat opened this issue 1 year ago • 11 comments

Your environment.

  • Version: v4.0.0-beta.14 and later

What did you do?

Updated pion/webrtc from v4.0.0-beta.13 to later version.

What did you expect?

SampleBuilder CPU/memory usage doesn't significantly increased.

What happened?

CPU usage is doubled in the existing benchmark, and input RTP packets aren't unreferenced after the corresponding samples are popped.

CPU

Bench name v4.0.0-beta.13 CPU v4.0.0-beta.14 CPU CPU usage increase
BenchmarkSampleBuilderSequential-32 311.4 ns/op 609.6 ns/op x1.96
BenchmarkSampleBuilderLoss-32 288.7 ns/op 720.5 ns/op x2.50
BenchmarkSampleBuilderReordered-32 320.9 ns/op 621.9 ns/op x1.93
BenchmarkSampleBuilderFragmented-32 272.8 ns/op 540.5 ns/op x1.98
BenchmarkSampleBuilderFragmentedLoss-32 246.8 ns/op 668.2 ns/op x2.71
Full benchmark outputs

v4.0.0-beta.13

$ go test -bench . -benchmem
goos: linux
goarch: amd64
pkg: github.com/pion/webrtc/v4/pkg/media/samplebuilder
cpu: AMD Ryzen 9 3950X 16-Core Processor            
BenchmarkSampleBuilderSequential-32        	 3831741	       311.4 ns/op	     388 B/op	       5 allocs/op
BenchmarkSampleBuilderLoss-32              	 3969078	       288.7 ns/op	     344 B/op	       4 allocs/op
BenchmarkSampleBuilderReordered-32         	 3697982	       320.9 ns/op	     388 B/op	       4 allocs/op
BenchmarkSampleBuilderFragmented-32        	 4363172	       272.8 ns/op	     362 B/op	       4 allocs/op
BenchmarkSampleBuilderFragmentedLoss-32    	 4807190	       246.8 ns/op	     317 B/op	       3 allocs/op
PASS
ok  	github.com/pion/webrtc/v4/pkg/media/samplebuilder	7.556s

v4.0.0-beta.14

$ go test -bench . -benchmem
goos: linux
goarch: amd64
pkg: github.com/pion/webrtc/v4/pkg/media/samplebuilder
cpu: AMD Ryzen 9 3950X 16-Core Processor            
BenchmarkSampleBuilderSequential-32        	 1979661	       609.6 ns/op	     420 B/op	       6 allocs/op
BenchmarkSampleBuilderLoss-32              	 1647154	       720.5 ns/op	     373 B/op	       5 allocs/op
BenchmarkSampleBuilderReordered-32         	 1881794	       621.9 ns/op	     420 B/op	       6 allocs/op
BenchmarkSampleBuilderFragmented-32        	 2221912	       540.5 ns/op	     394 B/op	       5 allocs/op
BenchmarkSampleBuilderFragmentedLoss-32    	 1816394	       668.2 ns/op	     346 B/op	       4 allocs/op
PASS
ok  	github.com/pion/webrtc/v4/pkg/media/samplebuilder	9.402s

v4.0.0-beta.19

$ go test -bench . -benchmem
goos: linux
goarch: amd64
pkg: github.com/pion/webrtc/v4/pkg/media/samplebuilder
cpu: AMD Ryzen 9 3950X 16-Core Processor            
BenchmarkSampleBuilderSequential-32        	 1978306	       602.4 ns/op	     420 B/op	       6 allocs/op
BenchmarkSampleBuilderLoss-32              	 1609996	       740.5 ns/op	     373 B/op	       5 allocs/op
BenchmarkSampleBuilderReordered-32         	 1897911	       628.1 ns/op	     420 B/op	       6 allocs/op
BenchmarkSampleBuilderFragmented-32        	 2179366	       538.9 ns/op	     394 B/op	       5 allocs/op
BenchmarkSampleBuilderFragmentedLoss-32    	 1744249	       684.3 ns/op	     346 B/op	       4 allocs/op
PASS
ok  	github.com/pion/webrtc/v4/pkg/media/samplebuilder	9.426s

Memory

I created a test to check that the input RTP packets are unreferenced after all samples are popped.

  • https://github.com/pion/webrtc/pull/2781

On v4.0.0-beta.13,

=== RUN   TestSampleBuilderPacketUnreference
--- PASS: TestSampleBuilderPacketUnreference (0.05s)

all popped packets are unreferenced.

On v4.0.0-beta.14,

=== RUN   TestSampleBuilderPacketUnreference
    samplebuilder_test.go:570: 
        	Error Trace:	/home/at-wat/go/src/github.com/pion/webrtc/pkg/media/samplebuilder/samplebuilder_test.go:570
        	Error:      	Not equal: 
        	            	expected: 1
        	            	actual  : 65536
        	Test:       	TestSampleBuilderPacketUnreference
--- FAIL: TestSampleBuilderPacketUnreference (0.08s)

65536 packets aren't unreferenced.

Note

CPU and memory usage of my product software using SampleBuilder significantly increased after updating v4.0.0-beta.13 to v4.0.0-beta.19. ~I'll dig into the memory usage increase which doesn't appeared in the benchmark.~ Found the problem on input data unreference and updated above sections.

Refs

  • https://github.com/pion/webrtc/pull/2679

at-wat avatar May 30 '24 09:05 at-wat

@thatsnotright @Sean-Der FYI

at-wat avatar May 30 '24 09:05 at-wat

Also, the new SampleBuilder doesn't unreference the input RTP packets and causes kind of memory leak. Created a PR to add a test to check that the input RTP packets are unreferenced after all samples are popped. https://github.com/pion/webrtc/pull/2781

at-wat avatar May 31 '24 07:05 at-wat

@thatsnotright could you take a deeper look?

at-wat avatar May 31 '24 07:05 at-wat

kindly ping @thatsnotright @Sean-Der

at-wat avatar Jul 01 '24 19:07 at-wat

@at-wat gave a shot at investigating this one and I think I found at least one issue! See https://github.com/pion/interceptor/pull/264

edaniels avatar Jul 23 '24 03:07 edaniels

Memory unreference problem seems be fixed, but CPU usage is still twice expensive then v4.0.0-beta.13

goos: linux
goarch: amd64
pkg: github.com/pion/webrtc/v4/pkg/media/samplebuilder
cpu: AMD Ryzen 9 3950X 16-Core Processor            
BenchmarkSampleBuilderSequential-32        	 2007697	       599.3 ns/op	     420 B/op	       6 allocs/op
BenchmarkSampleBuilderLoss-32              	 1627015	       730.4 ns/op	     373 B/op	       5 allocs/op
BenchmarkSampleBuilderReordered-32         	 1930000	       626.0 ns/op	     420 B/op	       6 allocs/op
BenchmarkSampleBuilderFragmented-32        	 2244157	       527.6 ns/op	     394 B/op	       5 allocs/op
BenchmarkSampleBuilderFragmentedLoss-32    	 1774668	       680.3 ns/op	     346 B/op	       4 allocs/op
PASS
ok  	github.com/pion/webrtc/v4/pkg/media/samplebuilder	9.425s

at-wat avatar Aug 07 '24 03:08 at-wat

Memory unreference problem seems be fixed, but CPU usage is still twice expensive then v4.0.0-beta.13

goos: linux
goarch: amd64
pkg: github.com/pion/webrtc/v4/pkg/media/samplebuilder
cpu: AMD Ryzen 9 3950X 16-Core Processor            
BenchmarkSampleBuilderSequential-32        	 2007697	       599.3 ns/op	     420 B/op	       6 allocs/op
BenchmarkSampleBuilderLoss-32              	 1627015	       730.4 ns/op	     373 B/op	       5 allocs/op
BenchmarkSampleBuilderReordered-32         	 1930000	       626.0 ns/op	     420 B/op	       6 allocs/op
BenchmarkSampleBuilderFragmented-32        	 2244157	       527.6 ns/op	     394 B/op	       5 allocs/op
BenchmarkSampleBuilderFragmentedLoss-32    	 1774668	       680.3 ns/op	     346 B/op	       4 allocs/op
PASS
ok  	github.com/pion/webrtc/v4/pkg/media/samplebuilder	9.425s

Whoops sorry! Referencing the issue and merging auto closed this

edaniels avatar Aug 07 '24 04:08 edaniels

I tested v4.0.0-beta.27 in my product app again. Memory usage is fixed, but CPU usage is bumped from 10% to 100% (as same as I tried v4.0.0-beta.14), so it seems still unusable for our case.

at-wat avatar Aug 07 '24 05:08 at-wat

Should we revert that code for now?

edaniels avatar Aug 07 '24 11:08 edaniels

Can you give me a week I can look into perf regression.

I don’t think we will ever fix it otherwise. No one is actively maintaining/improving this part

Sean-Der avatar Aug 07 '24 11:08 Sean-Der

Sure!

edaniels avatar Aug 07 '24 11:08 edaniels