webrtc icon indicating copy to clipboard operation
webrtc copied to clipboard

[WebRTC] investigate data-channels-flow-control example throughput performance issue

Open rainliu opened this issue 4 years ago • 26 comments

Pion has more than 500Mbps+,

Peer Connection State has changed: connected (offerer)
Peer Connection State has changed: connected (answerer)
2021/09/25 13:27:29 OnOpen: data-824638619994. Start sending a series of 1024-byte packets as fast as it can
2021/09/25 13:27:29 OnOpen: data-824636958938. Start receiving data
2021/09/25 13:27:30 Throughput: 570.646 Mbps
2021/09/25 13:27:31 Throughput: 569.753 Mbps
2021/09/25 13:27:32 Throughput: 573.001 Mbps
2021/09/25 13:27:33 Throughput: 572.452 Mbps
2021/09/25 13:27:34 Throughput: 571.297 Mbps
2021/09/25 13:27:35 Throughput: 569.525 Mbps
2021/09/25 13:27:36 Throughput: 567.463 Mbps
...

but, webrtc-rs only has around 13Mbps:

Peer Connection State has changed: connected (offerer)
Peer Connection State has changed: connected (answerer)
OnOpen: data-1. Start sending a series of 1024-byte packets as fast as it can
OnOpen: data-1. Start receiving data
Throughput: 12.990 Mbps
Throughput: 13.698 Mbps
Throughput: 13.559 Mbps
Throughput: 13.345 Mbps
Throughput: 13.565 Mbps
Throughput: 13.582 Mbps

rainliu avatar Sep 25 '21 20:09 rainliu

cargo build --release --example data-channels-flow-control increases performance, but not comparable to pion.

./target/release/examples/data-channels-flow-control Press ctlr-c to stop Peer Connection State has changed: connected (offerer) Peer Connection State has changed: connected (answerer) OnOpen: data-1. Start sending a series of 1024-byte packets as fast as it can OnOpen: data-1. Start receiving data Throughput: 175.556 Mbps Throughput: 106.104 Mbps Throughput: 76.986 Mbps Throughput: 61.450 Mbps Throughput: 51.632 Mbps Throughput: 44.797 Mbps Throughput: 39.733 Mbps Throughput: 35.619 Mbps Throughput: 32.330 Mbps Throughput: 29.491 Mbps Throughput: 43.142 Mbps Throughput: 48.350 Mbps Throughput: 46.386 Mbps Throughput: 44.221 Mbps Throughput: 48.071 Mbps Throughput: 55.550 Mbps Throughput: 53.980 Mbps Throughput: 52.263 Mbps

rainliu avatar Sep 25 '21 20:09 rainliu

should be tokio performance limit

some other benchmark

goroutines: 3.22234675s total, 3.222346ms avg per iteration rust_threads: 16.980509645s total, 16.980509ms avg per iteration rust_tokio: 9.56997204s total, 9.569972ms avg per iteration rust_tokio_block_in_place: 3.578928749s total, 3.578928ms avg per iteration

https://www.reddit.com/r/rust/comments/lg0a7b/benchmarking_tokio_tasks_and_goroutines/

whans avatar Oct 05 '21 04:10 whans

@rainliu I made some benchmarks between Go(Pion) and Rust with long running time, maybe will help.


Go(Pion) From start of benchmark throughput was grow up to 844Mps

17:49:06 Throughput: 721.371 Mbps
17:49:07 Throughput: 727.991 Mbps
17:49:08 Throughput: 743.665 Mbps
...
17:49:39 Throughput: 842.728 Mbps
17:49:40 Throughput: 843.339 Mbps
17:49:41 Throughput: 843.672 Mbps
17:49:42 Throughput: 844.272 Mbps
17:49:43 Throughput: 844.782 Mbps
17:49:44 Throughput: 844.855 Mbps

after has been throwed an exception mux ERROR: 17:49:45 mux: ending readLoop dispatch error packetio.Buffer is full, discarding write

and throughput started to go slow down without stop to stable point. I stopped benchmark on

18:44:45 Throughput: 9.966 Mbps

Rust From start of benchmark throughput was

Throughput: 229.521 Mbps
Throughput: 231.489 Mbps
Throughput: 231.780 Mbps
Throughput: 231.662 Mbps
Throughput: 231.965 Mbps

after that, started to go down and reached lowest point

Throughput: 23.023 Mbps
Throughput: 22.849 Mbps
Throughput: 22.677 Mbps

after started slowly to grow up and balancing between 66.436 Mbps - 41.511 Mbps


CPU/RAM

Go

RAM 22.304 MB
CPU ~1%

Rust

RAM 162MB(stopped at that point, without recycling)
CPU ~120%

vitdevelop avatar Oct 05 '21 15:10 vitdevelop

should be tokio performance limit

some other benchmark

goroutines: 3.22234675s total, 3.222346ms avg per iteration rust_threads: 16.980509645s total, 16.980509ms avg per iteration rust_tokio: 9.56997204s total, 9.569972ms avg per iteration rust_tokio_block_in_place: 3.578928749s total, 3.578928ms avg per iteration

https://www.reddit.com/r/rust/comments/lg0a7b/benchmarking_tokio_tasks_and_goroutines/

@whans Benchmarks was made on I/O for file, not on socket. Linux have different behavior for files and sockets. If you will try std's file IO inside async block or task::block_in_place you will have very fast values. That is because Linux use read-ahead for files. Also benchmarks was made on /dev/urandom and /dev/null which is in-memory files.

@rainliu I suppose that, here example use socket connection.

vitdevelop avatar Oct 05 '21 15:10 vitdevelop

@vitdevelop thanks compare std::net::UdpSocket vs tokio::net::UdpSocket std::UdpSocket is almost twice as fast as tokio::UdpSocket

whans avatar Oct 05 '21 23:10 whans

thanks @vitdevelop and @whans for the benchmarking.

Look like we need some efforts to profile the hotspots/bottlenecks.

rainliu avatar Oct 06 '21 03:10 rainliu

add tokio console to check the schedule issue

https://github.com/tokio-rs/console

whans avatar Oct 06 '21 10:10 whans

add tokio console to check the schedule issue

https://github.com/tokio-rs/console

@whans thanks for tokio-console, awesome tool

I tried to check with tokio-console busy/idle times for tasks but didn't connect to console_subscriber. After I figured out that before offer/answer I can connect and I putted some tokio::time::sleep points to see the image, after offer/answer started to execute, console hangs up.

  • 10 sec before create_oferer
  • 3 sec before create_answerer
  • 3 sec before create_offer and set_remote_description
  • 3 sec before create_answer and set_remote_description

Here is last tokio-console data webrtc_rs_tokio_console

vitdevelop avatar Oct 06 '21 14:10 vitdevelop

@vitdevelop you need to slow down the packet sending rate. add sleep in sending task

whans avatar Oct 07 '21 08:10 whans

@whans Sleep in sending task helped for some time

@rainliu After ~5 min I figured out that a lot of tasks webrtc-sctp-0.3.8/src/timer/[ack,rtx]_timer.rs:[43,153] was spawned, around 500-600. Most cpu busy time is in that tasks.

I attached the screenshot. webrtc_rs_ack_rtc_timer

vitdevelop avatar Oct 08 '21 10:10 vitdevelop

@vitdevelop, thanks for the finding. Could you submit a PR to add tokio-console/console_subscriber to data-channels-flow-control example? so, I can take a look.

rainliu avatar Oct 09 '21 05:10 rainliu

@rainliu Added PR webrtc-rs/examples#1

vitdevelop avatar Oct 10 '21 08:10 vitdevelop

output about: perf top -p

Xnip2021-10-11_08-19-51

whans avatar Oct 11 '21 00:10 whans

I was doing some testing of my own on sctp and on just write to streams it takes almost 22x slower compared to same example on pion side. I was seeing an issue to use IO-free state machine style, while that will speed up the protocol it still doesnt explain the root cause. The rust example shouldn't be 22x slower it should be atleast comparable to the pion one if not faster. Has anyone explored the root cause of such slow down of the protocol?

ramyak-mehra avatar Mar 29 '24 20:03 ramyak-mehra