OvenMediaEngine
Resilient high quality video streaming
Purpose of this issue
I want to get high quality video streaming working with rather low latency, and I want to figure out a good generic configuration that works for a big audience, i.e. video streaming events with many concurrent viewers. That requires the video stream playback to be resilient and of high audio + video quality.
Approach to solve this issue
I'd like to learn what others found to work best for various use cases, and I also want to do my own tests to collect reproducible data. Then I want to use that data to improve the documentation and thus help the OvenMediaEngine project.
What to test
- compare the behavior of OvenPlayer for WebRTC over UDP vs. over TCP under various network conditions
- compare the effectiveness of Ulpfec and Rtx
- test in which cases a high playoutDelayHint is helpful
- compare CPU and memory requirements of different configurations
- use Clumsy (https://github.com/IntouchHealth/clumsy) to reproduce different bad network conditions reliably
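Clumsy is Windows-only. For producing comparable, reproducible packet loss and delay on a Linux test host, a rough sketch using tc with the netem qdisc could look like the following (the interface name `eth0` and the loss/delay values are just example assumptions, not part of any recommended setup):

```bash
# add 80 ms delay with 10 ms jitter plus 2% bursty packet loss
# (the second loss value is the burst correlation) on eth0
sudo tc qdisc add dev eth0 root netem delay 80ms 10ms loss 2% 25%

# inspect the currently applied qdisc
tc qdisc show dev eth0

# remove the emulation again when done testing
sudo tc qdisc del dev eth0 root
```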
What I learned so far
- When playing WebRTC over TCP (TURN), enabling BBR as the TCP congestion control algorithm makes a huge difference in overall playback quality (at least that was the result I got last weekend during an event with 26,600 attendees: after enabling BBR on all edge servers, the number of complaints dropped by "a lot"). You can temporarily enable BBR by executing `echo "bbr" > /proc/sys/net/ipv4/tcp_congestion_control`.
- Ulpfec and Rtx can be disabled in environments where only WebRTC over TCP is used, see https://github.com/AirenSoft/OvenMediaEngine/issues/439
- Ulpfec and Rtx seem to be ineffective for WebRTC over UDP in real-world use cases, because they only protect against rare, occasional packet loss, not against the typical bursts of packet loss lasting a couple hundred milliseconds.
- WebRTC over UDP does not result in good audio + video playback quality on some internet connections that have constant but somewhat high packet loss; WebRTC over TCP, however, might work perfectly fine for those.
- Some very fast internet connections have good network quality except for occasional but long bursts of packet loss (this seems to be especially common for internet over old TV cable networks). For those, WebRTC over UDP results in good experienced audio + video output (except for short pauses when those bursts of packet loss happen), but using WebRTC over TCP results in the player throwing a `provider.getState() === STATE_ERROR`, which completely stops playback.
- OvenPlayer unfortunately does not "tell" you the reason why a `provider.getState() === STATE_ERROR` was reached - it is missing helpful debug output even when debug is set to true.
- Of all the video delivery variants that OME supports, HLS is by far the most robust one in various network environments. However, OME does not support TCP keep-alive, so HLS is only usable with a config of at least 3 chunks of 6 seconds each, resulting in a typical playback latency of >18 seconds. The main reason is that on long-distance links the high latency causes newly established TCP connections to take a long time to reach a decent transmission bitrate, and by then the next chunk has to be downloaded over a completely new TCP connection. (This can be worked around by using nginx as a reverse proxy in front of OME, or by using any CDN provider - see the sketch after this list. However, it affects the total cost of operation.)
- WebRTC over TCP requires roughly twice as many CPU resources as HLS on the edge servers. Thus HLS streaming using the origin-edge model of OME is the cheapest supported way of world-wide video distribution.
- OME LL-DASH is not usable except in local networks, because of the missing TCP keep-alive support. (This can be worked around by using nginx as a reverse proxy in front of OME, or by using any CDN provider.)
- WebRTC over UDP "loses" frames as soon as there is even a bit of the packet loss that is typical for long-distance communication, for example because of congested network provider peerings.
- In environments with mediocre to high typical packet loss, the ICE candidate gathering over UDP often fails without any matches, which ends in playback failure.
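To illustrate the nginx workaround for the missing TCP keep-alive mentioned above: a minimal reverse-proxy sketch could look roughly like the following. This assumes OME serves HLS/LL-DASH on 127.0.0.1:3333 and uses a placeholder domain; adjust host, port, and server_name to your own setup.

```nginx
# Hypothetical upstream pointing at the local OvenMediaEngine HTTP publisher
upstream ome_backend {
    server 127.0.0.1:3333;
    # keep idle connections to OME open so each chunk request
    # does not have to ramp up a fresh TCP connection
    keepalive 32;
}

server {
    listen 80;
    server_name stream.example.com;   # placeholder domain

    location / {
        proxy_pass http://ome_backend;
        # HTTP/1.1 and an empty Connection header are required
        # for nginx to reuse upstream connections
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        proxy_set_header Host $host;
    }
}
```

With this in front of OME, clients (or a CDN) keep their TCP connections to nginx alive, and nginx keeps a small pool of warm connections to OME, so the slow-start penalty on long-distance links only hits once instead of once per chunk.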
Other details
- Packets arriving out of order seem to have little influence on WebRTC over UDP or TCP.
- When the server receives occasional duplicate packets, OME logs `Failed to unprotect SRTCP packet, err=9` for each of those packets, even at a high logging level like warn or error. This might be a problem with regard to denial-of-service attacks.
Thanks for the great info. This is a great report that will help everyone. I will also refer to this report when prioritizing my next task.
@getroot this is not just intended as a report. I am specifically asking how you and others tend to use OME in production. How do you deliver video, and what gives you the best results?
@basisbit I know this isn't just for reporting. I pinned this thread because I want many people to share ideas about operating OvenMediaEngine, the OS, and the network here.
I'll try to summarize my experience and comment soon. I hope that others will also share theirs with all of us, even if it's nothing special.
I have managed to get this wonderful system running and also managed to embed the Player into my website. Q: Can I play RTMP from OME via VLC, or is it just HLS and DASH?
@natzakaria OvenMediaEngine does not support RTMP playback. It supports HLS or DASH for playback, and HLS is recommended. DASH is still unstable.
@getroot
Thank you for the clarification - I had been trying all day yesterday and this morning, and RTMP with VLC kept returning errors. I am very happy with OME and managed to get the Player onto my product site https://www.nz-live.net/live
A great feature is that when clicking the gear icon, I can change the source from OME to select WebRTC, HLS, or even DASH - wonderful stuff.
@natzakaria Your product is very nice! I wish your project success.
Thank you. OME just completes the puzzle.
Great work and thank you. Will try sending SRT and other protocols.
I recently provided technical support for a WebRTC-based live commerce service for mobile users. The highest number of concurrent viewers so far was 2000, and the video bitrates ranged from 4Mbps to 10Mbps (VBR). I used several Edges for this.
Mobile networks really vary in performance depending on the user's environment. And even when users watch at home, the Wifi signal varies depending on their location, which causes very high packet loss. We weren't too concerned because we were serving live with WebRTC/TCP, but we noticed one thing was missing.
The TCP congestion control algorithm that Linux uses by default is CUBIC. CUBIC sizes the TCP window based on packet loss, which is why I found it unsuitable for mobile environments. In a mobile environment where user speed is very high but packet loss is also high, CUBIC is a really bad choice: it slows the connection down dramatically.
Check out this well-made PPT. https://www.ietf.org/proceedings/97/slides/slides-97-iccrg-bbr-congestion-control-02.pdf
This really happened in my experiments as well: with only 0.1% packet loss, CUBIC reduced goodput to less than 10%.
I followed basisbit's advice and chose BBR, made by Google. This is the algorithm used for YouTube as well. To apply BBR on Linux, refer to the URL below.
https://www.tecmint.com/increase-linux-server-internet-speed-with-tcp-bbr/
In the end, I heard that viewers had fewer complaints. Of course, if the viewer's network speed is really slow, they can't watch the stream normally. This is because, unlike HLS, WebRTC momentarily requires high bandwidth, especially when transmitting in VBR. (I offered to serve the customer with CBR, but this was not possible due to the limitations of their encoder.)
And here's one really important tip: BBR is available in Linux kernel version 4.9 and later, so you will probably have to upgrade your kernel. Never select kernel versions 5.3-5.6; these have a bug in which epoll events are lost.
https://bugzilla.kernel.org/show_bug.cgi?id=205933
This can make OME's sessions get stuck or reduce performance. I didn't sleep for 3 days tracking down this problem.
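As a rough sketch (not taken from the linked article, just the commonly used sysctl approach), checking the running kernel and enabling BBR persistently could look like this:

```bash
# BBR needs kernel >= 4.9; avoid 5.3-5.6 because of the epoll bug mentioned above
uname -r

# check which congestion control algorithms the running kernel offers
sysctl net.ipv4.tcp_available_congestion_control

# enable BBR (and the fq qdisc usually recommended with it) persistently
cat <<'EOF' | sudo tee -a /etc/sysctl.conf
net.core.default_qdisc = fq
net.ipv4.tcp_congestion_control = bbr
EOF
sudo sysctl -p

# verify the active congestion control algorithm
sysctl net.ipv4.tcp_congestion_control
```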
This is probably an issue that doesn't happen with VMs in the cloud these days. However, if you are using a bare-metal machine and are experiencing packet drops or delayed transmission, it is worth trying to disable TSO (TCP segmentation offload). Many NICs cause problems when handling TSO, and it is hard to pinpoint which ones. Disable TSO and see if your problem is solved.
https://www.ibm.com/docs/en/linux-on-systems?topic=offload-tcp-segmentation
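For reference, a quick sketch with ethtool (assuming the NIC is called `eth0`; the change does not survive a reboot unless made persistent):

```bash
# show the current TSO setting of the NIC
ethtool -k eth0 | grep tcp-segmentation-offload

# disable TCP segmentation offload
sudo ethtool -K eth0 tso off

# verify, then re-test for packet drops / delayed transmission
ethtool -k eth0 | grep tcp-segmentation-offload
```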
We are updating the Troubleshooting documentation. In the meantime, we've updated some common issues we and other users have encountered, and we will continue to update them.
https://airensoft.gitbook.io/ovenmediaengine/troubleshooting
A small update on this issue: in the past couple of months, OvenMediaEngine has become quite resilient for video delivery. For world-wide delivery, we currently use server locations in Frankfurt (Germany), Washington DC (USA) and Singapore. From there, we can quite reliably deliver WebRTC-over-UDP video streams of up to ~2 Mb/s bitrate. We did a dozen or so weekend-long events with, for example, 35k unique viewers. There were a few cases where UDP stuttered too much because of bursts of packet loss, but those users then just switched to WebRTC video over TCP and were happy.
We tried higher bitrates up to 6 Mb/s, but there were quite a lot of cases where those streams were not reliable enough, even when switching to TCP and using BBR, although the users had, for example, Gigabit internet and were not using Wifi/WLAN. In most of the cases where we analyzed this in depth, it was caused by the internet connection being a DualStack Lite connection, and the CGNAT they were going through for IPv4 was chronically congested. Apparently this is rather common for bigger ISPs like Vodafone and AT&T, and especially also in Australia and Japan. (IPv6 support in OME should eventually resolve this.)
Most of our events used video embedded in pixel-art games, so a low resolution and low bitrate were no problem; for the other events we fell back to HLS-only delivery using OvenMediaEngine -> nginx (for HTTP/1.1) -> BunnyCDN -> the user's web browser, which also gave us IPv6 support.
@basisbit It's really surprising that a 2Mb/s stream over UDP works well for most people in a worldwide environment.
I never expected that 6 Mb/s stream delivery in a DualStack Lite or CGNAT environment would not be stable even with TCP. If congestion of DualStack Lite or CGNAT is the cause of the instability, it seems that OME's IPv6 support should be raised to a top priority.
Thank you for sharing such valuable and important information.
If you have any documents or links about the congestion of DualStack Lite or CGNAT, it would be very helpful to me if you could share them.
Simulcast or ABR might also be helpful in these cases (and others). Stream distribution over IPv6 also comes with some difficulties, because some big transit providers still don't properly support IPv6, which further limits the choice of data centers in certain regions.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
IPv6 has been released. See https://github.com/AirenSoft/OvenMediaEngine/issues/1044.