
Adaptive Audio/Video Quality Engine to Improve Reliability in Variable Network Environments

Open · MohammadRasheed opened this issue 6 months ago · 2 comments

đź‘‹ Hi Nextcloud Talk team,

First of all, thank you for your exceptional work on making Talk a secure, privacy-first communication tool. The continual evolution of its features—from whiteboards and breakout rooms to mobile refinements and SIP integration—shows real care for both technical depth and usability.

Having used Talk extensively in real-world deployments, including family, educational, and NGO setups with variable-quality networks (e.g. rural Sweden, refugee services), I keep running into one critical challenge that affects usability:


🛑 The Challenge: Fixed Media Quality Despite Network Changes

Talk currently maintains a static media bitrate/profile during calls. It does not adapt when a participant's network fluctuates (e.g. increased RTT, packet loss, jitter, or mobile network handoffs). This leads to:

  • Stuttering video/audio with no graceful fallback
  • Disconnections instead of smooth degradation
  • Reliance on manual user intervention (e.g. asking to turn off video)

âś… Proposed Feature: Adaptive Quality Engine for WebRTC

Introduce an optional, client-driven mechanism to monitor live network conditions and adjust media profiles accordingly.

📊 Key Metrics:

  • RTT (Round Trip Time)
  • Packet Loss
  • Jitter
  • Estimated bandwidth via the getStats() API (see the sampling sketch after this list)
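
To make this concrete, here is a minimal sketch of how a client could sample these metrics through the standard getStats() API. This is an illustration only: `pc` is assumed to be an existing RTCPeerConnection, and the available stat fields vary by browser, so all reads are guarded.

```ts
// Sketch: sample RTT, jitter and packet loss from a live RTCPeerConnection.
async function sampleConnectionStats(pc: RTCPeerConnection) {
  const report = await pc.getStats()
  let rtt: number | undefined
  let jitter: number | undefined
  let packetsLost = 0
  let packetsReceived = 0

  report.forEach((stats: any) => {
    if (stats.type === 'candidate-pair' && stats.state === 'succeeded') {
      rtt = stats.currentRoundTripTime // seconds
    } else if (stats.type === 'inbound-rtp' && stats.kind === 'video') {
      jitter = stats.jitter // seconds
      packetsLost += stats.packetsLost ?? 0
      packetsReceived += stats.packetsReceived ?? 0
    }
  })

  const totalPackets = packetsLost + packetsReceived
  const lossRatio = totalPackets > 0 ? packetsLost / totalPackets : 0
  return { rtt, jitter, lossRatio }
}
```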

📉 Quality Profiles:

  • FullHD → 720p → 360p (video; see the sketch after this list)
  • 128 kbps → 64 kbps → mono (Opus audio)
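
On the sending side, the video steps could be applied through the standard RTCRtpSender.setParameters() API. The sketch below is only illustrative: the bitrates and scaling factors are placeholders rather than tuned values, and `sender` is assumed to be the local video sender.

```ts
// Illustrative profile ladder; the numbers are placeholders, not tuned values.
const VIDEO_PROFILES = [
  { maxBitrate: 2_500_000, scaleResolutionDownBy: 1 },   // ~FullHD
  { maxBitrate: 1_200_000, scaleResolutionDownBy: 1.5 }, // ~720p
  { maxBitrate: 400_000,   scaleResolutionDownBy: 3 },   // ~360p
]

// Cap the outgoing video of a local RTCRtpSender to one profile step.
async function applyVideoProfile(sender: RTCRtpSender, step: number) {
  const params = sender.getParameters()
  // Guard: some browsers return an empty encodings list.
  if (!params.encodings || params.encodings.length === 0) {
    params.encodings = [{}]
  }
  Object.assign(params.encodings[0], VIDEO_PROFILES[step])
  await sender.setParameters(params)
}
```

The Opus steps could be capped the same way, by setting maxBitrate on the audio sender's encoding.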

⚙️ Implementation Suggestions:

  • Admin toggle: Enable adaptive media quality
  • Use VP9-SVC or AV1-SVC if supported for scalable encoding
  • Utilize Opus codec's flexibility to adapt audio
  • Visual indicator (e.g. green/yellow/red dots based on quality; see the scoring sketch after this list)
  • Optional logging of quality transitions for diagnostics
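
For the visual indicator, a simple scoring rule could map the sampled metrics to the three dot colors. The thresholds below are made-up examples and would need tuning against real calls.

```ts
type ConnectionQuality = 'green' | 'yellow' | 'red'

// Map sampled metrics (seconds for rtt/jitter, 0..1 for lossRatio)
// to the suggested indicator colors. Thresholds are illustrative only.
function rateConnection(rtt = 0, jitter = 0, lossRatio = 0): ConnectionQuality {
  if (rtt > 0.5 || jitter > 0.1 || lossRatio > 0.1) return 'red'
  if (rtt > 0.25 || jitter > 0.05 || lossRatio > 0.03) return 'yellow'
  return 'green'
}
```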

🤖 Why It Matters:

This would drastically improve reliability and inclusivity for:

  • Schools with poor infrastructure
  • Mobile workers on 4G/5G or public Wi-Fi
  • Remote/self-hosted networks over VPN

When media quality degrades gracefully instead of failing outright, user trust increases dramatically.


đź§Ş Minimal Backend Changes

  • Phase 1: Fully implementable in the frontend
  • Advanced signaling/control can be explored later

đź”’ Privacy Alignment

  • All metrics are ephemeral and come from local WebRTC APIs
  • No user data is transmitted or stored

đź’ˇ Context & Previous Discussion

I previously opened issue #15257, which was closed on the understanding that the matter was already handled by WebRTC itself.

However, this current proposal:

  • Clearly differentiates between initial WebRTC connection setup and ongoing dynamic quality control.
  • Provides a complete technical and UX rationale, not previously included.
  • Shows the gap between theoretical WebRTC adaptability and practical behavior within Talk.

This new issue is therefore not a duplicate, but a more focused and implementable redesign, informed by real-world use.


đź’¬ Final Thoughts

This feature, while subtle in the UI, could significantly improve Talk's reliability, bringing it in line with modern tools like Zoom or Meet, while keeping full control in the user's hands.

I'm happy to assist with draft specifications (RFC-style), testing, or feedback if desired.

Thanks again for your fantastic work 🙏

Warm regards,
M Rachid Halabia

MohammadRasheed commented on Jun 09 '25

Thanks for your kind words and your proposal.

I would like to provide some context about this subject. When the HPB (High Performance Backend) is not used, a direct peer-to-peer WebRTC connection is established between each pair of participants. Both ends of the connection provide feedback to each other regarding the quality of the connection, and if the connection quality is not good enough to sustain the current media quality, it is automatically reduced, for example by lowering the bitrate or the resolution of the sent video. Therefore, if no HPB is used, the browsers (and most likely the mobile clients too, thanks to their WebRTC library, although I have not checked) already automatically adjust the media quality to the network conditions.

If the HPB is used, there are no direct peer-to-peer WebRTC connections between the participants; instead, the peer-to-peer WebRTC connections are established with the HPB. Each participant has one publisher connection to send media to the HPB, and as many subscriber connections as there are other participants to receive media from the HPB.

In that case the media quality of the publisher connections is also automatically adjusted to the network conditions between the publisher and the HPB, so the quality is reduced if the publisher client does not have enough bandwidth. However, this is not the case for subscriber connections. The HPB (through Janus, its WebRTC gateway) distributes the publisher connection to all the subscribers, but it does not re-encode it or anything: the same media that it receives is sent on to the subscribers, and the HPB cannot modify the media quality, as it is not the one encoding it. Moreover, as the same media is distributed to all the subscribers, the original media cannot be adjusted to the network conditions of the subscribers either, as each one could have different conditions.

Nevertheless, there would be an option to reduce the quality received by the subscribers. With the HPB, and if the clients support it, simulcast is used: rather than a single media stream, the publishers send three different media streams, each with a different quality, and the subscribers ask for a specific quality depending on what will be done with the video. For example, the high-quality stream is used for the speaker view, while the low-quality one is used for the videos in the bottom video stripe.
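
To illustrate, the layer selection is just a request on the subscriber's signaling path. A rough sketch (the sendToJanus helper is made up for the example; a "configure" request with a "substream" field is how the Janus videoroom plugin selects simulcast layers):

```ts
// Ask Janus (videoroom plugin) for a specific simulcast substream:
// 0 = low, 1 = medium, 2 = high quality.
// `sendToJanus` is a hypothetical signaling helper standing in for
// the real transport between the client and the HPB.
function requestSimulcastLayer(
  sendToJanus: (msg: object) => void,
  substream: 0 | 1 | 2,
) {
  sendToJanus({ request: 'configure', substream })
}
```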

While it could be possible to also switch to a lower quality stream when connection problems are detected (which is quite challenging in itself), the problem would be increasing the quality again once the connection improves, as that would require "probing" (essentially, checking that the network is now able to transfer the higher quality media). All this is explained in this interesting Janus blog post: https://www.meetecho.com/blog/bwe-janus/

As explained in that blog post, the automatic switch between simulcast streams based on the connection quality is something being worked on in Janus. Given the complexity of that, and also the limitations of doing it from the clients, from my point of view it would be better to rely on the future Janus implementation (even if that means waiting, and not having a good experience in some scenarios for now) rather than spending time coming up with a Talk-specific implementation that would be thrown away once the Janus one is ready.

Finally, it is also worth noting that when the HPB is used, if a received video is either not visible or explicitly disabled, the subscriber connection will not receive (almost) any data, so in low-bandwidth scenarios manually disabling the received videos should help. Of course it would be nice to do this automatically, even if the simulcast stream switch is implemented in Janus, but deciding when to disable a video based on the connection stats is quite tricky (false positives would negatively affect the call experience), and there is again the problem of knowing when, or whether, to enable a video again after automatically disabling it.

danxuliu commented on Jun 17 '25

Thank you very much for this detailed and technically clear explanation, and for shedding light on how the HPB and Janus handle media distribution.

You're absolutely right that Janus currently does not re-encode media for subscribers, and therefore cannot tailor the stream to each participant's network conditions. The overview you provided makes this very clear.

However, I would like to suggest a lightweight client-side workaround that could serve as a practical interim solution until full dynamic stream switching via Janus becomes available:

  • The subscriber clients can monitor WebRTC stats (e.g. RTCIceCandidatePairStats or RTCInboundRtpStreamStats) to detect when conditions degrade (e.g. high packet loss, jitter, or RTT).
  • Upon hitting a defined threshold, the client could:
    • Request the lowest simulcast layer (if not already in use).
    • Or automatically disable/mute the received video stream, possibly displaying a notice ("Low connection – video paused"). A sketch of this policy follows below.
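
As a sketch of what I mean, reusing the stats-sampling function sketched in my original post (setVideoEnabled is a hypothetical callback into the client UI, and the thresholds and sample counts are illustrative):

```ts
// Sketch of the proposed subscriber-side degradation policy.
// Hysteresis: act only after several consecutive bad/good samples,
// so that a single spike does not toggle the video on and off.
function watchSubscriber(
  pc: RTCPeerConnection,
  setVideoEnabled: (enabled: boolean) => void,
) {
  let badSamples = 0
  let goodSamples = 0

  const timer = setInterval(async () => {
    const { rtt = 0, lossRatio } = await sampleConnectionStats(pc)
    const degraded = rtt > 0.5 || lossRatio > 0.1 // illustrative thresholds

    badSamples = degraded ? badSamples + 1 : 0
    goodSamples = degraded ? 0 : goodSamples + 1

    if (badSamples === 3) setVideoEnabled(false) // "Low connection – video paused"
    if (goodSamples === 10) setVideoEnabled(true) // cautious re-enable
  }, 2000)

  return () => clearInterval(timer) // stop watching when the call ends
}
```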

This does not require server-side logic or any Janus modification. It simply offers a client-side fallback that prevents full video stalling and can significantly improve the user experience, especially for non-technical users on weak networks.

I fully understand your concern about avoiding short-lived implementations, but I believe this could be framed as a temporary "subscriber-side degradation policy" pending native probing-based switching in Janus.

I'd be happy to test or help spec this further if there's interest.

Warm regards,
M Rachid Halabia

MohammadRasheed commented on Jun 17 '25