Sunshine feat: implement bidirectional microphone pass-through

Description

This PR implements complete bidirectional microphone support for the Sunshine streaming server, enabling Moonlight clients to send their microphone audio back to the server for output through the host's speakers/headphones.

This addresses a long-standing community feature request for microphone pass-through that has been open for over 5 years, closing a critical gap in the streaming ecosystem.

Key Implementation Details:

Added new packet types (IDX_MIC_DATA, IDX_MIC_CONFIG) for microphone data transmission
Implemented a dedicated microphone stream on port offset 12 (MIC_STREAM_PORT, relative to the base port like the existing stream ports) for client-to-server audio
Created cross-platform audio output infrastructure with platform-specific implementations
Integrated RTSP protocol extensions for automatic microphone capability advertisement
Added comprehensive configuration options (enable_mic_passthrough, mic_sink)

Screenshot

N/A - This is a server-side protocol and audio infrastructure implementation without UI changes.

Issues Fixed or Closed

Resolves https://github.com/LizardByte/roadmap/issues/56

Type of Change

[ ] Bug fix (non-breaking change which fixes an issue)
[x] New feature (non-breaking change which adds functionality)
[ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
[ ] Dependency update (updates to dependencies)
[ ] Documentation update (changes to documentation)
[ ] Repository update (changes to repository files, e.g. .github/...)

Checklist

[x] My code follows the style guidelines of this project
[x] I have performed a self-review of my own code
[x] I have commented my code, particularly in hard-to-understand areas
[x] I have added or updated the in code docstring/documentation-blocks for new or existing methods/components

Technical Implementation Details

Protocol Extensions

Extended stream.h with MIC_STREAM_PORT = 12
Added socket_e::microphone for dedicated mic socket handling
New packet types in protocol for microphone data and configuration
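For context, the existing *_PORT constants in Sunshine's stream.h are offsets added to the configured base port (47989 by default), which is how the audio stream ends up on UDP 48000. Assuming MIC_STREAM_PORT follows the same convention, the microphone socket would bind to base + 12 (48001 with the default base port). A minimal sketch; map_port and port_offset here are hypothetical stand-ins, not Sunshine's own helpers, and the socket_e enumerators are illustrative.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Stream port offsets. VIDEO/CONTROL/AUDIO are believed to mirror upstream
// stream.h; MIC_STREAM_PORT = 12 is the value introduced by this PR.
constexpr int VIDEO_STREAM_PORT = 9;
constexpr int CONTROL_PORT = 10;
constexpr int AUDIO_STREAM_PORT = 11;
constexpr int MIC_STREAM_PORT = 12;

// Hypothetical socket selector extended with the new microphone entry.
enum class socket_e : int {
  video,
  audio,
  microphone,
};

// Hypothetical: which port offset each socket binds to.
constexpr int port_offset(socket_e s) {
  switch (s) {
    case socket_e::video: return VIDEO_STREAM_PORT;
    case socket_e::audio: return AUDIO_STREAM_PORT;
    case socket_e::microphone: return MIC_STREAM_PORT;
  }
  return -1;  // unreachable for valid enumerators
}

// Hypothetical helper: resolve a port offset against the configured base port.
std::uint16_t map_port(std::uint16_t base_port, int offset) {
  assert(offset >= 0 && "port offsets are never negative");
  const int port = static_cast<int>(base_port) + offset;
  if (port > 65535) {
    throw std::out_of_range("mapped port exceeds 65535");
  }
  return static_cast<std::uint16_t>(port);
}
```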

Audio Processing Pipeline

audio::mic_receive() function for processing incoming microphone packets
Opus decoder integration for real-time audio processing
mic_output_t interface for platform-specific audio output
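The receive, decode and output steps above can be sketched as follows. mic_output_t and mic_receive() are names from this PR, but the signatures here are assumptions. The Opus decoder is abstracted behind a callback so the sketch builds without libopus; with libopus, that callback would size the buffer and wrap opus_decode(), which returns the number of decoded samples per channel or a negative error code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical platform sink interface (PulseAudio / WASAPI / AVFoundation
// implementations would derive from this).
struct mic_output_t {
  virtual ~mic_output_t() = default;
  // Writes `frames` frames of interleaved 16-bit PCM; returns < 0 on error.
  virtual int write(const std::int16_t *pcm, std::size_t frames) = 0;
};

// Decoder callback: decodes one packet into `pcm` and returns the number of
// samples per channel, or a negative value on error (the opus_decode() convention).
using decode_fn = std::function<int(const std::uint8_t *data, std::size_t len, std::vector<std::int16_t> &pcm)>;

// Sketch of mic_receive(): decode one packet and forward the PCM to the sink.
bool mic_receive(const std::vector<std::uint8_t> &packet, const decode_fn &decode,
                 mic_output_t &sink, int channels) {
  assert(channels > 0);
  std::vector<std::int16_t> pcm;
  const int frames = decode(packet.data(), packet.size(), pcm);
  if (frames <= 0) {
    return false;  // drop packets that fail to decode
  }
  if (pcm.size() < static_cast<std::size_t>(frames) * static_cast<std::size_t>(channels)) {
    return false;  // decoder reported more frames than it produced
  }
  return sink.write(pcm.data(), static_cast<std::size_t>(frames)) >= 0;
}
```

Keeping the sink behind a small virtual interface lets each platform backend be swapped in without touching the network or decode path.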

Platform Support

Linux: mic_output_pa_t using PulseAudio for audio output
Windows: mic_output_wasapi_t using WASAPI for low-latency audio
macOS: av_mic_output_t using AVFoundation framework

Network Infrastructure

micReceiveThread() for UDP packet reception on port offset 12 (MIC_STREAM_PORT)
Proper socket binding and management in broadcast context
Integration with existing session and thread management

Configuration

Add the following to the Sunshine configuration file (sunshine.conf):

enable_mic_passthrough=true
mic_sink=default  # or specify audio device name
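A minimal parsing sketch for the two options, assuming simple `key = value` lines with `#` comments. mic_config_t and parse_mic_config are hypothetical names, and the accepted spellings of "true" are an assumption rather than Sunshine's actual config parser. The defaults keep the feature off, matching the PR's statement that it is disabled by default.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical settings struct; defaults keep pass-through disabled.
struct mic_config_t {
  bool enable_mic_passthrough = false;
  std::string mic_sink = "default";
};

static std::string trim(const std::string &s) {
  const auto b = s.find_first_not_of(" \t");
  if (b == std::string::npos) {
    return {};
  }
  const auto e = s.find_last_not_of(" \t");
  return s.substr(b, e - b + 1);
}

mic_config_t parse_mic_config(const std::string &text) {
  mic_config_t cfg;
  std::istringstream in(text);
  std::string line;
  while (std::getline(in, line)) {
    // Simplification: everything after '#' is a comment.
    line = trim(line.substr(0, line.find('#')));
    const auto eq = line.find('=');
    if (eq == std::string::npos) {
      continue;
    }
    const std::string key = trim(line.substr(0, eq));
    const std::string value = trim(line.substr(eq + 1));
    if (key == "enable_mic_passthrough") {
      cfg.enable_mic_passthrough = (value == "true" || value == "enabled" || value == "1");
    } else if (key == "mic_sink" && !value.empty()) {
      cfg.mic_sink = value;  // "default" or a host audio device name
    }
  }
  return cfg;
}
```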

Testing

The implementation has been validated for:

✅ Syntax and compilation compatibility
✅ Cross-platform code structure
✅ Integration with existing audio system
✅ Configuration parsing and validation
✅ Network infrastructure setup

Dependencies

No new external dependencies required. Uses existing:

Opus codec (already present for audio streaming)
Platform audio APIs (PulseAudio, WASAPI, AVFoundation)
Existing network and threading infrastructure

Notes for Reviewers

This is a server-side implementation. Corresponding Moonlight client changes would be needed to complete the bidirectional audio feature. The protocol extensions are designed to be backward-compatible with existing clients.

The implementation follows existing Sunshine patterns for:

Configuration management (config.h/config.cpp)
Platform abstraction (platform/common.h)
Network protocols (stream.cpp, rtsp.cpp)
Audio processing (audio.h/audio.cpp)

Breaking Changes

None. This feature is entirely additive and disabled by default.

Author: [email protected]

Jul 15 '25 01:07 cardoza1991

@cardoza1991 thank you for the PR! There has been a lot of talk about this feature as of late.

Would you mind editing the PR body to use our template? You can get the original template from here: https://github.com/LizardByte/.github/blob/master/.github/pull_request_template.md?plain=1

Jul 15 '25 02:07 ReenigneArcher

I think the approach is generally good, but I don't think we need any changes to the control stream. I think we should do all the configuration via RTSP/SDP. The server can advertise mic support via SDP like you're doing here. If the client supports the mic, then it can send an RTSP PLAY for the mic stream, and that will tell Sunshine to expect microphone input.

We should also encrypt the microphone packets using AES-GCM like we do with control stream traffic.
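A sketch of the SDP-based advertisement suggested here, assuming a hypothetical attribute name (x-mic-supported): the server adds the attribute only when the feature is enabled, and the client treats a missing attribute as "no mic support". Since SDP receivers are expected to ignore attribute lines they don't understand, older clients should be unaffected. The RTSP PLAY step and AES-GCM encryption are not shown.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical SDP attribute; the real name would be agreed on in review.
const std::string MIC_SDP_ATTR = "x-mic-supported";

// Server side: advertise mic support only when enable_mic_passthrough is on.
std::string advertise_mic(std::string sdp, bool enabled) {
  if (enabled) {
    sdp += "a=" + MIC_SDP_ATTR + ":1\r\n";
  }
  return sdp;
}

// Client side: a missing attribute means the server has no mic support.
bool server_supports_mic(const std::string &sdp) {
  const std::string prefix = "a=" + MIC_SDP_ATTR + ":";
  std::istringstream in(sdp);
  std::string line;
  while (std::getline(in, line)) {
    if (!line.empty() && line.back() == '\r') {
      line.pop_back();  // SDP lines end in CRLF
    }
    if (line.compare(0, prefix.size(), prefix) == 0) {
      return line.substr(prefix.size()) == "1";
    }
  }
  return false;
}
```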

Jul 15 '25 02:07 cgutman

My 2 cents regarding the protocol:

Encryption should be (optionally) supported.
FEC should be (optionally) supported, or perhaps just duplicate UDP packets spread out over time.
Multi-client streaming should be supported, e.g. via a client-identifying packet header outside the encrypted payload.
The mic packet's header and payload should be sufficiently different from ping packets (which are 20 bytes long), so that the ping port can accept mic packets too. Currently the Moonlight/Sunshine protocol requires only 2 port numbers to operate at full capacity, and I would be extremely thankful if it stayed that way.
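One way to meet the last two points, sketched under assumptions: the layout below (a 4-byte magic followed by a cleartext 4-byte client id) is hypothetical, and the sender is assumed to pad any mic packet that would otherwise be exactly 20 bytes, so a shared port can tell pings from mic data. The client id stays outside the (optionally encrypted) payload so the server can route a packet before decrypting it.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Ping size as stated in the comment above; the mic header is hypothetical.
constexpr std::size_t PING_PACKET_SIZE = 20;
constexpr std::array<std::uint8_t, 4> MIC_MAGIC{'M', 'I', 'C', '1'};
constexpr std::size_t MIC_HEADER_SIZE = 8;  // magic + client id

enum class packet_kind { ping, mic, unknown };

// Assumes senders never emit a mic packet of exactly PING_PACKET_SIZE bytes.
packet_kind classify(const std::uint8_t *data, std::size_t len) {
  if (len == PING_PACKET_SIZE) {
    return packet_kind::ping;
  }
  if (len > MIC_HEADER_SIZE && std::equal(MIC_MAGIC.begin(), MIC_MAGIC.end(), data)) {
    return packet_kind::mic;
  }
  return packet_kind::unknown;
}

// Big-endian client id right after the magic, readable without decryption.
std::uint32_t mic_client_id(const std::uint8_t *data) {
  return (std::uint32_t(data[4]) << 24) | (std::uint32_t(data[5]) << 16) |
         (std::uint32_t(data[6]) << 8) | std::uint32_t(data[7]);
}
```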

Jul 15 '25 10:07 ns6089

> I think we should do all the configuration via RTSP/SDP. The server can advertise mic support via SDP like you're doing here. If the client supports the mic, then it can send an RTSP PLAY for the mic stream, and that will tell Sunshine to expect microphone input.

I believe mid-stream mic hotplug should be supported (at least at the protocol level). It could be implemented through either Control or encrypted RTSP, but doing it through Control is probably easier.

Jul 15 '25 11:07 ns6089

@ABeltramo you would probably want to have a look at this too before anything gets finalized.

Jul 15 '25 11:07 ns6089

Thanks for the ping, @ns6089. I agree with most of what has been said so far.

I think the protocol should be reversed, though: the client advertises a generic audio input source (not just a microphone), and we create an audio sink on the host that matches the requested bitrate and channels. Why should this be hardcoded and advertised by the server? This doesn't feel right: https://github.com/cardoza1991/Sunshine/blob/9f8dd8d0d88d76f962daea2a8b054c7e2eed9653/src/rtsp.cpp#L759-L766 Why hard-code values like that?

Also, I wouldn't make the mistake of assuming a single global microphone stream. Since we have the freedom to create this from scratch, let's support multiple users and multiple audio input devices (it doesn't have to be strictly a microphone!) right from the start. If we put an identifier in the control packet header, we don't even need multiple ports for different input streams.
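A sketch of the multi-user, multi-device routing described here: each input is keyed by a (client id, stream id) pair carried in the packet header, so all input streams can share one port. input_registry_t and input_format_t are hypothetical; a real implementation would store the platform sink object (created to match the client's requested format) rather than just the format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>
#include <utility>

// Format requested by the client for one audio input (hypothetical fields).
struct input_format_t {
  int channels;
  int sample_rate;
  int bitrate;
};

class input_registry_t {
public:
  using stream_key = std::pair<std::uint32_t, std::uint8_t>;  // (client id, stream id)

  // Client announced a new input; returns false if that id is already in use.
  bool create(std::uint32_t client, std::uint8_t stream, const input_format_t &fmt) {
    assert(fmt.channels > 0);
    return sinks_.emplace(stream_key{client, stream}, fmt).second;
  }

  // Client announced that an input went away (e.g. device unplugged).
  void destroy(std::uint32_t client, std::uint8_t stream) {
    sinks_.erase(stream_key{client, stream});
  }

  // Route an incoming packet; an empty result means "unknown stream, drop it".
  std::optional<input_format_t> route(std::uint32_t client, std::uint8_t stream) const {
    const auto it = sinks_.find(stream_key{client, stream});
    if (it == sinks_.end()) {
      return std::nullopt;
    }
    return it->second;
  }

  std::size_t size() const { return sinks_.size(); }

private:
  std::map<stream_key, input_format_t> sinks_;  // placeholder for real sinks
};
```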

Really excited for this, thanks @cardoza1991 for getting the ball rolling!

Jul 15 '25 18:07 ABeltramo

Yeah, well, I figured I'd tackle a 5-year-old feature request, so here it is. Thanks for the feedback!

Jul 15 '25 22:07 cardoza1991

Data flow can probably be like this:

During RTSP: the client announces support for generic mic pass-through and whether it wants mic encryption. The server assigns the client a session token (used later for packet identification) and shares the port number for incoming mic packets, along with whether it accepted the encryption request.
During the stream, in Control: the client announces mic creation, with a number unique to this client and a channel format.
The client begins sending packets to the port announced in RTSP. Each packet contains the session token (provided during RTSP), the mic number unique to this client, a packet counter for this mic, and the audio payload. The audio payload is encrypted with AES-GCM if both sides agreed on encryption during RTSP; this algorithm also authenticates each packet, which protects against malicious packets.
Optionally, during the stream, in Control: the client can announce mic destruction for a particular mic number.
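The per-packet fields listed above can be sketched as a small serializer and parser. Field widths and byte order are assumptions (4-byte token, 1-byte mic number, 4-byte counter, big-endian), and encryption is left out; if AES-GCM is added, the cleartext header could be passed as additional authenticated data so it is integrity-protected without being encrypted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical wire layout (big-endian):
//   [0..3] session token from RTSP, [4] mic number, [5..8] per-mic counter,
//   [9..]  audio payload (would be AES-GCM ciphertext if negotiated; not shown)
struct mic_packet_t {
  std::uint32_t session_token;
  std::uint8_t mic_number;
  std::uint32_t counter;
  std::vector<std::uint8_t> payload;
};

constexpr std::size_t MIC_HEADER_LEN = 9;

static void put_u32(std::vector<std::uint8_t> &out, std::uint32_t v) {
  for (int shift = 24; shift >= 0; shift -= 8) {
    out.push_back(static_cast<std::uint8_t>(v >> shift));
  }
}

static std::uint32_t get_u32(const std::uint8_t *p) {
  return (std::uint32_t(p[0]) << 24) | (std::uint32_t(p[1]) << 16) |
         (std::uint32_t(p[2]) << 8) | std::uint32_t(p[3]);
}

std::vector<std::uint8_t> serialize(const mic_packet_t &pkt) {
  std::vector<std::uint8_t> out;
  out.reserve(MIC_HEADER_LEN + pkt.payload.size());
  put_u32(out, pkt.session_token);
  out.push_back(pkt.mic_number);
  put_u32(out, pkt.counter);
  out.insert(out.end(), pkt.payload.begin(), pkt.payload.end());
  return out;
}

std::optional<mic_packet_t> parse(const std::vector<std::uint8_t> &buf) {
  if (buf.size() <= MIC_HEADER_LEN) {
    return std::nullopt;  // too short, or a header with no audio
  }
  mic_packet_t pkt;
  pkt.session_token = get_u32(buf.data());
  pkt.mic_number = buf[4];
  pkt.counter = get_u32(buf.data() + 5);
  pkt.payload.assign(buf.begin() + static_cast<std::ptrdiff_t>(MIC_HEADER_LEN), buf.end());
  return pkt;
}
```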

Communication in Control is kept intentionally unidirectional because it's painful to read async replies from it.

Not everything has to be implemented at once; for example, encryption can easily be deferred.

Jul 16 '25 10:07 ns6089

How can I contribute to this PR? Do I need to submit changes to cardoza1991's repo then have it moved here?

Jul 16 '25 17:07 itsmikethetech

> Thanks for the ping, @ns6089. I agree with most of what has been said so far.
>
> I think the protocol should be reversed, though: the client advertises a generic audio input source (not just a microphone), and we create an audio sink on the host that matches the requested bitrate and channels. Why should this be hardcoded and advertised by the server? This doesn't feel right: https://github.com/cardoza1991/Sunshine/blob/9f8dd8d0d88d76f962daea2a8b054c7e2eed9653/src/rtsp.cpp#L759-L766 Why hard-code values like that?
>
> Also, I wouldn't make the mistake of assuming a single global microphone stream. Since we have the freedom to create this from scratch, let's support multiple users and multiple audio input devices (it doesn't have to be strictly a microphone!) right from the start. If we put an identifier in the control packet header, we don't even need multiple ports for different input streams.
>
> Really excited for this, thanks @cardoza1991 for getting the ball rolling!

I threw in audio support for Windows, macOS, and Linux. Honestly, my primary focus should have just been Linux, since the boys and I stream games with our servers.

Jul 16 '25 17:07 cardoza1991

> How can I contribute to this PR? Do I need to submit changes to cardoza1991's repo then have it moved here?

If you submit a PR to this branch (https://github.com/cardoza1991/Sunshine/tree/feature/bidirectional-microphone-passthrough) and it gets accepted and merged, then it would be included in this PR.

If the changes you want are simple, it might be better to just do a review here (https://github.com/LizardByte/Sunshine/pull/4078/files)

Jul 16 '25 18:07 ReenigneArcher

> I think the approach is generally good, but I don't think we need any changes to the control stream. I think we should do all the configuration via RTSP/SDP. The server can advertise mic support via SDP like you're doing here. If the client supports the mic, then it can send an RTSP PLAY for the mic stream, and that will tell Sunshine to expect microphone input.
>
> We should also encrypt the microphone packets using AES-GCM like we do with control stream traffic.

Awesome thanks for this

Jul 17 '25 17:07 cardoza1991

Quality Gate failed

Failed conditions
34 New issues
D Reliability Rating on New Code (required ≥ A)
2 New Bugs (required ≤ 0)
32 New Code Smells (required ≤ 0)

See analysis details on SonarQube Cloud


Jul 27 '25 04:07 sonarqubecloud[bot]

@cardoza1991 question: are you still working on this, or is this PR stale?

Oct 13 '25 09:10 MNarath1

Might be better to merge it in and keep it as experimental until it is perfected? It has been a highly requested feature for 5 years.

Nov 02 '25 09:11 wagneramichael

> Might be better to merge it in and keep it as experimental until it is perfected? It has been a highly requested feature for 5 years.

I am asking because I am considering taking a spin on this if they are no longer working on it.

Nov 04 '25 13:11 MNarath1

@MNarath1 it appears stale to me.

Nov 04 '25 13:11 ReenigneArcher

Is this topic dead again?

Nov 10 '25 15:11 GerdW