obs-studio libobs: Fix duplicate audio bug for monitored sources

Description

This fixes the following bug:

increased audio level when the monitoring device is also captured with an Audio Output Capture source (AOC, such as Desktop Audio, wasapi_output_capture on windows or pulse_audio_capture on linux). The level increase is variable and depends on the source. The output capture source will introduce a phase shift, which depends in general on the frequency, due to the processing delay on the OS output side. For most audio sources, the original monitored source and the delayed copy coming from the AOC will be added incoherently so that one observes a +3 dBFS level increase (ex: white noise, most music); but for low-frequency tones, the level can increase up to +6 dBFS (coherent constructive interference).

According to our UI commander-in-chief (aka @Warchamp7 ), this has been the biggest block for adding a monitoring button in the audio meters of obs mixer. Indeed, while the level variation is in general just +3 dBFS, this bug implies that any recording or stream can have unwanted level variations. Imagine the scenario where there's a panel with several independent speakers with their own mikes + a game audio captured through an AOC. If the operator monitors in turn each of the speakers, during the monitoring, their levels will be boosted. Since this happens at the final mix, this can't be solved by adding a compressor on each source. This makes a monitoring button quite dangerous in terms of the quality of the audio mix, because one has 0 feedback on the prod side on the real audio level of the output. The monitoring itself won't detect the feedback loop.

Motivation and Context

The bug is usually dealt with by users by either getting a device with enough separate outputs or by doing complex routing with virtual cables; or on windows by using the dedicated Application Audio Capture (which is still in beta).

We fix the bug in this manner:

on the UI side, we store in OBSBasic class a vector of all AOC sources (including global sources such as Desktop Audio); we mark any source whose device is identical with the monitoring device.
if an AOC and monitoring do use the same device, core audio silences the audio output buffers of all sources which have monitoring_type MONITOR_AND_OUPUT.
if 'audio output capture' source is muted, the monitored sources are not silenced any more.

How Has This Been Tested?

This has been tested extensively on windows 11 pro 23H2 by myself using:

a sine tone at 1000 Hz, generated by lavfi filter in Media Source, of amplitude 1/8, which means -21.1 dBFS.
the music from a splatoon clip. The audio level of the sine was checked using the FFmpeg volumedetect filter: ffmpeg -i '.\input.mp4' -filter:a volumedetect -f null /dev/null Recordings were made in mp4 container with 16 bit pcm raw audio. The waveforms were checked and compared between OBS 31.0.3 (current release) and this PR in Reaper DAW.

@Warchamp7 did also extensive tests which helped me dodge a few dangerous bullets (crashes and what not). Thanks a lot for his help !

Profiling results

I profiled the audio_callback on current master versus PR. There is an impact in the 0.01 ms range, but compared with a tick duration (20 ms) it's completely negligible.

no output( no recording nor streaming:

obs master:

audio_thread(Audio): min=0.002 ms, median=0.018 ms, max=0.422 ms, 99th percentile=0.09 ms
 ┗audio_callback: min=0 ms, median=0.012 ms, max=0.411 ms, 99th percentile=0.08 ms

this PR:

audio_thread(Audio): min=0.001 ms, median=0.031 ms, max=0.248 ms, 99th percentile=0.073 ms
 ┣audio_callback: min=0 ms, median=0.016 ms, max=0.11 ms, 99th percentile=0.04 ms
 ┗receive_audio: min=0.001 ms, median=0.016 ms, max=0.224 ms, 99th percentile=0.044 ms, 0.428969 calls per parent call
   ┣buffer_audio: min=0 ms, median=0 ms, max=0.038 ms, 99th percentile=0.003 ms
   ┗do_encode: min=0.002 ms, median=0.015 ms, max=0.224 ms, 99th percentile=0.043 ms
     ┣encode(Track1): min=0.001 ms, median=0.005 ms, max=0.033 ms, 99th percentile=0.01 ms

after a 5 seconds recording:

obs master:

audio_thread(Audio): min=0.001 ms, median=0.031 ms, max=0.248 ms, 99th percentile=0.073 ms
 ┣audio_callback: min=0 ms, median=0.016 ms, max=0.11 ms, 99th percentile=0.04 ms
 ┗receive_audio: min=0.001 ms, median=0.016 ms, max=0.224 ms, 99th percentile=0.044 ms, 0.428969 calls per parent call
   ┣buffer_audio: min=0 ms, median=0 ms, max=0.038 ms, 99th percentile=0.003 ms
   ┗do_encode: min=0.002 ms, median=0.015 ms, max=0.224 ms, 99th percentile=0.043 ms
     ┣encode(Track1): min=0.001 ms, median=0.005 ms, max=0.033 ms, 99th percentile=0.01 ms

this PR:

 audio_thread(Audio): min=0.001 ms, median=0.036 ms, max=0.568 ms, 99th percentile=0.098 ms
 ┣audio_callback: min=0 ms, median=0.024 ms, max=0.3 ms, 99th percentile=0.072 ms
 ┗receive_audio: min=0.001 ms, median=0.017 ms, max=0.325 ms, 99th percentile=0.23 ms, 0.367391 calls per parent call
   ┣buffer_audio: min=0 ms, median=0 ms, max=0.003 ms, 99th percentile=0.001 ms
   ┗do_encode: min=0.001 ms, median=0.016 ms, max=0.322 ms, 99th percentile=0.229 ms
     ┣encode(Track1): min=0.001 ms, median=0.005 ms, max=0.308 ms, 99th percentile=0.011 ms
     ┗send_packet: min=0 ms, median=0.01 ms, max=0.252 ms, 99th percentile=0.045 ms

Tests

The following tests were performed for Bug 1 (Desktop Audio duplication of monitored sources / same results for a regular Audio Output Capture source):

Scene with monitored sine source + Desktop Audio (monitoring device set at 100% level on the pc):

left: obs current release: level is -15.4 dBFS or +5.7 dBFS above the expected level.
right: this PR: level stays at nominal -21.1 dBFS

dup1

same scene with mute/unmute on the Desktop Audio:

top: obs current release, one sees the +5.7 dBFS level variation.
bottom: this PR. The audio level is undisturbed.

same scene with monitoring on and off on sine source:

top: obs current release, one sees the +5.7 dBFS level variation.
bottom: this PR. The audio level is undisturbed.

scene with a Media Source clip (splatoon) of 5 seconds looped 4 times, monitored + Desktop Audio at 100% level on pc:

on 2nd and 3rd loop, the Desktop Audio is muted;
top: obs current release: level is higher on 1st and 4th loops.
bottom: this PR: level is the same for all 4 loops (identical waveforms).

Types of changes

Bug fix (non-breaking change which fixes an issue)

Checklist:

[x] My code has been run through clang-format.
[x] I have read the contributing document.
[x] My code is not on the master branch.
[x] The code has been tested.
[x] All commit messages are properly formatted and commits squashed where appropriate.
[x] I have included updates to all appropriate documentation.

May 20 '25 13:05 pkviet

on the UI side, we detect whenever the same device is used for monitoring and Desktop Audio (or Desktop Audio 2).

Does this imply this fix only works for the two global Desktop Audio sources, and not if I add an Audio Output Capture source to a scene myself, along with monitored sources?

May 20 '25 13:05 CyBeRoni

on the UI side, we detect whenever the same device is used for monitoring and Desktop Audio (or Desktop Audio 2).

Does this imply this fix only works for the two global Desktop Audio sources, and not if I add an Audio Output Capture source to a scene myself, along with monitored sources?

yes it works only for the global Desktop Audio. tbf I didn't consider Audio Output Capture source. I'll think about it.

May 20 '25 14:05 pkviet

Update Added a test for a regular fade transition between two scenes when there is no duplication; checked that the new code path is never hit.

May 25 '25 11:05 pkviet

Update Updated the PR to support monitoring deduplication with both global 'Desktop Audio' sources and 'Audio_Output_Capture' sources (wasapi and pulse). Removed all UI code in favour of fully processing deduplication of monitored sources in core audio_callback. The deduplication code for sources duplicated in scenes or across scenes is unchanged.

May 29 '25 23:05 pkviet

Update (May 30, 2025) Added profiling of the audio_callback in core audio processing (obs-audio.c). The impact of the PR is found to be negligible.

May 30 '25 16:05 pkviet

Note The earlier version of this PR relied on UI to detect when Monitoring and Desktop Audio use the same device. It might be still be a viable alternative to the latest version (although it doesn't cover Audio Output Capture). It is available in this branch of my fork: https://github.com/pkviet/obs-studio/tree/audio-fixes-back I can revert the PR to that earlier version if on review it is decided to be a better option.

Jun 08 '25 00:06 pkviet

Update I've split the PR because the two bugs it was fixing are conceptually different. This will make reviewing simpler. Also I've come back to the initial architecture with a UI part to detect whether an AOC has same device as monitoring. This makes the changes more minimal for the core audio pipeline. But if a pure libobs solution is preferred, the alternative solution can still be found here: https://github.com/pkviet/obs-studio/tree/audio-fixes_bak2

Jun 08 '25 18:06 pkviet

There's a whole lot of stuff happening in frontend code that I feel like it has no place there - the UI should be able to happily "do its thing" and it should be the monitoring system and/or the output system that are synching on this and prevent the duplication from happening.

I'll get into more detail on this later, but just to make sure I understand the expected outcome right:

If there is any source capturing the output of "Device A" and the same "Device A" is then added as a monitoring device, then the monitoring output for all of these capture sources is not supposed to be routed to the monitoring output (their contribution to the monitoring output is effectively "skipped")?

correct

Wouldn't it be sufficient if the monitoring source during set up checks the device ID associated with the source it is being attached to and become "inert" when it is the same ID (and repeat this check for any update to the source it's attached to, so when the captured device is changed, the monitoring source will notice and re-enable monitoring output again)?

yes, this is what is done in the PR. Since the monitoring device changes are UI driven, the PR tracks that in UI. It is possible to do that purely in libobs, since the monitoring_device_id is stored in obs_core::audio struct; but we would need to check for all audio sources in the audio tree whether they are AOC (audio output capture sources) and in case they are, whether their device coincides with the monitoring device. This must be done at each audio tick. As noted in the June 8 update to this PR, the pure libobs version of this PR does just that. The reason why I opted for the current UI version are in summary:

to make minimal changes to the core audio pipeline,
to limit the friction by avoiding a function which checks at each audio tick every audio source in the audio tree and verifies they're not an AOC with a device identical with the monitoring device. In practice from audio tick to audio tick, it will be very rare that a new AOC coincides with the mon device. So it feels natural to make such checks on the UI side. Nevertheless, it could be argued that such decisions belong to the backend (libobs) and not the UI, which should not have to know anything about such subtleties. So I am fine with switching to the pure libobs version if it so desired.

Aug 07 '25 21:08 pkviet

Update Following comments, I have reverted to the pure libobs version. Here's the updated profiling which shows that the audio callback is still very well within the 1024 audio frames for a tick (21 ms a 48 kHz) although it's 2-3 times longer than before:

5 sec recording: audio_thread(Audio): min=0.001 ms, median=0.047 ms, max=2.854 ms, 99th percentile=0.118 ms ┣audio_callback: min=0 ms, median=0.04 ms, max=0.307 ms, 99th percentile=0.098 ms ┗receive_audio: min=0.001 ms, median=0.018 ms, max=2.8 ms, 99th percentile=0.035 ms, 0.106341 calls per parent call ┣buffer_audio: min=0 ms, median=0 ms, max=0.003 ms, 99th percentile=0.003 ms ┗do_encode: min=0.001 ms, median=0.017 ms, max=2.799 ms, 99th percentile=0.034 ms ┣encode(Track1): min=0.001 ms, median=0.007 ms, max=0.033 ms, 99th percentile=0.013 ms ┗send_packet: min=0 ms, median=0.009 ms, max=2.79 ms, 99th percentile=0.026 ms
no output: audio_thread(Audio): min=0.002 ms, median=0.046 ms, max=1.184 ms, 99th percentile=0.107 ms ┗audio_callback: min=0 ms, median=0.039 ms, max=1.173 ms, 99th percentile=0.092 ms

The graphs shown in the PR description are identical with the updated pure libobs version of the pr and have been checked explicitly.

Aug 07 '25 22:08 pkviet

Update @RytoEX comments addressed; PR rebased on top of https://github.com/obsproject/obs-studio/pull/12268 So merge order must be:

12268
12175 (the current PR)
OR merge directly 12175 (I have added a link to the sister PR for easier ref in the top post of the current PR, which was the original one). Checked that the fixes still work after rebasing.

Aug 21 '25 06:08 pkviet

#12268 has been merged.

Aug 22 '25 20:08 RytoEX

#12268 has been merged.

rebased to master

Aug 22 '25 21:08 pkviet

note: no matter how many rebases on master i do, GH keeps saying : This branch is out-of-date with the base branch ...

Aug 22 '25 21:08 pkviet

note: no matter how many rebases on master i do, GH keeps saying : This branch is out-of-date with the base branch ...

If I recall correctly, it's a mistake on the GitHub UI. It will resolve once the PR is approved.

Aug 22 '25 21:08 RytoEX

most recent comments addressed and PR force pushed.

Aug 22 '25 21:08 pkviet