libobs: Fix audio duplication bug
Description
This is split from the original PR https://github.com/obsproject/obs-studio/pull/12175 to facilitate the reviewing. PR #12175 fixed two audio bugs but the fixes are conceptually different.
The current bug consists in an increased audio level when a source is duplicated in a scene or through a nested scene. The bug implies that the same audio samples will be mixed several times resulting in a coherent addition. For duplicate sources this implies a doubling of the audio amplitude or a constant +6 dBFS level increase.
The bug was mostly fixed in commits 8a38e33 [1] and 72924ac [2]. But it was fixed at the scene level and this left an edge case where a nested scene has a Show/Hide transition. It implies an increase of +6 dBFS for any duplicated source. An example of valid use case is a Media Source or Browser source which is duplicated and cropped to different areas. Or a nested scene. We want to be able to transition from scene A to scene B with nested scene A, without any change in audio level for instance, if the same audio is used across scenes.
[1] libobs: Mix audio of each source in a scene only once [2] libobs: Deduplicate audio for nested scenes/groups if not transitioning
Motivation and Context
We fix the bug in this manner:
- at each audio tick, when the audio tree is built, we tag duplicate audio sources;
- they are then bypassed in the audio rendering of any scenes or transitions they're involved;
- they're mixed in the final mix as root_nodes exactly like global audio sources. This intrinsically prevents any duplication within a scene, across scenes and in transitions. This is a better solution than the previous one because it is not limited to intra-scene deduplication. This fixes for instance the edge case of Show/Hide transition to a nested scene.
How Has This Been Tested?
This has been tested extensively on windows 11 pro 23H2 by myself using:
- a sine tone at 1000 Hz, generated by lavfi filter in Media Source, of amplitude 1/8, which means -21.1 dBFS.
- the music from a splatoon clip. The audio level of the sine was checked using the FFmpeg volumedetect filter: ffmpeg -i '.\input.mp4' -filter:a volumedetect -f null /dev/null Recordings were made in mp4 container with 16 bit pcm raw audio. The waveforms were checked and compared between OBS 31.0.3 (current release) and this PR in Reaper DAW.
@Warchamp7 did also extensive tests which helped me dodge a few dangerous bullets (crashes and what not). Thanks a lot for his help !
Profiling results
We found negligible impact of the current PR on the audio_callback.
- no output( no recording nor streaming: obs master:
audio_thread(Audio): min=0.002 ms, median=0.018 ms, max=0.422 ms, 99th percentile=0.09 ms
┗audio_callback: min=0 ms, median=0.012 ms, max=0.411 ms, 99th percentile=0.08 ms
- this PR:
audio_thread(Audio): min=0.001 ms, median=0.021 ms, max=0.074 ms, 99th percentile=0.043 ms
┗audio_callback: min=0 ms, median=0.016 ms, max=0.065 ms, 99th percentile=0.032 ms
- after a 5 seconds recording:
- obs master:
audio_thread(Audio): min=0.001 ms, median=0.031 ms, max=0.248 ms, 99th percentile=0.073 ms
┣audio_callback: min=0 ms, median=0.016 ms, max=0.11 ms, 99th percentile=0.04 ms
┗receive_audio: min=0.001 ms, median=0.016 ms, max=0.224 ms, 99th percentile=0.044 ms, 0.428969 calls per parent call
┣buffer_audio: min=0 ms, median=0 ms, max=0.038 ms, 99th percentile=0.003 ms
┗do_encode: min=0.002 ms, median=0.015 ms, max=0.224 ms, 99th percentile=0.043 ms
┣encode(Track1): min=0.001 ms, median=0.005 ms, max=0.033 ms, 99th percentile=0.01 ms
- this PR:
audio_thread(Audio): min=0.002 ms, median=0.028 ms, max=0.35 ms, 99th percentile=0.071 ms
┣audio_callback: min=0 ms, median=0.015 ms, max=0.199 ms, 99th percentile=0.03 ms
┗receive_audio: min=0.001 ms, median=0.015 ms, max=0.136 ms, 99th percentile=0.043 ms, 0.40137 calls per parent call
┣buffer_audio: min=0 ms, median=0 ms, max=0.004 ms, 99th percentile=0.002 ms
┗do_encode: min=0.002 ms, median=0.014 ms, max=0.135 ms, 99th percentile=0.042 ms
┣encode(Track1): min=0.001 ms, median=0.005 ms, max=0.032 ms, 99th percentile=0.012 ms
┗send_packet: min=0 ms, median=0.009 ms, max=0.129 ms, 99th percentile=0.034 ms
Tests
the following tests were done:
-
scene with two sines,
-
show/hide transition for a sine in a scene with duplicated sine,
-
fade transition between a scene A with single sine to scene B w/ duplicated sine,
-
scene A with sine, nested in scene B which also has the sine as a source.
-
same as 4 with a show/hide transition on the sine in scene B (not shown, same result as 4.)
-
same tests on a audio clip (splatoon): identical results. There is no regression on this PR.
-
same as 4. w/ a show/hide transition on the nested scene A:
- left: current release: there is a bump at the transition; level does not stay nominal;
- right: this PR: no bump, level stays nominal.
Additional tests:
- Fade transition (300 ms) between scene A with a 10 kHz sine at -21.12 dBFS to scene B with a 480 Hz sine at -30 dBFs.
- top: current release,
- bottom: this PR. No difference.
Types of changes
- Bug fix (non-breaking change which fixes an issue)
Checklist:
- [x] My code has been run through clang-format.
- [x] I have read the contributing document.
- [x] My code is not on the master branch.
- [x] The code has been tested.
- [x] All commit messages are properly formatted and commits squashed where appropriate.
- [x] I have included updates to all appropriate documentation.
As noted on this and the previous PR, these are issues I've wanted fixed very badly for a long time and have gotten a fair bit of testing from myself on Windows 10.
These are deep audio pipeline changes so more eyes and testing are generally appreciated.
Just for stupid people like me: The change here effectively creates a "global audio graph" of all sources that are currently providing audio samples, deduplicated across the entire source graph (including scenes involved in transitions and nested scenes within those).
So if I have (for whatever reason) and audio capture source in scene A through C, and scene B contains scene C as a "nested scene", and I transition from A to B, the audio capture source will only be present once in the audio graph during a transition, thus preventing the amplification?
Kindasorta as if all audio sources only exist once per audio bus?
Just for stupid people like me: The change here effectively creates a "global audio graph" of all sources that are currently providing audio samples, deduplicated across the entire source graph (including scenes involved in transitions and nested scenes within those).
So if I have (for whatever reason) and audio capture source in scene A through C, and scene B contains scene C as a "nested scene", and I transition from A to B, the audio capture source will only be present once in the audio graph during a transition, thus preventing the amplification?
Kindasorta as if all audio sources only exist once per audio bus?
exactly; this is a general fix which should work also for the dsk plugin which relies on a global source and not a nested scene.
Just for stupid people like me: The change here effectively creates a "global audio graph" of all sources that are currently providing audio samples, deduplicated across the entire source graph (including scenes involved in transitions and nested scenes within those). So if I have (for whatever reason) and audio capture source in scene A through C, and scene B contains scene C as a "nested scene", and I transition from A to B, the audio capture source will only be present once in the audio graph during a transition, thus preventing the amplification? Kindasorta as if all audio sources only exist once per audio bus?
exactly; this is a general fix which should work also for the dsk plugin which relies on a global source and not a nested scene.
In my hypothetical example, the nested audio from scene C inside scene B might potentially be "faded in" as part of the transition from A to B (and the audio from scene A is "faded out". Is that resolved, because effectively no fade should actually take place, but are any volume changes from transitions simply ignored the moment an audio source is deduplicated?
Just for stupid people like me: The change here effectively creates a "global audio graph" of all sources that are currently providing audio samples, deduplicated across the entire source graph (including scenes involved in transitions and nested scenes within those). So if I have (for whatever reason) and audio capture source in scene A through C, and scene B contains scene C as a "nested scene", and I transition from A to B, the audio capture source will only be present once in the audio graph during a transition, thus preventing the amplification? Kindasorta as if all audio sources only exist once per audio bus?
exactly; this is a general fix which should work also for the dsk plugin which relies on a global source and not a nested scene.
In my hypothetical example, the nested audio from scene C inside scene B might potentially be "faded in" as part of the transition from A to B (and the audio from scene A is "faded out". Is that resolved, because effectively no fade should actually take place, but are any volume changes from transitions simply ignored the moment an audio source is deduplicated?
the fade does take place, but the audio of the duplicated source is excluded from the fade audio render although it is still in the final audio mix; its level stays constant across the scenes and the transitions. The fade volume will still apply during the transition to non duplicated sources though.
It seems that both this and #12175 add a push_audio_tree2()? Is there a canonical/preferred order for addressing/merging these?
No preferred order. But i will have to rebase one over the other. I made them independent for the reviewing. I propose that the current one be used as basis for the other.
No preferred order. But i will have to rebase one over the other. I made them independent for the reviewing. I propose that the current one be used as basis for the other.
I asked because the comments for each push_audio_tree2() seemed to be different, so I was concerned that there are other conflicting changes and which change is "correct".
No preferred order. But i will have to rebase one over the other. I made them independent for the reviewing. I propose that the current one be used as basis for the other.
I asked because the comments for each
push_audio_tree2()seemed to be different, so I was concerned that there are other conflicting changes and which change is "correct".
as indicated in #12175 , I rebased the latter to have the contents of both again. So this one can be merged first and then 12175; or 12175 can be merged since it now has again the two features.