libobs: reduce synchronization limit for multiple audio tracks
Description
Since 65eb3c0815929866dde7d398bec7738d21e9b211 we have tried to get audio and video tracks as closely synchronized as possible before starting to send frames to the output.
But in be717dbb2c19aa0e91a164fa52e3d4467b64be9d, when multi-track audio was considered for synchronization, it continued to try to line ALL audio packets up with one video packet. While audio and video packets have similar durations this works out, but once a video packet's duration is smaller than half an audio packet's duration this may fail forever. In practice, with AAC's 20-23ms frame duration, we can often get tracks producing frames 5-10ms out of phase. For 60fps video this is mostly fine, as the 16ms frame duration will cover the gap, but at framerates as low as 100fps it is possible to fail to synchronize. In practice this is ~50% for 6 audio tracks and 120fps video on my system, or 100% at 240fps.
Since this doesn't change the logic for single audio tracks I think it's relatively safe, but you can never be too safe around audio sync issues.
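To make the arithmetic above concrete, here is a small standalone sketch (illustrative only, not libobs code; the 10ms phase offset is the worst case observed above):

```c
#include <stdio.h>
#include <stddef.h>

/* Illustrative only: compare the video frame duration at various
 * framerates against a worst-case audio phase offset, as described
 * above. Not libobs code. */
int main(void)
{
	const double phase_ms = 10.0; /* tracks observed 5-10ms out of phase */
	const double fps[] = {60.0, 100.0, 120.0, 240.0};

	for (size_t i = 0; i < sizeof(fps) / sizeof(fps[0]); i++) {
		double video_frame_ms = 1000.0 / fps[i];
		/* The old logic needed one video packet to line up with
		 * ALL audio packets, which only works reliably when the
		 * video frame duration covers the audio phase offset. */
		printf("%5.0f fps: video frame %5.2f ms -> %s\n", fps[i],
		       video_frame_ms,
		       video_frame_ms > phase_ms ? "covers 10ms phase"
						 : "may never sync");
	}
	return 0;
}
```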
Motivation and Context
No one likes it when OBS eats a recording. This largely mitigates #2855, though only by accepting even the worst possible starting frame. Ideally we could reduce the phase difference in the audio encoders and continue using the old logic. This also doesn't solve the issue of marking the recording as started even though it isn't actually outputting any frames while waiting for the sync point.
How Has This Been Tested?
I started and stopped a lot of recordings with a fast hardware encoder and 6 audio tracks enabled. If the video encoder is too slow this may not occur; I couldn't get it to happen with software encoders.
Types of changes
- Bug fix (non-breaking change which fixes an issue)
Checklist:
- [x] My code has been run through clang-format.
- [x] I have read the contributing document.
- [x] My code is not on the master branch.
- [x] The code has been tested.
- [x] All commit messages are properly formatted and commits squashed where appropriate.
- [x] I have included updates to all appropriate documentation.
An example of a bad audio encoder frame generation series is https://gist.github.com/kkartaltepe/d9bc973b03eda915829d4df10f289439
At the first attempted sync (`Pruning! 0`) we have:
```
debug: packet: audio 1, ts: 149505163945, pruned = false, size = 3
debug: packet: audio 2, ts: 149505163945, pruned = false, size = 3
debug: packet: audio 3, ts: 149505163945, pruned = false, size = 3
debug: packet: audio 4, ts: 149505163945, pruned = false, size = 3
debug: packet: audio 5, ts: 149505163945, pruned = false, size = 3
debug: packet: audio 0, ts: 149505185226, pruned = false, size = 524
debug: packet: video 0, ts: 149505187726, pruned = false, size = 25586
```
Here the packets are out of phase by 21.281ms, which is basically the worst case but seems fairly common in my testing at least. We choose this audio/video packet pair as our possible sync point and begin pruning packets.
Later, after some more audio packets, we end up with another possible sync point at:
```
debug: packet: audio 1, ts: 149505183945, pruned = false, size = 3
debug: packet: audio 2, ts: 149505183945, pruned = false, size = 3
debug: packet: audio 3, ts: 149505183945, pruned = false, size = 3
debug: packet: audio 4, ts: 149505183945, pruned = false, size = 3
debug: packet: audio 5, ts: 149505183945, pruned = false, size = 3
debug: packet: audio 0, ts: 149505185226, pruned = false, size = 524
debug: packet: video 0, ts: 149505187726, pruned = false, size = 25586
debug: packet: video 0, ts: 149505191892, pruned = false, size = 2164
```
And here we see the phase issue: audio encoders 1-5 are all in sync, but audio 0 is out of phase by 1.281ms, and audio 0 and video 0 are out of sync by 2.500ms. This one might work if we were smart about it, but we clear the interleaving array up to audio 0's frame since it's the closest: https://github.com/obsproject/obs-studio/blob/61284cf9bafc3d2c9e2c533c436f8fd38c6d187e/libobs/obs-output.c#L1361
This removes the earlier audio 1/2/3/4/5 packets that were within our hopeful 4ms window. So this sync point fails and we need to wait for the next samples in audio 1/2/3/4/5, which as we saw earlier are 21ms out of phase, and thus we never find a valid <4ms (for 240fps) sync point (see the pruning occurring at `Pruning! 10` and `Pruning! 11`).
If we wanted to make the bigger change to try to catch this better sync point, where the order of audio 0/1/2/3/4/5 might be different, that would be good too, but it's likely a bit more complicated than I care to attempt within the audio sync system.
Tested on win10, 240fps, 6 audio tracks. Normally the recording will die very quickly with this setup. I did 1 recording attempt with lots of render lag, and it did not die. I then did 4 more recordings, up to 5 min long, and none of them died.
This is looking very promising insofar as that problem goes (muxer/encoder hang, or whatever it is). At a cursory glance, audio sync is looking no worse for wear.
Frame duration comes from 1024 samples for each frame; 1024 / 48kHz = 21.3 ms or 1024 / 44.1kHz = 23.2 ms. Did you intentionally put the magic number `1024 * 21`?
Nitpick on the commit name: `Reduce` with an uppercase R.
> Frame duration comes from 1024 samples for each frame; 1024 / 48kHz = 21.3 ms or 1024 / 44.1kHz = 23.2 ms. Did you intentionally put the magic number `1024 * 21`?
Yes, AAC_FRAME_DURATION_USEC is in microseconds, so the 21 comes from 1/48kHz = 20.8333us, rounded up. It's possible this is too short for 44.1kHz; I'll test and see, maybe we need to be more conservative.
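For reference, the constant as I understand it from this thread (a reconstruction from the discussion, not necessarily the exact PR diff):

```c
/* 21 us/sample rounds up 1/48kHz = 20.8333 us/sample, so one
 * 1024-sample AAC frame is treated as 1024 * 21 = 21504 us.
 * Reconstructed from the discussion above; the PR diff may differ. */
#define AAC_FRAME_DURATION_USEC (1024 * 21)
```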
> It's possible this is too short for 44.1kHz; I'll test and see, maybe we need to be more conservative.

Instead of having a hardcoded constant, can't we use `1000000 / output->audio->info->samples_per_sec`?
> Instead of having a hardcoded constant, can't we use `1000000 / output->audio->info->samples_per_sec`?

We could, but this is a per-encoder thing; hardcoding (hopefully) makes it clear it's not really sustainable if we change the way we do multiple encoders. We can instead query every encoder and pick the largest period if we are OK with a larger change.
> We can instead query every encoder and pick the largest period if we are OK with a larger change.

I misunderstood the motivation of this PR. It sounds like the current hardcoded implementation is sufficient.
Is it really a good idea to hardcode an AAC value when in the future (while ignoring FTL) using an Opus encoder will be possible for recording, SRT and RIST?
> Is it really a good idea to hardcode an AAC value when in the future (while ignoring FTL) using an Opus encoder will be possible for recording, SRT and RIST?

Opus uses a frame size of 20ms by default, which is probably close enough for now.
> Is it really a good idea to hardcode an AAC value when in the future (while ignoring FTL) using an Opus encoder will be possible for recording, SRT and RIST?

None of these are supported; if/when we support multiple audio codecs in the future, that would be a good time to revisit this code. But also, AAC has the largest frame size of the currently available codecs at 1024 samples; Opus is only 960 samples at its default period.
I tested at 240fps with 6 audio tracks, and as mentioned a few messages above, this would normally be a guaranteed way to crash the encoder/muxer on my system. I was able to let the recording run for about an hour with no issues and no crashes.
> It's possible this is too short for 44.1kHz; I'll test and see, maybe we need to be more conservative.

> Instead of having a hardcoded constant, can't we use `1000000 / output->audio->info->samples_per_sec`?
I have updated it to grab the actual max frame duration from the audio encoders instead of hardcoding AAC's frame size and 48kHz.
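A minimal sketch of what that per-encoder query could look like, assuming the libobs accessors obs_encoder_get_frame_size() and obs_encoder_get_sample_rate() (the actual code in the updated PR may differ):

```c
#include <stdint.h>
#include <stddef.h>
#include "obs.h"

/* Sketch: pick the largest audio frame duration (in microseconds)
 * across all configured audio encoders, instead of hardcoding AAC at
 * 48kHz. Assumes the libobs accessors named above; the PR's actual
 * implementation may differ. */
static uint64_t max_audio_frame_duration_usec(obs_encoder_t **encoders,
					      size_t count)
{
	uint64_t max_usec = 0;

	for (size_t i = 0; i < count; i++) {
		uint64_t samples = obs_encoder_get_frame_size(encoders[i]);
		uint64_t rate = obs_encoder_get_sample_rate(encoders[i]);
		uint64_t usec = samples * 1000000ULL / rate;

		if (usec > max_usec)
			max_usec = usec;
	}
	return max_usec;
}
```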
Would there be any issues with enabling 96 kHz at some future date with the new sync logic?
> Would there be any issues with enabling 96 kHz at some future date with the new sync logic?

The current version is no longer hardcoded, so this should be fine for any sample rate (barring numerical/precision issues).
Edit: As authored, I don't expect any numerical issues for regular audio frequencies.
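As a quick sanity check (assuming AAC's 1024-sample frames): at 96kHz one frame is 1024 / 96000 ≈ 10.67ms, and the corresponding integer math, 1024 * 1000000 / 96000 = 10666us, is nowhere near 64-bit limits.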