obs-studio icon indicating copy to clipboard operation
obs-studio copied to clipboard

libobs: reduce synchronization limit for multiple audio tracks

Open kkartaltepe opened this issue 2 years ago • 8 comments

Description

Since 65eb3c0815929866dde7d398bec7738d21e9b211 we tried to get as close a sync between audio and video tracks as we could before starting to send frames to the output.

But in be717dbb2c19aa0e91a164fa52e3d4467b64be9d when multi-track audio was considered for synchronization it continued to try and line ALL audio packets up with one video packet. While audio and video packets are similar size this will work out. But once video packets duration is smaller than 1/2 audio packet duration this may fail forever. In practice with AAC's 20-23ms frame duration we can often get tracks producing frames 5-10ms out of phase. For 60fps video this mostly fine as 16ms frame duration will cover the gap, but for framerates as low as 100 its possible to fail to synchronize. In practice this is ~50% for 6 audio tracks and 120fps video on my system, or 100% at 240fps.

Since this doesnt change the logic for single audio tracks I think it's relatively safe but you can never be too safe around audio sync issues.

Motivation and Context

No one likes when OBS eats a recording. Largely mitigates #2855 but by just picking even the worst possible starting frame. Ideally we could reduce the phase difference in the audio encoders and continue using the old logic. This also doesnt solve the issue with us marking the recording as started despite it not actually outputting any frames while waiting for the sync point.

How Has This Been Tested?

I started and stopped a lot of recordings with a fast hardware encoder and 6 audio tracks enabled. If the video encoder is too slow this may not occur as I couldnt get it to happen with software encoders.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)

Checklist:

  • [x] My code has been run through clang-format.
  • [x] I have read the contributing document.
  • [x] My code is not on the master branch.
  • [x] The code has been tested.
  • [x] All commit messages are properly formatted and commits squashed where appropriate.
  • [x] I have included updates to all appropriate documentation.

kkartaltepe avatar Jan 30 '23 03:01 kkartaltepe

And an example bad audio encoder frame generation series is https://gist.github.com/kkartaltepe/d9bc973b03eda915829d4df10f289439

At the first attempted sync (Pruning! 0) we have

debug: packet: audio 1, ts: 149505163945, pruned = false, size = 3
debug: packet: audio 2, ts: 149505163945, pruned = false, size = 3
debug: packet: audio 3, ts: 149505163945, pruned = false, size = 3
debug: packet: audio 4, ts: 149505163945, pruned = false, size = 3
debug: packet: audio 5, ts: 149505163945, pruned = false, size = 3
debug: packet: audio 0, ts: 149505185226, pruned = false, size = 524
debug: packet: video 0, ts: 149505187726, pruned = false, size = 25586

Here it seems the packets are out of phase by 21.281ms which is basically the worst case but seems fairly common in my testing at least. And we choose this audio/video packet as our possible sync point and begin pruning packets.

later after some more audio packets we end up with another possible sync point at

debug: packet: audio 1, ts: 149505183945, pruned = false, size = 3
debug: packet: audio 2, ts: 149505183945, pruned = false, size = 3
debug: packet: audio 3, ts: 149505183945, pruned = false, size = 3
debug: packet: audio 4, ts: 149505183945, pruned = false, size = 3
debug: packet: audio 5, ts: 149505183945, pruned = false, size = 3
debug: packet: audio 0, ts: 149505185226, pruned = false, size = 524
debug: packet: video 0, ts: 149505187726, pruned = false, size = 25586
debug: packet: video 0, ts: 149505191892, pruned = false, size = 2164

And we see the phase issue, audio encoders 1-5 are all in sync but audio 0 is out of phase by 1.281ms and audio 0 and video 0 are out sync by 2.500ms. This one might work if we were smart but we clear the interleaving array up to the audio 0's frame since its the closest: https://github.com/obsproject/obs-studio/blob/61284cf9bafc3d2c9e2c533c436f8fd38c6d187e/libobs/obs-output.c#L1361 removing the earlier audio 1/2/3/4/5 that were within our hopeful 4ms window. So this sync point fails and we need to wait for the next samples in audio 1/2/3/4/5, which we saw earlier was 21ms out of phase and thus we never find a valid <4ms (for 240fps) sync point. (see pruning occurring at Pruning! 10 and Pruning! 11)

If we wanted to make the bigger change to try and catch this better sync point where the order of audio 0/1/2/3/4/5 might be different that would good too but likely a bit more complicated than I care to do within the audio sync system.

kkartaltepe avatar Jan 30 '23 03:01 kkartaltepe

Tested on win10, 240fps, 6 audio tracks. Normally the recording will die very quickly with this setup. I did 1 recording attempt with lots of render lag, and it did not die. I then did 4 more recordings, up to 5 min long, and none of them died.

This is looking very promising in so far as that problem goes (muxer/encoder hang, or whatever it is). On a cursory glance, audio sync is looking no worse for wear.

flaeri avatar Jan 30 '23 08:01 flaeri

Frame duration comes from 1024 samples for each frame; 1024 / 48kHz = 21.3 ms or 1024 / 44.1kHz = 23.2 ms. Did you intentionally put the magic number 1024 * 21?

norihiro avatar Feb 04 '23 23:02 norihiro

Nitpik on the commit name Reduce with a uppercase R.

tytan652 avatar Feb 04 '23 23:02 tytan652

Frame duration comes from 1024 samples for each frame; 1024 / 48kHz = 21.3 ms or 1024 / 44.1kHz = 23.2 ms. Did you intentionally put the magic number 1024 * 21?

Yes the AAC_FRAME_DURATION_USEC is in microseconds, so the 21 comes from 1/48kHz=20.8333us. It possible this is too short for 44.1kHz Ill test and see maybe we need to be more conservative.

kkartaltepe avatar Feb 05 '23 00:02 kkartaltepe

It possible this is too short for 44.1kHz Ill test and see maybe we need to be more conservative.

Instead of having a hard coded constant, cann't we use 1000000 / output->audio->info->samples_per_sec?

norihiro avatar Feb 05 '23 00:02 norihiro

Instead of having a hard coded constant, cann't we use 1000000 / output->audio->info->samples_per_sec?

We could but this is a per encoder thing, hardcoding (hopefully) makes it clear its not really sustainable if we change the way we do multiple encoders. We can instead query every encoder and pick the largest period if we are ok with a larger change.

kkartaltepe avatar Feb 05 '23 01:02 kkartaltepe

We can instead query every encoder and pick the largest period if we are ok with a larger change.

I misunderstood the motivation of this PR. Sounds like current hard-code implementation is sufficient enough.

norihiro avatar Feb 06 '23 11:02 norihiro

Is it really a good idea to hardcode with an AAC value when in the future (while ignoring FTL) using an Opus encoder will be possible for recording, SRT and RIST ?

tytan652 avatar Feb 15 '23 18:02 tytan652

Is it really a good idea to hardcode with an AAC value when in the future (while ignoring FTL) using an Opus encoder will be possible for recording, SRT and RIST ?

Opus uses a frame size of 20ms by default, which is probably close enough for now.

derrod avatar Feb 15 '23 19:02 derrod

Is it really a good idea to hardcode with an AAC value when in the future (while ignoring FTL) using an Opus encoder will be possible for recording, SRT and RIST ?

None of these are supported, if/when we do support multiple audio codecs in the future when we do that would be a good time to revisit this code. But also AAC has the largest frame size of currently available codecs at 1024 samples. Opus is only 960 samples at it default period.

kkartaltepe avatar Feb 16 '23 01:02 kkartaltepe

I tested at 240fps with 6 audio tracks and as mentioned a few messages above, this would normally be a guaranteed way to crash the encoder/muxer on my system. I was able to let the recording run for about an hour to test with no issues and no crashes.

prgmitchell avatar Feb 25 '23 07:02 prgmitchell

It possible this is too short for 44.1kHz Ill test and see maybe we need to be more conservative.

Instead of having a hard coded constant, cann't we use 1000000 / output->audio->info->samples_per_sec?

I have updated it to grab the actual max duration from the audio encoders instead of hardcoding AAC's framesize and 48khz.

kkartaltepe avatar Mar 03 '23 00:03 kkartaltepe

Would there be any issues with enabling 96 kHz at some future date with the new sync logic?

energizerfellow avatar Mar 16 '23 02:03 energizerfellow

Would there be any issues with enabling 96 kHz at some future date with the new sync logic?

The current version is no longer hardcoded this should be fine for whatever (baring numerical/precision issues).

--- edit As authored I dont expect any numerical issues for regular audio frequencies.

kkartaltepe avatar Mar 16 '23 03:03 kkartaltepe