Gradually increasing video & audio desync over period of a stream

Open morgant opened this issue 1 year ago • 2 comments

A long-standing known issue that I've experienced is video & audio gradually drifting further out of sync over the duration of a stream. The longer the stream, the worse it is by the end. I've found this is less of an issue on faster hardware, but it still exists.

Recently, I've been testing renice(8)-ing fauxstream, sndiod, and ffmpeg processes upon starting my stream to ensure they have the highest priority, e.g. renice -n -20 <pid>. Of course, this must be done as the superuser, which I don't want to run fauxstream itself as. Per the manual:

Users other than the superuser may only alter the priority of processes they own, and can only monotonically increase their “nice value” within the range 0 to PRIO_MAX (20), which prevents overriding administrative fiats. The superuser may alter the priority of any process and set the priority to any value in the range PRIO_MIN (-20) to PRIO_MAX.

This may help, but @rfht did some testing in a stream yesterday -- which I unfortunately missed -- and mentioned the following in #openbsd-gaming on Libera.chat:

it looks like using -max_interleave_delta 50000 helps with keeping things in sync

Per the FFmpeg Formats Documentation:

max_interleave_delta integer (output)

Set maximum buffering duration for interleaving. The duration is expressed in microseconds, and defaults to 10000000 (10 seconds).

To ensure all the streams are interleaved correctly, libavformat will wait until it has at least one packet for each stream before actually writing any packets to the output file. When some streams are "sparse" (i.e. there are large gaps between successive packets), this can result in excessive buffering.

This field specifies the maximum difference between the timestamps of the first and the last packet in the muxing queue, above which libavformat will output a packet regardless of whether it has queued a packet for all the streams.

If set to 0, libavformat will continue buffering packets until it has a packet for each stream, regardless of the maximum timestamp difference between the buffered packets.

So, I'll definitely give that a try too on my next stream.
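
For anyone else trying this, here is a minimal sketch of how the value could be passed. Since max_interleave_delta is documented as an output option, it would presumably go after the inputs and before the output URL. The interleave_opts helper is hypothetical (it is not part of fauxstream) and only converts milliseconds into the microseconds the option expects.

```shell
#!/bin/sh
# Hypothetical helper (not part of fauxstream): build the muxer option
# from a delay given in milliseconds. ffmpeg expects microseconds.
interleave_opts() {
	ms=$1
	printf '%s %s\n' '-max_interleave_delta' "$((ms * 1000))"
}

# 50 ms yields the 50000 quoted above; the documented default is
# 10000000 (10 seconds), i.e. 200 times larger.
interleave_opts 50
```

The output string would then be spliced into the ffmpeg command that writes the final stream, ahead of the output URL.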

morgant avatar Sep 26 '24 21:09 morgant

@rfht has been testing the following, which uses VA-API, plus some performance & desync improvements:

https://gist.github.com/rfht/153388d836bcc7235b516a4216421d79

morgant avatar Sep 27 '24 16:09 morgant

@rfht I recall you mentioning on #openbsd-gaming that you were still experiencing desync on your streams, even with this solution. It was quite some time ago and I haven't had a chance to test this implementation yet. 😦

I can't remember, do you override sndiod(8)'s -b option for lower-latency audio? For example, the OpenBSD FAQ's Multimedia section suggests 50ms for gaming.
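
As a worked example of that number: sndiod(8)'s -b takes a buffer size in frames, so a latency target has to be converted using the sample rate. The sketch below assumes a 48000 Hz device rate, and latency_to_frames is a made-up helper name.

```shell
#!/bin/sh
# Convert a target latency in milliseconds into a sndiod(8) -b value,
# which is expressed in frames. The 48000 Hz default is an assumption;
# pass your device's actual rate as the second argument.
latency_to_frames() {
	ms=$1
	rate=${2:-48000}
	echo "$((ms * rate / 1000))"
}

latency_to_frames 50          # 50 ms at 48 kHz
latency_to_frames 50 44100    # 50 ms at 44.1 kHz

# One possible way to apply the result (as root), assuming sndiod is
# managed through rcctl(8):
#   rcctl set sndiod flags -b 2400
#   rcctl restart sndiod
```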

Further thoughts, looking through the ffmpeg documentation again:

  • Maybe we could use loopback decoders to handle the video and both audio (monitor & mic) in a single ffmpeg command?
  • If we can handle it all in one command and remove piping entirely, would we then be able to utilize the -isync option on one of the streams (maybe video?) to help ffmpeg keep things synced?

I'll try to experiment with this soon.


Further note-to-self: I mentioned in this issue's description that I've been using renice(8). I wanted to note that sndiod(8) is already nice(1)-ed to -20, so any renice(8)-ing should probably use a lower value. obsdfreqd(1) uses -15, so I think a similar value (maybe even -10) would be appropriate for ffmpeg and/or fauxstream.

morgant avatar Jan 14 '25 20:01 morgant

@rfht asked me on IRC if my CBR (constant bit rate) PR #7 (for issue #6) had also resolved audio/video desync. I wanted to answer in more detail here since the answer is, unfortunately: no.

That said, it does seem to be greatly reduced, so is worth testing on better hardware.

In my case (a dual-core 2.2GHz (3.1GHz boost) i7 mobile processor, with hyper-threading disabled as is the default under OpenBSD, and no hardware encoding), I am not surprised, as any high CPU load can negatively affect the encoding. renice(8)-ing the fauxstream process group, and therefore also its ffmpeg(1) child processes, to -15 (since sndiod(8) already runs at -20) helps significantly. As mentioned above, I'm doing this in a wrapper script (not shared here as it does a ton of other setup/teardown, but I'd be happy to write something more generic).
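
A generic version of that step might look roughly like the sketch below. It is not the actual wrapper script: renice_pgrp_cmd is a hypothetical helper, and the use of doas(1) for privilege escalation is an assumption. It only prints the command so it can be reviewed before being run.

```shell
#!/bin/sh
# Hypothetical wrapper fragment: print the renice(8) command for a whole
# process group. The value defaults to -15 so the stream stays below
# sndiod(8), which already runs at -20.
renice_pgrp_cmd() {
	pgid=$1
	nice=${2:--15}
	echo "doas renice -n $nice -g $pgid"
}

# Example use from a wrapper, once fauxstream is running as $pid:
#   pgid=$(ps -o pgid= -p "$pid" | tr -d ' ')
#   eval "$(renice_pgrp_cmd "$pgid")"
# Note: as I read renice(8), -n is an increment, so this assumes the
# processes start at the default nice value of 0.
renice_pgrp_cmd 1234
```

Targeting the process group with -g rather than a single pid is what lets the ffmpeg children pick up the same priority without the stream itself running as the superuser.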

I also disable automatic CPU scaling and lock it at the highest clock rate (though the processor will still "boost", if available) with apm -H.

Beyond that, I do want to note a couple things I learned in the process of adding further optimizations to my CBR PR #7:

  • The PR includes the following improvements, among others:
    • CBR encoding for both the video and audio streams
    • Keyframes every 2 seconds
    • Fixes to our use of x11grab to ensure that we're only capturing at the specified framerate, regardless of the display framerate. Example: my displays are only 50Hz (compared to the most common 60Hz) and my stream is only 25fps, so it's not trying to capture & encode 50Hz and then skip/drop frames for 25fps. NOTE: Newer displays, especially gaming displays, are often 144Hz and some are now available at upwards of 500Hz. It would be crazy to try to encode at that fps and only stream 1080p60 (1080p @ 60fps).
    • Further tuning x264 encoding for zero-latency

Together with prioritizing the fauxstream/ffmpeg processes, the above should help keep processor usage more consistent and reduce desync.
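
To make the bullet points above concrete, here is a rough sketch of what such options can look like. This is not the code from PR #7: the helper names, bit rates, screen size, and display are illustrative assumptions, and setting -bufsize equal to the bit rate is just one plausible choice. The keyframe interval follows from the frame rate (2 seconds at 25fps is a GOP of 50 frames).

```shell
#!/bin/sh
# Hypothetical helpers (not the PR's actual code).

# Capture at the stream frame rate rather than the display refresh rate.
capture_opts() {
	printf '%s\n' "-f x11grab -framerate $1 -video_size $2 -i $3"
}

# Constrain x264 to a constant bit rate, place a keyframe every 2 seconds,
# and tune for zero latency; encode audio as AAC at a fixed bit rate.
cbr_opts() {
	fps=$1      # stream frame rate, e.g. 25
	vbr=$2      # video bit rate, e.g. 2500k
	abr=$3      # audio bit rate, e.g. 128k
	gop=$((fps * 2))
	printf '%s\n' "-c:v libx264 -tune zerolatency \
-b:v $vbr -minrate $vbr -maxrate $vbr -bufsize $vbr \
-g $gop -keyint_min $gop \
-c:a aac -b:a $abr"
}

# Print (do not run) a combined command; OUTPUT is a placeholder.
echo "ffmpeg $(capture_opts 25 1920x1080 :0.0) $(cbr_opts 25 2500k 128k) -f flv OUTPUT"
```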

morgant avatar Aug 31 '25 22:08 morgant

As mentioned in a comment on Issue #10, I wonder if -thread_queue_size is actually causing some desync by forcing buffering instead of discarding input data which couldn't be processed quickly enough. Additionally, should we maybe do some maths as to how many threads are allowed for filters (e.g. -filter_threads/-filter_complex_threads) to ensure that our use of multiple ffmpeg processes and pipes doesn't try to allocate/schedule more threads than there are processors available?
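
The "maths" could be as simple as the sketch below: divide the online CPUs evenly across the ffmpeg processes in the pipeline and never go below one thread each. threads_per_proc is a hypothetical helper, and whether an even split is the right policy is an open question.

```shell
#!/bin/sh
# Split the available CPUs evenly across the ffmpeg processes in the pipe,
# never going below one thread per process.
threads_per_proc() {
	ncpu=$1
	nprocs=$2
	t=$((ncpu / nprocs))
	if [ "$t" -lt 1 ]; then
		t=1
	fi
	echo "$t"
}

# Example: a dual-core machine running two piped ffmpeg processes.
threads_per_proc 2 2

# On OpenBSD the CPU count could come from sysctl(8), e.g.:
#   n=$(threads_per_proc "$(sysctl -n hw.ncpuonline)" 2)
#   ffmpeg -filter_threads "$n" -filter_complex_threads "$n" ...
```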

morgant avatar Oct 02 '25 19:10 morgant

I have done several tests today, including recording to a local file and a live stream to Twitch, and found that removing -thread_queue_size does seem to resolve desync for me.

Again, I'm running fairly low end hardware, so it's not terribly hard for me to launch an application that causes dropped frames. In the Twitch stream, I can see these temporary reductions in stream quality, but I still had zero visible de-sync by the end of the stream.

It wasn't much more than a 30 min Twitch stream, so I'll be doing some 2+ hr streams next week to further stress test.

I haven't created a PR yet as I'm currently testing these changes on top of patches to issue #10.

morgant avatar Oct 03 '25 19:10 morgant

I have now created PR #12 to address issue #10, which will need to be merged -- no rush! -- as a prerequisite for this issue.

morgant avatar Oct 05 '25 21:10 morgant

In my further attempted live stream tests of removing -thread_queue_size use entirely, I would get good bit rate & synced audio, but then -- after a few minutes and the first stutter due to high CPU load -- the frame rate & bit rate would drop significantly and not increase. A few minutes later, they would drop further, never recovering. My theory is that the thread queue was exhausted and so ffmpeg would just set a new, lower frame rate going forward instead of just dropping frames temporarily.

Worse, if I overloaded the CPU too much, the audio would still go out of sync.


I found a pair of 10yr old YouTube videos which cover various aspects of syncing audio in ffmpeg captures. They're long, but they helped me better understand the -thread_queue_size option as well as using filters to sync video & audio:

  • https://youtu.be/ynjFXmcpExE
  • https://youtu.be/Xbzh-T2kfJk

First, it seems very likely that our -thread_queue_size 512 is unnecessarily high. I decided to drop it to 64 (the YouTube videos said the default is 8, but I'm not sure if that's still the case or not) and only set it on the right-hand ffmpeg pipe which captures and encodes the video. (See issue #10 and my PR #12.)

Secondly, the YouTuber demonstrated using an audio sync advanced filter. At the time, the filter had to be compiled in and most platforms didn't seem to include it, but I found that similar functionality is included in the aresample filter in newer versions (more importantly, the OpenBSD port & packages!)

The most important of the Resampler options for our need is async:

async

For swr only, simple 1 parameter audio sync to timestamps using stretching, squeezing, filling and trimming. Setting this to 1 will enable filling and trimming, larger values represent the maximum amount in samples that the data may be stretched or squeezed for each second. Default value is 0, thus no compensation is applied to make the samples match the audio timestamps.

I figured that, for real-time streams where we want as little latency and CPU load as possible, we should start with async=1 (enable "filling & trimming" without stretching or squeezing).

It took a little experimentation, but I think I settled on a good combination of -thread_queue_size and aresample=async=1, specifically:

  • On the left-hand (before the pipe) ffmpeg command:
    • No -thread_queue_size
    • Apply aresample=async=1 to each of the audio channels being merged down to stereo (ensuring we're synchronizing the mic & monitor while merging), but don't encode to AAC (let the raw audio pass through the pipe via nut)
  • On the right-hand (after the pipe) ffmpeg command:
    • Set -thread_queue_size 64 on both the nut audio input and the x11grab input
    • Apply the aresample=async=1 via -filter_complex for the audio input via nut (ensuring we're synchronizing with the video capture)
    • Encode the audio to AAC (instead of copy) at the same time we're encoding the video (this is really necessary because you can't perform filters with copy, but also means that ffmpeg is ensuring the video & audio are synchronized in the output file/format)

I did some local testing and a 25+ minute live stream with these settings (putting together a PR now) and maintained my frame rate & bit rate, despite introducing high CPU load and even some CPU throttling. More importantly, audio stayed synchronized.
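
Until the PR is up, the layout described above can be sketched as a dry run that prints the two commands instead of executing them. This is an approximation and not the actual patch: the sndio device names, screen size, frame rate, display, PCM codec, and output URL are all placeholders, and the amerge/pan chain assumes both audio inputs are stereo.

```shell
#!/bin/sh
# Dry-run sketch: print the two ffmpeg commands rather than running them.
# Device names, sizes, rates, and the output URL are placeholders.

# Left-hand side: no -thread_queue_size. Mic and monitor are each passed
# through aresample=async=1, merged down to stereo, and sent through the
# pipe as raw PCM in a nut container (no AAC encoding yet).
left_cmd() {
	printf '%s\n' "ffmpeg -f sndio -i snd/0 -f sndio -i snd/mon \
-filter_complex '[0:a]aresample=async=1[mic];[1:a]aresample=async=1[mon];[mic][mon]amerge=inputs=2,pan=stereo|c0<c0+c2|c1<c1+c3[a]' \
-map '[a]' -c:a pcm_s16le -f nut pipe:1"
}

# Right-hand side: -thread_queue_size 64 on both inputs, aresample=async=1
# applied to the piped audio via -filter_complex, and audio encoded to AAC
# alongside the video encode.
right_cmd() {
	out=$1
	printf '%s\n' "ffmpeg -thread_queue_size 64 -f nut -i pipe:0 \
-thread_queue_size 64 -f x11grab -framerate 25 -video_size 1920x1080 -i :0.0 \
-filter_complex '[0:a]aresample=async=1[a]' \
-map 1:v -map '[a]' -c:v libx264 -tune zerolatency -c:a aac -f flv $out"
}

echo "$(left_cmd) | $(right_cmd rtmp://example.invalid/live/KEY)"
```

Printing the pipeline first makes it easy to compare against what fauxstream currently generates before swapping anything in.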

I'll do some more, longer, live stream tests with my latest patches.

morgant avatar Oct 14 '25 01:10 morgant