obs-studio
obs-studio copied to clipboard
libobs: Enable realtime mode for graphics thread on macOS
Description
Fix to inaccurate timer in the main graphics thread. There are likely different solutions that could apply here like reducing wake up time (sleep_time / 2) so it becomes more "spinny" but often the sleep time is in the 40-50ms off.
Master: 15.0728% within ±2% (43.6836% lower, 41.2436% higher)
info: obs_graphics_thread(16.6667 ms): min=0.205 ms, median=16.623 ms, max=63.61 ms, 15.0728% within ±2% of 16.667 ms (43.6836% lower, 41.2436% higher)
PR: 99.6557% within ±2% (0.176358% lower, 0.16796% higher)
info: obs_graphics_thread(16.6667 ms): min=0.078 ms, median=16.667 ms, max=60.686 ms, 99.6557% within ±2% of 16.667 ms (0.176358% lower, 0.16796% higher)
Master-branch timing:

Timeline of master branch (magnitude of each item is ms taken during each graphics loop iter)

PR-branch timing

Timeline of PR branch

Flamegraph of several iterations of PR branch

Motivation and Context
Stabilize thread timing to be more in line with Linux and Windows timings
How Has This Been Tested?
This has been profiled from a graphics thread performance standpoint but little real-life testing of how a realtime graphics thread interacts with games running on the system.
Types of changes
- Performance enhancement (non-breaking change which improves efficiency)
Checklist:
- [x] My code has been run through clang-format.
- [x] I have read the contributing document.
- [x] My code is not on the master branch.
- [x] The code has been tested.
- [x] All commit messages are properly formatted and commits squashed where appropriate.
- [x] I have included updates to all appropriate documentation.
- Use "macOS" when describing the current Apple operating system. It hasn't been called "OSX" in a while.
- Use an appropriate commit prefix (in this case, "libobs: ").
Fixed
If we're working with thread policies already, would it be possible to add correct QoS values so the graphics thread gets the scheduling priority it needs, especially on Apple Silicon systems?
EDIT: I haven't had time to go through the entire video, but this contains some good practices: https://developer.apple.com/videos/play/tech-talks/110147
Please don't abbreviate realtime to rt. Commit messages should be easily understood. "libobs: Enable realtime for graphics thread on macOS" is short enough.
@PatTheMav info: obs_graphics_thread(33.3333 ms): min=0.205 ms, median=33.334 ms, max=79.73 ms, 77.3852% within ±2% of 33.333 ms (10.689% lower, 11.9258% higher)
Adding QoS pthread_set_qos_class_self_np(QOS_CLASS_USER_INTERACTIVE, 0);
by itself does reduce the jitter somewhat. I'm unclear what the interaction between QoS and thread_policy are
Added QoS prio 47 for rendering (0 offset) and renamed the commit.
30fps info: obs_graphics_thread(33.3333 ms): min=0.186 ms, median=33.333 ms, max=78.494 ms, 99.5311% within ±2% of 33.333 ms (0.259131% lower, 0.209773% higher)
(Run for a longer period, 60fps) info: obs_graphics_thread(16.6667 ms): min=14.818 ms, median=16.667 ms, max=18.5 ms, 99.9939% within ±2% of 16.667 ms (0.00306429% lower, 0.00306429% higher)
These were the values as described by apple documentation but looking at the chromium source I would tend to agree.
Not that it matters much since you've already changed it, but I'm curious where you saw period=0 because the only Apple example I see has this:
ttcpolicy.period=period; // HZ/160
ttcpolicy.computation=computation; // HZ/3300;
ttcpolicy.constraint=constraint; // HZ/2200;
ttcpolicy.preemptible=1;
These were the values as described by apple documentation but looking at the chromium source I would tend to agree.
Not that it matters much since you've already changed it, but I'm curious where you saw period=0 because the only Apple example I see has this:
ttcpolicy.period=period; // HZ/160 ttcpolicy.computation=computation; // HZ/3300; ttcpolicy.constraint=constraint; // HZ/2200; ttcpolicy.preemptible=1;
https://developer.apple.com/library/archive/technotes/tn2169/_index.html
Clip from header
/* period: This is the nominal amount of time between separate
* processing arrivals, specified in absolute time units. A
* value of 0 indicates that there is no inherent periodicity in
* the computation. */
Which further reinforces your suggestion to provide that period since we are operating within consistent slices of time
Thanks, I think I get understand period=0. If the thread is woken to do stuff, it should only stay awake according to the other values before voluntarily giving itself up, and when it gets woken is not on an interval.
@kc5nra I looked into some available WWDC material on this:
- Add
pthread_attr_set_qos_class_np
withQOS_CLASS_USER_INTERACTIVE
to thephread
creation in https://github.com/obsproject/obs-studio/blob/167f53941643f570b5b01e13b34dfa8d7ab3a567/libobs/obs.c#L430 - This should make the timer code run better without any changes to
os_sleepto_ns
or other tweaks - Overall we should not use real time for the graphics thread, as this might also preempt all other work done on the machine, including the game/software being captured (so OBS would severely impact other apps on the system, as those should usually not use real time threading this way either).
I will still need to clarify some things, as the graphics thread is "abused" for other scheduling tasks that are not related to GPU work.
@kc5nra I looked into some available WWDC material on this::
- Add
pthread_attr_set_qos_class_np
withQOS_CLASS_USER_INTERACTIVE
to thephread
creation in https://github.com/obsproject/obs-studio/blob/167f53941643f570b5b01e13b34dfa8d7ab3a567/libobs/obs.c#L430- This should make the timer code run better without any changes to
os_sleepto_ns
or other tweaks- Overall we should not use real time for the graphics thread, as this might also preempt all other work done on the machine, including the game/software being captured (so OBS would severely impact other apps on the system, as those should usually not use real time threading this way either).
I will still need to clarify some things, as the graphics thread is "abused" for other scheduling tasks that are not related to GPU work.
@PatTheMav
I would say this is still not accurate enough, but does offer improvements
And for reference, I've mostly just been looking at sleep overshoot not contention that pushes past the 33ms/16ms boundary
info: obs_graphics_thread(16.6667 ms): min=0.207 ms, median=16.667 ms, max=196.966 ms, 67.9839% within ±2% of 16.667 ms (16.1638% lower, 15.8523% higher)

@kc5nra I discussed this PR internally and if you could update the PR to use the QoS attribute with pthread_create
instead, we could merge this soon-ish. Also feel free to retain the current change in a different branch so we could possibly revisit that solution later.