obs-studio icon indicating copy to clipboard operation
obs-studio copied to clipboard

libobs: Enable realtime mode for graphics thread on macOS

Open kc5nra opened this issue 2 years ago • 8 comments

Description

Fix to inaccurate timer in the main graphics thread. There are likely different solutions that could apply here like reducing wake up time (sleep_time / 2) so it becomes more "spinny" but often the sleep time is in the 40-50ms off.

Master: 15.0728% within ±2% (43.6836% lower, 41.2436% higher)

info: obs_graphics_thread(16.6667 ms): min=0.205 ms, median=16.623 ms, max=63.61 ms, 15.0728% within ±2% of 16.667 ms (43.6836% lower, 41.2436% higher)

PR: 99.6557% within ±2% (0.176358% lower, 0.16796% higher)

info: obs_graphics_thread(16.6667 ms): min=0.078 ms, median=16.667 ms, max=60.686 ms, 99.6557% within ±2% of 16.667 ms (0.176358% lower, 0.16796% higher)

Master-branch timing:

Screen Shot 2022-12-20 at 7 40 52 PM

Timeline of master branch (magnitude of each item is ms taken during each graphics loop iter)

Screen Shot 2022-12-20 at 7 42 03 PM

PR-branch timing

Screen Shot 2022-12-20 at 7 38 34 PM

Timeline of PR branch

Screen Shot 2022-12-20 at 7 43 15 PM

Flamegraph of several iterations of PR branch

Screen Shot 2022-12-20 at 7 43 35 PM

Motivation and Context

Stabilize thread timing to be more in line with Linux and Windows timings

How Has This Been Tested?

This has been profiled from a graphics thread performance standpoint but little real-life testing of how a realtime graphics thread interacts with games running on the system.

Types of changes

  • Performance enhancement (non-breaking change which improves efficiency)

Checklist:

  • [x] My code has been run through clang-format.
  • [x] I have read the contributing document.
  • [x] My code is not on the master branch.
  • [x] The code has been tested.
  • [x] All commit messages are properly formatted and commits squashed where appropriate.
  • [x] I have included updates to all appropriate documentation.

kc5nra avatar Dec 21 '22 02:12 kc5nra

  1. Use "macOS" when describing the current Apple operating system. It hasn't been called "OSX" in a while.
  2. Use an appropriate commit prefix (in this case, "libobs: ").

Fixed

kc5nra avatar Dec 21 '22 14:12 kc5nra

If we're working with thread policies already, would it be possible to add correct QoS values so the graphics thread gets the scheduling priority it needs, especially on Apple Silicon systems?

EDIT: I haven't had time to go through the entire video, but this contains some good practices: https://developer.apple.com/videos/play/tech-talks/110147

PatTheMav avatar Dec 21 '22 15:12 PatTheMav

@kc5nra libobs: Enable rt for graphics thread on macOS

Please don't abbreviate realtime to rt. Commit messages should be easily understood. "libobs: Enable realtime for graphics thread on macOS" is short enough.

RytoEX avatar Dec 21 '22 15:12 RytoEX

@PatTheMav info: obs_graphics_thread(33.3333 ms): min=0.205 ms, median=33.334 ms, max=79.73 ms, 77.3852% within ±2% of 33.333 ms (10.689% lower, 11.9258% higher) Adding QoS pthread_set_qos_class_self_np(QOS_CLASS_USER_INTERACTIVE, 0); by itself does reduce the jitter somewhat. I'm unclear what the interaction between QoS and thread_policy are

kc5nra avatar Dec 21 '22 15:12 kc5nra

Added QoS prio 47 for rendering (0 offset) and renamed the commit.

30fps info: obs_graphics_thread(33.3333 ms): min=0.186 ms, median=33.333 ms, max=78.494 ms, 99.5311% within ±2% of 33.333 ms (0.259131% lower, 0.209773% higher)

(Run for a longer period, 60fps) info: obs_graphics_thread(16.6667 ms): min=14.818 ms, median=16.667 ms, max=18.5 ms, 99.9939% within ±2% of 16.667 ms (0.00306429% lower, 0.00306429% higher)

kc5nra avatar Dec 21 '22 15:12 kc5nra

These were the values as described by apple documentation but looking at the chromium source I would tend to agree.

Not that it matters much since you've already changed it, but I'm curious where you saw period=0 because the only Apple example I see has this:

ttcpolicy.period=period; // HZ/160
ttcpolicy.computation=computation; // HZ/3300;
ttcpolicy.constraint=constraint; // HZ/2200;
ttcpolicy.preemptible=1;

jpark37 avatar Dec 21 '22 17:12 jpark37

These were the values as described by apple documentation but looking at the chromium source I would tend to agree.

Not that it matters much since you've already changed it, but I'm curious where you saw period=0 because the only Apple example I see has this:

ttcpolicy.period=period; // HZ/160
ttcpolicy.computation=computation; // HZ/3300;
ttcpolicy.constraint=constraint; // HZ/2200;
ttcpolicy.preemptible=1;

https://developer.apple.com/library/archive/technotes/tn2169/_index.html

Clip from header

 /* period: This is the nominal amount of time between separate
 * processing arrivals, specified in absolute time units.  A
 * value of 0 indicates that there is no inherent periodicity in
 * the computation. */

Which further reinforces your suggestion to provide that period since we are operating within consistent slices of time

kc5nra avatar Dec 21 '22 17:12 kc5nra

Thanks, I think I get understand period=0. If the thread is woken to do stuff, it should only stay awake according to the other values before voluntarily giving itself up, and when it gets woken is not on an interval.

jpark37 avatar Dec 21 '22 17:12 jpark37

@kc5nra I looked into some available WWDC material on this:

  • Add pthread_attr_set_qos_class_np with QOS_CLASS_USER_INTERACTIVE to the phread creation in https://github.com/obsproject/obs-studio/blob/167f53941643f570b5b01e13b34dfa8d7ab3a567/libobs/obs.c#L430
  • This should make the timer code run better without any changes to os_sleepto_ns or other tweaks
  • Overall we should not use real time for the graphics thread, as this might also preempt all other work done on the machine, including the game/software being captured (so OBS would severely impact other apps on the system, as those should usually not use real time threading this way either).

I will still need to clarify some things, as the graphics thread is "abused" for other scheduling tasks that are not related to GPU work.

PatTheMav avatar Jan 11 '23 16:01 PatTheMav

@kc5nra I looked into some available WWDC material on this::

  • Add pthread_attr_set_qos_class_np with QOS_CLASS_USER_INTERACTIVE to the phread creation in https://github.com/obsproject/obs-studio/blob/167f53941643f570b5b01e13b34dfa8d7ab3a567/libobs/obs.c#L430
  • This should make the timer code run better without any changes to os_sleepto_ns or other tweaks
  • Overall we should not use real time for the graphics thread, as this might also preempt all other work done on the machine, including the game/software being captured (so OBS would severely impact other apps on the system, as those should usually not use real time threading this way either).

I will still need to clarify some things, as the graphics thread is "abused" for other scheduling tasks that are not related to GPU work.

@PatTheMav

I would say this is still not accurate enough, but does offer improvements

And for reference, I've mostly just been looking at sleep overshoot not contention that pushes past the 33ms/16ms boundary

info: obs_graphics_thread(16.6667 ms): min=0.207 ms, median=16.667 ms, max=196.966 ms, 67.9839% within ±2% of 16.667 ms (16.1638% lower, 15.8523% higher)

Screen Shot 2023-01-12 at 8 04 30 AM

kc5nra avatar Jan 12 '23 14:01 kc5nra

@kc5nra I discussed this PR internally and if you could update the PR to use the QoS attribute with pthread_create instead, we could merge this soon-ish. Also feel free to retain the current change in a different branch so we could possibly revisit that solution later.

PatTheMav avatar Feb 18 '23 19:02 PatTheMav