TheForceEngine icon indicating copy to clipboard operation
TheForceEngine copied to clipboard

Avoid busy waiting on the main and the midi threads.

Open MIvanchev opened this issue 1 year ago • 14 comments

Solves #402.

Hi, I would like to suggest the following changes that dramatically improve the CPU usage by avoiding busy waiting in the threads. For the main thread I suggest using SDL_WaitEventTimeout in the frame limiting which pauses the thread until events arrive because I believe it's important to process events immediately. So if your mouse is, say, 1200Hz you will not be limited to 60FPS.

For the MIDI thread I propose a command condition which also awakes the thread immediately. Otherwise the thread is paused until it's time to play another note.

For me this brings down CPU usage from 30% to 10% and the fan stays off the whole time.

MIvanchev avatar Apr 20 '24 09:04 MIvanchev

I have seen this and plan on going through it after the next release is out (any day now). Thanks for the suggestions and PR. :)

luciusDXL avatar Apr 25 '24 18:04 luciusDXL

Ooh, this is pretty great. With this applied, CPU usage went from 110% down to 6% with vsync enabled. @MIvanchev: it needs a rebase, but otherwise looks fine to me!

mlauss2 avatar May 17 '24 09:05 mlauss2

@mlauss2 Let me rebase, running your CPU at 110% long term is not a good idea!!!

MIvanchev avatar May 17 '24 19:05 MIvanchev

@mlauss2 I've rebased.

MIvanchev avatar May 18 '24 15:05 MIvanchev

I tried the frame limiter changes now as well. When not moving mouse/using keyboard, the framerate is about 5% off, when moving the character it goes up to 120, and the whole thing is really choppy when target framerate is set to 30fps. I suspect that using SDL_WaitEventTimeout() is an unsuitable wait function since it advances the game prematurely whenever input activity is detected, which the frame limiter in its current design wants to actually prevent.

I'd say drop the frame limiter changes for now; the midi thread parts are great and should go in asap as they make a world of difference wrt. cpu usage.

mlauss2 avatar May 31 '24 07:05 mlauss2

Thanks for all your testing!

When not moving mouse/using keyboard, the framerate is about 5% off, when moving the character it goes up to 120, and the whole thing is really choppy when target framerate is set to 30fps.

Yeah that limiter was not that good, let's kick it out.

MIvanchev avatar May 31 '24 07:05 MIvanchev

@MIvanchev please redo the patch with the frame limiter bits removed. Thanks!

mlauss2 avatar Jun 06 '24 08:06 mlauss2

I haven't forgotten @mlauss2, just need some time to do it :) Let me get to it ASAP.

MIvanchev avatar Jun 06 '24 10:06 MIvanchev

I've changed the code a bit, maybe you friends can check!

MIvanchev avatar Jul 13 '24 12:07 MIvanchev

gcc says:

midiPlayer.cpp:530:51: error: no matching function for call to 'max(<brace-enclosed initializer list>)'
  530 |                                 timeout = std::max({0, timeout});
/usr/lib/gcc/x86_64-pc-linux-gnu/15/include/g++-v15/bits/stl_algobase.h:257:5: note: candidate: 'template<class _Tp> constexpr const _Tp& std::max(const _Tp&, const _Tp&)'
  257 |     max(const _Tp& __a, const _Tp& __b)
      |     ^~~
/usr/lib/gcc/x86_64-pc-linux-gnu/15/include/g++-v15/bits/stl_algobase.h:257:5: note:   candidate expects 2 arguments, 1 provided
/usr/lib/gcc/x86_64-pc-linux-gnu/15/include/g++-v15/bits/stl_algobase.h:303:5: note: candidate: 'template<class _Tp, class _Compare> constexpr const _Tp& std::max(const _Tp&, const _Tp&, _Compare)'
  303 |     max(const _Tp& __a, const _Tp& __b, _Compare __comp)
      |     ^~~
/usr/lib/gcc/x86_64-pc-linux-gnu/15/include/g++-v15/bits/stl_algobase.h:303:5: note:   candidate expects 3 arguments, 1 provided
In file included from /usr/lib/gcc/x86_64-pc-linux-gnu/15/include/g++-v15/algorithm:61,
                 from midiPlayer.cpp:15:
/usr/lib/gcc/x86_64-pc-linux-gnu/15/include/g++-v15/bits/stl_algo.h:5732:5: note: candidate: 'template<class _Tp> constexpr _Tp std::max(initializer_list<_Tp>)'
 5732 |     max(initializer_list<_Tp> __l)
      |     ^~~
/usr/lib/gcc/x86_64-pc-linux-gnu/15/include/g++-v15/bits/stl_algo.h:5732:5: note:   template argument deduction/substitution failed:
midiPlayer.cpp:530:51: note:   deduced conflicting types for parameter '_Tp' ('int' and 'double')
  530 |                                 timeout = std::max({0, timeout});
      |                                           ~~~~~~~~^~~~~~~~~~~~~~
/usr/lib/gcc/x86_64-pc-linux-gnu/15/include/g++-v15/bits/stl_algo.h:5742:5: note: candidate: 'template<class _Tp, class _Compare> constexpr _Tp std::max(initializer_list<_Tp>, _Compare)'
 5742 |     max(initializer_list<_Tp> __l, _Compare __comp)
      |     ^~~
/usr/lib/gcc/x86_64-pc-linux-gnu/15/include/g++-v15/bits/stl_algo.h:5742:5: note:   candidate expects 2 arguments, 1 provided

same for next line with std::min(). I think gcc expects a comparison function as 2nd parameter.

Why so complicated though, this here works well too, both clang and gcc are happy with it (no overflow/truncation warnings):

                                u64 sleeptime = (s_midiCallback.timeStep - s_midiCallback.accumulator) * 1000;
                                u32 timeout = sleeptime >= SDL_MUTEX_MAXWAIT ?: (u32)sleeptime;
                                SDL_CondWaitTimeout(s_midiCmdCond, s_mutex, timeout);

mlauss2 avatar Jul 13 '24 21:07 mlauss2

I'm not sure if infinities and negative values are undefined behavior in that case, otherwise a good solution unless someone changes the types of timeStep/accumulator. In general, C++ has facilities to make the code a little bit safer but they don't seem to be well supported 🤭

MIvanchev avatar Jul 14 '24 07:07 MIvanchev

OK the deduction should work now, it's hard to test for me RN cuz I'm on a very limited PC... it seems to work with GCC.

MIvanchev avatar Jul 14 '24 08:07 MIvanchev

OK the deduction should work now, it's hard to test for me RN cuz I'm on a very limited PC... it seems to work with GCC.

Yes both clang and gcc are now happy.

mlauss2 avatar Jul 15 '24 06:07 mlauss2

@luciusDXL please consider to include this PR into the next release. It makes a world of difference wrt. CPU usage.

mlauss2 avatar Jul 15 '24 21:07 mlauss2