High CPU usage on Windows

Open blagoev opened this issue 2 years ago • 3 comments

We have been seeing high CPU usage on Windows for some time now. The problem was first noticed on our CI, which runs on GH Actions with two-core VMs. We managed to reproduce it locally by setting the process affinity to only two cores (simulating the GH Actions environment), and this is the profile information we captured.
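The issue does not say how the affinity was set. As a minimal sketch, assuming it is done from inside the process, the Windows call `SetProcessAffinityMask` can restrict a process to its first two logical processors. The helper names `first_n_cores_mask` and `restrict_to_first_cores` are hypothetical and only serve the illustration.

```cpp
#include <cassert>
#include <cstdint>
#ifdef _WIN32
#include <windows.h>
#endif

// Bit mask selecting logical processors 0 .. n-1 (n must be in 1..64).
std::uint64_t first_n_cores_mask(unsigned n)
{
    assert(n >= 1 && n <= 64);
    return n == 64 ? ~std::uint64_t{0} : (std::uint64_t{1} << n) - 1;
}

// Restricts the current process to its first n logical processors.
// Returns false if the call fails or the platform is not Windows.
bool restrict_to_first_cores(unsigned n)
{
#ifdef _WIN32
    const auto mask = static_cast<DWORD_PTR>(first_n_cores_mask(n));
    return SetProcessAffinityMask(GetCurrentProcess(), mask) != 0;
#else
    (void)n; // hypothetical helper: does nothing outside Windows
    return false;
#endif
}
```

Calling `restrict_to_first_cores(2)` early in `main` would approximate a two-core VM; the mask for two cores is `0x3`.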

This seems to be the offending function:

Function Name	Total CPU [unit, %]	Self CPU [unit, %]	Module
| - realm::util::network::Service::IoReactor::wait_and_advance	35143 (97.53%)	7557 (20.97%)	realm_dart

The code with the highest CPU usage is:

            } while (ret == 0 &&
                     (duration_cast<milliseconds>(steady_clock::now() - started).count() < max_wait_millis));

This takes 89% of the execution time.
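To make the cost of that loop shape concrete, here is a rough sketch that counts how often the loop body runs. It assumes each poll returns immediately with 0 when nothing is ready; `count_polls`, `poll_once` and `pause` are hypothetical names, and the optional pause is just one common way to tame such a loop, not necessarily what realm-core does.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <functional>
#include <thread>

using std::chrono::duration_cast;
using std::chrono::milliseconds;
using std::chrono::steady_clock;

// Runs the same loop shape as the snippet above and returns how many times
// poll_once was called. `pause` is slept after each empty poll; a zero
// pause reproduces the spin.
std::uint64_t count_polls(const std::function<int()>& poll_once,
                          long long max_wait_millis, milliseconds pause)
{
    assert(max_wait_millis >= 0);
    const auto started = steady_clock::now();
    std::uint64_t polls = 0;
    int ret = 0;
    do {
        ret = poll_once();
        ++polls;
        if (ret == 0 && pause.count() > 0)
            std::this_thread::sleep_for(pause);
    } while (ret == 0 &&
             (duration_cast<milliseconds>(steady_clock::now() - started).count() < max_wait_millis));
    return polls;
}
```

With an idle `poll_once` and a 20 ms wait, a zero pause lets the loop run as fast as the core allows, while a 1 ms pause caps it at roughly 20 iterations.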

Here is a screenshot of the profiling session: [image]

Note that because of a debug assertion on Windows, the SDK is actually paused and not running, but these background threads keep burning CPU. Here is another screenshot, taken with Process Explorer: [image]

The code is preceded by this comment, which shows that wait_and_advance has a special code path for Windows:

            // Windows does not have a single API call to wait for pipes and
            // sockets with a timeout. So we repeatedly poll them individually
            // in a loop until max_wait_millis has elapsed or an event happend.
            //
            // FIXME: Maybe switch to Windows IOCP instead.

            // Following variable is the poll time for the sockets in
            // miliseconds. Adjust it to find a balance between CPU usage and
            // response time:
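The structure these comments describe could look roughly like the sketch below. It is not the realm-core code: `Source`, `wait_any` and `never_ready` are hypothetical names, and each source is assumed to block for at most the given poll time before reporting whether it is ready.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <optional>
#include <thread>
#include <vector>

using std::chrono::milliseconds;
using std::chrono::steady_clock;

// Hypothetical source: waits up to `timeout` and reports whether it is ready.
using Source = std::function<bool(milliseconds timeout)>;

// Simulated source that blocks for the whole timeout and is never ready.
Source never_ready()
{
    return [](milliseconds timeout) {
        std::this_thread::sleep_for(timeout);
        return false;
    };
}

// Polls each source in turn with `poll_time` until one is ready or
// `max_wait` has elapsed. Returns the index of the ready source, if any.
std::optional<std::size_t> wait_any(const std::vector<Source>& sources,
                                    milliseconds poll_time, milliseconds max_wait)
{
    // A zero poll time (or no sources) would turn this into a busy loop.
    assert(!sources.empty() && poll_time.count() > 0);
    const auto deadline = steady_clock::now() + max_wait;
    do {
        for (std::size_t i = 0; i < sources.size(); ++i) {
            if (sources[i](poll_time))
                return i;
        }
    } while (steady_clock::now() < deadline);
    return std::nullopt;
}
```

The trade-off the comment mentions shows up directly: a larger `poll_time` means fewer wake-ups, but an event on one source can go unnoticed for up to one poll time per other source.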

We think this is high priority, since it drives the CPU to 100% and makes any Realm Sync operation run really slowly.

EDIT: This is witnessed on Windows

blagoev avatar Jun 15 '22 14:06 blagoev

Note that if this turns out to be difficult to fix, we can also just prioritize platform networking for data and get rid of our custom networking implementation altogether.

nirinchev avatar Jun 15 '22 16:06 nirinchev

➤ Jonathan Reams commented:

I think fixing this issue is an unknown amount of work right now. Just "switching to Windows IOCP" would likely solve this issue, but would also be a substantial rewrite of how we do networking on Windows. We could also pull in ASIO, which our networking library is somewhat based on and which actually has good IOCP support, and use that as an alternate networking implementation on Windows. I have no good estimate of what that would do to binary size or performance, but it won't peg any CPUs, at least not for the same reasons. We could also just bump up the priority of platform networking projects on platforms that support Windows.
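The reason a completion-based design would not peg a CPU is that the waiting thread sleeps in the kernel until something is posted. IOCP itself is Windows-only (`GetQueuedCompletionStatus` and `PostQueuedCompletionStatus` are the relevant calls), so the sketch below is a portable stand-in built on a condition variable, not IOCP and not ASIO. `CompletionQueue` is a hypothetical name.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <deque>
#include <mutex>
#include <optional>

// Portable stand-in for a completion port: producers post keys, and a
// consumer sleeps until a key arrives or the timeout expires.
class CompletionQueue {
public:
    void post(std::uint64_t key)
    {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_keys.push_back(key);
        }
        m_cv.notify_one();
    }

    // Blocks without spinning until a completion is available.
    std::optional<std::uint64_t> wait(std::chrono::milliseconds timeout)
    {
        assert(timeout.count() >= 0);
        std::unique_lock<std::mutex> lock(m_mutex);
        if (!m_cv.wait_for(lock, timeout, [this] { return !m_keys.empty(); }))
            return std::nullopt; // timed out with nothing posted
        const std::uint64_t key = m_keys.front();
        m_keys.pop_front();
        return key;
    }

private:
    std::mutex m_mutex;
    std::condition_variable m_cv;
    std::deque<std::uint64_t> m_keys;
};
```

A thread blocked in `wait` uses no CPU until `post` is called or the timeout expires, which is the property the polling loop lacks.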

Regardless of what we do, I think this is likely a forever bug that's not going to be solvable within the next month or two without re-shuffling some other priorities.

sync-by-unito[bot] avatar Jun 15 '22 21:06 sync-by-unito[bot]

➤ James Stone commented:

We have merged a mitigation in https://github.com/realm/realm-core/pull/5594. I am reducing the priority of this, but we can keep it open until we have a better long term solution.

sync-by-unito[bot] avatar Jun 21 '22 00:06 sync-by-unito[bot]

➤ marysiapietraszewska commented:

Will be fixed by introducing platform networking

sync-by-unito[bot] avatar Apr 18 '23 13:04 sync-by-unito[bot]