Avalonia icon indicating copy to clipboard operation
Avalonia copied to clipboard

Task.Delay takes too long on macOS

Open Stannieman opened this issue 1 year ago • 18 comments

Describe the bug In an Avalonia macOS desktop app Task.Delay often takes too long to return. In a simple non-Avalonia console app the issue does not occur.

To Reproduce Steps to reproduce the behavior:

  1. Create a minimal Avalonia app from the VS project template.
  2. Somewhere (e.g. MainWindow) put Task.Delay in an infinite loop.
  3. Often Task.Delay takes way longer than configured.

Expected behavior Task.Delay takes about as long as configured.

Desktop (please complete the following information):

  • OS: Mac
  • Version: 12.5

Additional context Repro: https://github.com/Stannieman/AvaloniaDelayTest This loops a Task.Delay of 250ms. When delay takes longer than 500ms it prints the actual duration to the terminal.

After only a couple of minutes the output looks like this:

Too long: 00:00:10.1278680
Too long: 00:00:01.4919020
Too long: 00:00:10.1362750
Too long: 00:00:10.0013450
Too long: 00:00:10.0012170
Too long: 00:00:10.0014200
Too long: 00:00:10.0012190
Too long: 00:00:10.0014000
Too long: 00:00:10.0014770
Too long: 00:00:10.0012340
Too long: 00:00:10.0298920
Too long: 00:00:09.9944410
Too long: 00:00:10.0013260
Too long: 00:00:10.0011620
Too long: 00:00:10.0011870
Too long: 00:00:10.0100230
Too long: 00:00:10.0057740
Too long: 00:00:10.0125380
Too long: 00:00:10.0072980
Too long: 00:00:10.0296070
Too long: 00:00:09.9992390
Too long: 00:00:10.0023670
Too long: 00:00:10.0020110
Too long: 00:00:10.0019850
Too long: 00:00:10.0063740
Too long: 00:00:09.9987900
Too long: 00:00:10.0016140
Too long: 00:00:01.4810180
Too long: 00:00:10.0016090
Too long: 00:00:20.0312540
Too long: 00:00:19.9925110
Too long: 00:00:20.0018100
Too long: 00:00:20.0017170
Too long: 00:00:20.0116750
Too long: 00:00:19.9987460
Too long: 00:00:20.0019690
Too long: 00:00:20.0016450
Too long: 00:00:20.0018110
Too long: 00:00:03.6547710
Too long: 00:00:02.4756200
Too long: 00:00:02.5731120
Too long: 00:00:03.6644760

Since Task.Delay behind the scenes uses the same timer mechanism as most if not all other .NET timer APIs this is quite concerning.

Is Avalonia messing with doing something thread pool/scheduling stuff somewhere? The machine is idle when this occurs. Memory is not full, no CPU spikes. It just "happens".

Stannieman avatar Sep 02 '22 21:09 Stannieman

Are you interacting with the application window? If not, it might be macOS power saving feature that puts the app to sleep (with occasional wakeups) if it has no user interaction.

kekekeks avatar Sep 03 '22 05:09 kekekeks

No, not interacting with it. I thought about this as well but it seems so strange. If macOS does this to any app, how can software in general even work on this OS?

Stannieman avatar Sep 03 '22 06:09 Stannieman

I have also no idea how this app suspending works. https://developer.apple.com/documentation/appkit/nsapplicationdelegate

I'd expect some methods related to suspending and resuming here like there are on iOS, but there aren't any. If the OS suspends an app surely the app should have a way of knowing this is happening, right?

Stannieman avatar Sep 03 '22 08:09 Stannieman

I can confirm this issue. My app MQTTnet app shows the same behavior. The connection gets dropped due to keep alive timing issues.

I had this issue 1 time on a windows machine. Maybe by accident but it may be related.

@Stannieman Shows the console app the same issue on macOS when being idle? If power saving is the root cause I would expect to see the same behavior when testing in a console application.

chkr1011 avatar Sep 03 '22 08:09 chkr1011

@chkr1011 no problem with console app so far. But that does not necessarily mean much. Console apps are different because there are no windows and thus no such thing as idle. Typically you never want them to suspend because you gave them a job and they should keep running full blast until that job is finished.

So 1 big question here is: Does it happen only with Avalonia, with any .NET UI app or maybe even any UI app regardless of it being .NET, Java or something else?

Stannieman avatar Sep 03 '22 09:09 Stannieman

That indeed makes sense. I wonder if a Xamarin Forms app behaves the same on macOS. If it does it might not be related to Avalonia.

chkr1011 avatar Sep 03 '22 09:09 chkr1011

I can't reproduce this on my MacBook Air M1 2020 running OSX 12.5.

I've tried both Avalonia 0.10.18 and 11 preview 1.

The only time that the "too long" message appears is a) once on startup and b) once each time I minimise or restore the main window. In these cases the task takes about 700ms, which is a long time but far short of the ten or twenty seconds seen in the bug report.

The repository linked to in the bug report doesn't exist, so I wrote my own test code:

public MainWindow()
{
    InitializeComponent();

    Console.WriteLine("Hello world");

    _ = StartTimer();
}

private async Task StartTimer()
{
    while (true)
    {
        var stopwatch = Stopwatch.StartNew();
        await Task.Delay(250);
        var elapsed = stopwatch.ElapsedMilliseconds;

        if (elapsed > 500)
        {
            Console.WriteLine($"Too long: {elapsed}");
        }
    }
}

TomEdwardsEnscape avatar Sep 03 '22 10:09 TomEdwardsEnscape

I'm sorry, repo was still private, fixed. But the code is basically the same as you have. BUT also just managed to reproduce with a Xamarin app, so it is also not related to Avalonia. Who do we have to blame now 😢 Next step: Crank out a C++ app and see what happens.

Stannieman avatar Sep 03 '22 10:09 Stannieman

Probably we have to blame the .net team which implements the task scheduler. 🤣

I will try the test repo ony iMac tomorrow. Maybe it is related to the OS version or the CPU.

chkr1011 avatar Sep 03 '22 10:09 chkr1011

I think the key to your issue is how async methods work in .Net.

For each await statement, after the task completes the runtime will try to resume your method on the same thread which entered the await.

In a UI application the main thread is (normally) set up with a SynchronizationContext which will resume methods at a fixed point in the UI loop, which means that your duration calculation will not execute until both the Task.Delay completes and the UI thread is ready to resume the method. The UI loop can be blocked by many things, including the host OS doing stuff to the native window.

In a console application there is by default no synchronisation context, so async methods resume on whichever thread completed each awaited task. This is probably why there is no problem in such an application.

You can skip the synchronisation feature like this:

await Task.Delay(250).ConfigureAwait(false);

If this fixes your problem, then you know that it's due to the UI frame loop.

TomEdwardsEnscape avatar Sep 03 '22 11:09 TomEdwardsEnscape

Inside the MQTTnet library where we originally discovered this it has ConfigureAwait(false). I will retest but I don't think it's the cause.

Stannieman avatar Sep 03 '22 11:09 Stannieman

@tomenscape

I can't reproduce this on my MacBook Air M1 2020 running OSX 12.5.

It's harder to reproduce on M1 macs because apps seem to have more budget there. So far it was only reported for Intel macs.

kekekeks avatar Sep 03 '22 11:09 kekekeks

I got something like a dispatcher running in a console app using the AsyncEx Nuget package, but no repro that way.

using Nito.AsyncEx;

internal class Program
{
    private static void Main(string[] args)
    {
        Console.WriteLine("Hello, World!");
        RunWithDispatcher();
    }

    private static void RunWithDispatcher()
    {
        AsyncContext.Run(async () =>
        {
            while (true)
            {
                var startTime = DateTime.Now;

                await Task.Delay(250);

                var duration = DateTime.Now - startTime;
                if (duration > TimeSpan.FromMilliseconds(500))
                {
                    Console.WriteLine($"Too long: {duration}");
                }
            }
        });
    }
}

Inside AsyncContext.Run it supposedly even has a SynchronizationContext that await will use to run the continuation. But it's not surprising that there is no repro here because this dispatcher has literally no other work to do.

Next I created a C++ app, but also no repro here. If you are interested: It shows a window with a single button using FLTK. When the button is clicked it starts a 250ms timer which, in the expiration handler, makes the same timer again wait for 250ms and so on. This is implemented using boost::asio::deadline_timer. It works similar to ConfigureAwait(false) in that when the timer expires the "reset of the timer" does not happen on UI thread but the timer thread. Catch here is that there is no thread pool involved, the timer has its own dedicated thread. But this is of course in a completely different ballpark than a managed UI app. The timer is also rock solid and is rarely more than 1ms off.

#include <FL/FL_Window.h>
#include <FL/FL_Button.h>
#include <FL/FL_Widget.h>
#include <FL/FL.h>
#include <boost/asio.hpp>
#include <boost/date_time/posix_time/posix_time_duration.hpp>
#include <chrono>
#include <iostream>

using namespace std;
using namespace std::chrono;
using namespace boost;
using namespace boost::asio;

namespace
{
	io_service ioService_{};
	deadline_timer timer_{ ioService_ };
}

void DoDelay()
{
	const auto start = steady_clock::now();

	timer_.expires_from_now(posix_time::milliseconds{ 250 });
	timer_.async_wait(
		[start](const auto &e)
		{
			if (!e.failed())
			{
				const auto duration = duration_cast<milliseconds>(steady_clock::now() - start);
				if (duration.count() > 500)
				{
					cout << "Too long: " << duration.count() << endl;
				}

				DoDelay();
			}
		});
}

void ButtonClicked(Fl_Widget *widget, void*)
{
	cout << "Started" << endl;

	DoDelay();

	thread
	{
		[&]() { ioService_.run(); }
	}.detach();
}

int main(int argc, char **argv)
{
	Fl_Window window{ 300, 180 };
	window.begin();

	Fl_Button button{ 20, 40, 260, 100, "Start" };
	button.callback(ButtonClicked);
	
	window.end();
	window.show(argc, argv);

	return Fl::run();
}

So it seems to be specific to .NET UI apps or .NET apps with an event loop that has actual UI work to do. I will try to repro it with .NET MAUI tomorrow. If we see the same issue with MAUI then it has a good chanse of getting some attention from Microsoft. They probably want their new shiny thing to work correctly afterall.

Stannieman avatar Sep 03 '22 18:09 Stannieman

Just had this in C++, but the CPU was under load so that is acceptable.

Too long: 636
Too long: 1165
Too long: 869

EDIT: Bingo, have a repro with MAUI. I'll create an issue for .NET.

The saga continues: https://github.com/dotnet/runtime/issues/75065

Stannieman avatar Sep 03 '22 20:09 Stannieman

I wonder if disabling app-nap would help with this issue. https://lapcatsoftware.com/articles/prevent-app-nap.html https://developer.apple.com/library/archive/documentation/Performance/Conceptual/power_efficiency_guidelines_osx/PrioritizeWorkAtTheAppLevel.html

Note it would require macos api interop or monomac/net6-macos usage.

maxkatz6 avatar Sep 03 '22 23:09 maxkatz6

@maxkatz6 seems you are right. After running defaults write NSGlobalDomain NSAppSleepDisabled -bool YES to disable App Nap globally the issue is completely gone. Thanks! So yep, we need to look at some PInvoke magic.

Stannieman avatar Sep 04 '22 11:09 Stannieman

I will try to add an option to disable this in MacOSPlatformOptions.

Stannieman avatar Sep 04 '22 12:09 Stannieman

Created a PR with new option, ~~but the app crashes when loading. Any ideas? My C++ is decent but I do not see what is going wrong here.~~ Today it magically works 🤷 I got 1 open question https://github.com/AvaloniaUI/Avalonia/pull/8894#discussion_r962748576 when this is resolved it is ready for review.

Stannieman avatar Sep 05 '22 09:09 Stannieman