bUnit Investigate failing tests related to "wait for" logic on Windows

Recently the verification workflow on Windows has started to fail. It is not crashing as far as I can tell, it's just unit tests related to "wait for" logic (concurrency) that are failing due to the condition not being met.

It does not look like its always the same test, and I have been able to reproduce it by running tests in a loop locally (through the run-tests.ps1 script, in release mode).

Lets use this issue to log our investigation/progress, i.e. what has been tried and findings.

First thing that needs to happen is probably to go to the last released version and see if the problem exists at that point as well, with the latest .NET 6//NET 7 runtimes. If not, then perhaps use git bisect in interactive mode and try to see at what point the failure starts showing up.

Sep 15 '22 16:09 egil

I would like to have this sorted before we push a new release. Of course, if the problem also exists in the currently released version, then there is no reason not to release the two new features.

Sep 15 '22 16:09 egil

I would like to have this sorted before we push a new release. Of course, if the problem also exists in the currently released version, then there is no reason not to release the two new features.

If I remember right, then we had this issue before the release. I saw it here and there but it seems to be more prominent at the moment. i will have a look once I am back from vacation.

what makes it hard to debug is that it seems to impact the windows runner the most. Here and there also macOS runner. Anyway we need to address this :)

Sep 15 '22 16:09 linkdotnet

I would like to have this sorted before we push a new release. Of course, if the problem also exists in the currently released version, then there is no reason not to release the two new features.

If I remember right, then we had this issue before the release. I saw it here and there but it seems to be more prominent at the moment. i will have a look once I am back from vacation.

I'm pretty sure there was an issue before the previous release, or the one before that, what was solved after some of the code related to wait for and updates of the DOM was changed.

It has definitely started breaking much more the last 14 days, and I think if we look back on main runs on verification we can see it going well for a while.

what makes it hard to debug is that it seems to impact the windows runner the most. Here and there also macOS runner. Anyway we need to address this :)

Enjoy your vacation, I'll take stab at the initial investigation and see if we can find a commit that started breaking things.

Sep 15 '22 16:09 egil

Arrrg, now I cannot reproduce locally with .net 7 rc.1 and on bf8fc00cedcb7c6eb9af2724cdb2fa4ff3ac2eb5.

I have attempted with this set of tests, running the a loop to try to force the issue:

// This code/file was originally copied from https://github.com/dotnet/aspnetcore/
// It's content has been modified from the original.
// See the NOTICE.md at the root of this repository for a copy
// of the license from the aspnetcore repository.
using Bunit.TestAssets.BlazorE2E;
using Xunit.Abstractions;

namespace Bunit.BlazorE2E;

/// <summary>
/// This tests are based on the tests from the following AspNetCore tests class.
/// The aim is to only modify the original tests to not use Selenium, and instead use the
/// <see cref="TestContext" />.
/// https://github.com/dotnet/aspnetcore/blob/master/src/Components/test/E2ETest/Tests/ComponentRenderingTest.cs.
/// </summary>
public class WaitForTest : TestContext
{
    public WaitForTest(ITestOutputHelper outputHelper)
    {
        Services.AddXunitLogger(outputHelper);
        JSInterop.Mode = JSRuntimeMode.Loose;
    }

    public static TheoryData<int> RepeatRuns
    {
        get
        {
            var result = new TheoryData<int>();
            foreach (int i in Enumerable.Range(1, 100))
                result.Add(i);
            return result;
        }
    }

    [Theory, MemberData(nameof(RepeatRuns))]
    public void CanTriggerAsyncEventHandlers(int run)
    {
        // Initial state is stopped
        var cut = RenderComponent<AsyncEventHandlerComponent>();
        var stateElement = cut.Find("#state");
        Assert.Equal("Stopped", stateElement.TextContent);

        // Clicking 'tick' changes the state, and starts a task
        cut.Find("#tick").Click();
        Assert.Equal("Started", stateElement.TextContent);

        cut.Find("#tock").Click();
        cut.WaitForAssertion(() => Assert.Equal("Stopped", stateElement.TextContent));
    }

    [Theory, MemberData(nameof(RepeatRuns))]
    public void CanAcceptSimultaneousRenderRequests(int run)
    {
        var expectedOutput = string.Join(string.Empty, Enumerable.Range(0, 100).Select(_ => "😊"));

        var cut = RenderComponent<ConcurrentRenderParent>();

        // It's supposed to pause the rendering for this long. The WaitAssert below
        // allows it to take up extra time if needed.
        // await Task.Delay(1000);

        var outputElement = cut.Find("#concurrent-render-output");

        cut.WaitForAssertion(
            () => Assert.Equal(expectedOutput, outputElement.TextContent.Trim()),
            timeout: TimeSpan.FromMilliseconds(2000)
        );
    }

    [Theory, MemberData(nameof(RepeatRuns))]
    public void CanDispatchRenderToSyncContext(int run)
    {
        var cut = RenderComponent<DispatchingComponent>();
        var result = cut.Find("#result");

        cut.Find("#run-with-dispatch").Click();

        cut.WaitForAssertion(
            () => Assert.Equal("Success (completed synchronously)", result.TextContent.Trim())
        );
    }

    [Theory, MemberData(nameof(RepeatRuns))]
    public void CanDoubleDispatchRenderToSyncContext(int run)
    {
        var cut = RenderComponent<DispatchingComponent>();
        var result = cut.Find("#result");

        cut.Find("#run-with-double-dispatch").Click();

        cut.WaitForAssertion(
            () => Assert.Equal("Success (completed synchronously)", result.TextContent.Trim())
        );
    }

    [Theory, MemberData(nameof(RepeatRuns))]
    public void CanHandleRemovedParentObjects(int run)
    {
        var cut = RenderComponent<DispatcherException>();

        cut.Find("button").Click();

        cut.WaitForState(() => !cut.FindAll("div").Any());
        cut.FindAll("div").Count.ShouldBe(0);
    }

    [Theory, MemberData(nameof(RepeatRuns))]
    public async Task CanHandleRemovedParentObjectsAsync(int run)
    {
        var cut = RenderComponent<DispatcherException>();

        await cut.Find("button").ClickAsync(new MouseEventArgs());

        cut.WaitForState(() => !cut.FindAll("div").Any());
        cut.FindAll("div").Count.ShouldBe(0);
    }
}

Then, running it with this command:

dotnet test -c release --blame --nologo --filter "Bunit.BlazorE2E.WaitForTest" --logger:"console;verbosity=normal"

This is annoying to say the least.

Sep 16 '22 15:09 egil

For sure I had failing tests on my NavigationLock PR on the windows runner. But yes locally I also get green tests. To rule out that .net 7 rc1 fixed something I will remove it locally and run against preview7 and an older version of bUnit.

EDIT: I guess I will run this on my windows machine later. On the macbook everything runs as expected (even with preview 7).

Sep 17 '22 07:09 linkdotnet

For sure I had failing tests on my NavigationLock PR on the windows runner.

Never paid much attention to the failing tests on that PR since I assumed it failed because some types was missing due to rc1 not being out.

But yes locally I also get green tests. To rule out that .net 7 rc1 fixed something I will remove it locally and run against preview7 and an older version of bUnit.

EDIT: I guess I will run this on my windows machine later. On the macbook everything runs as expected (even with preview 7).

Are you using the repeating tests i posted above?

Sep 17 '22 09:09 egil

Are you using the repeating tests i posted above?

Yes did use your attached code snippet and let it run against .net6 and .net7.

Sep 17 '22 11:09 linkdotnet

Tried it now with my windows machine with a few different commits.. always green (let the test run twice per commit, to make "sure").

Sep 18 '22 14:09 linkdotnet

I am still in favour of the theory that we are running out of threads in the thread pool. I forced our unit test runner to run every test sequentially (default of xUnit is only sequential inside on test-class but parallel over test-class boundaries) and we have consistent green runs on GitHub. I am aware that it doesn't rule out that we might have a) undisposed stuff, which interferes or b) a race condition.

Anyway I do think making our API more async would help us a lot with some of those problems (especially the starvation issue).

@egil I am not sure whether or not to open the PR so that our test suite runs sequentially. On one hand it does not increase the time massively and will give us green builds, but on the other hand that is something, which can happen to our users.

Oct 17 '22 09:10 linkdotnet

Great work!

A few points:

I think we should be running our tests as our users will be running theirs.
Its great if we have a theory or idea of what is causing issues.
Let's continue to investigate what is causing the thread starvation.

While you are going down that rabbit hole, I think it's worthwhile to look into shared synchronization context (Dispatcher in renderer) and see how that affects concurrency. That would also solve problems like the MarkupMatches issues.

Oct 17 '22 10:10 egil

Great work!

A few points:

I think we should be running our tests as our users will be running theirs.

Its great if we have a theory or idea of what is causing issues.

Let's continue to investigate what is causing the thread starvation.

While you are going down that rabbit hole, I think it's worthwhile to look into shared synchronization context (Dispatcher in renderer) and see how that affects concurrency. That would also solve problems like the MarkupMatches issues.

I will have a look into the Synchronization-Context. I am not sure if this would make our tests more stable in that regard. If we really suffer from thread starvation, I don't think this will do the trick. But it will tackle other problems... and then again, it's just my working hypothesis.

Oct 17 '22 11:10 linkdotnet

All the tests we see this issue with is ones that use WaitForXXX, right?

If that's the case, then it may be that which is causing the starvation on some machines. What you could do as an experiment is to create async versions of the methods in RenderedFragmentWaitForHelperExtensions, because the WaitForHelper type is actually properly async, and then update all the tests that uses the blocking sync versions to the new async versions.

We can keep both versions around, but if it turns out it solves the problem, then we can include the async versions without causing a breaking change. Users can change to the async versions if they have problems.

The hardest part is going to figure out the naming 😁

await cut.WaitForStateAsync(...);
await cut.WaitForAssertionAsync(...);
var elm = await cut.WaitForElementAsync(...);
var elms = await cut.WaitForElementsAsync(...);

This is a bit verbose but aligns with existing API.

await cut.StateAsync(...);
await cut.AssertionAsync(...);
var elm = await cut.ElementAsync(...);
var elms = await cut.ElementsAsync(...);

This reads a little better, but the names might be less obviously?

Oct 17 '22 11:10 egil

I am still in favour of WaitFor.... For two reasons:

Async API's normally just get the suffix Async (if a sync version is available).
.NET also does not omit Wait for example in Task.WaitAll

Oct 17 '22 11:10 linkdotnet

To be honest I'd experiment with the async version of WaitForXXX rather than the SynchroniztationContext for two reasons:

I have to read more about SynchronizationContext to understand better what is going on.
I still thing if a user does that: await SomeAsyncCall().ConfigureAwait(false); we will run into the same situation as before as the SynchronizationContext is gone. We could argue you shouldn't do this anyway in ASP.NET Core, but that doesn't count for third party libraries. So it would make our code better, but it could happen that users still have issues, we can't reproduce... but then: Maybe I didn't understand the mechanism completely.

Oct 17 '22 14:10 linkdotnet

Yeah, it's completely fine you focus on the async versions of wait for xxx.

Oct 17 '22 14:10 egil

I shamelessly stole the SingleThreadSynchronizationContext from Visual Studio and suprise suprise, still we have red tests in WaitForXXX and that makes sense.

If we are not using await WaitFor then there is no SynchronizationContext involved at all:

I planned my day tomorrow to make the WaitForXXX POC work.

Oct 17 '22 15:10 linkdotnet

That makes sense indeed since we're currently blocking a thread when waiting.

Oct 21 '22 09:10 egil

@egil @linkdotnet We are still facing bunit tests failing in build agents. We have tried to: -disable run in parallel in xunit -waitforstateasync -waitforstate with long timeout -windows, linux build agents

Nothing helps. Is there any method to fix this in bunit 1.X? We are using .net 6 and bunit 1.14.4.

Feb 03 '23 09:02 pmpbpl

@pmpbpl we have been experimenting with various approaches to stabilize this under certain conditions. However, my conclusion thus far is that it's a pretty tough problem since there are essentially two threads in play that can complete at random times and may not always be predictable, especially on slower build-agents. We've not given up yet, but all the experiments so far have been inconclusive.

That said, can you share the tests with us that are causing problems for you, including the components they are testing? It could be that we can spot something that can help you. Please open a new discussion (https://github.com/bUnit-dev/bUnit/discussions/categories/q-a) with your test sample and component sample if you can share those.

Feb 03 '23 10:02 egil

I'll close this for now.

Some breaking changes may be required in the future to provide users with deterministic test runs. In particular, I want to investigate blocking updates to the render tree unless a test indicates that it is ready to observe those changes, e.g. via WaitForAssertion or WaitForElement, or a user triggers an event handler, after which it is expected that there is a rerender.

The latest investigations are reflected in https://github.com/bUnit-dev/bUnit/pull/1021 which will hopefully fix some edge cases we are seeing.

Mar 07 '23 11:03 egil

bUnit bUnit copied to clipboard

Investigate failing tests related to "wait for" logic on Windows

bUnit
bUnit copied to clipboard