aspnetcore Issue when YARP is combined into a "normal" app

Is there an existing issue for this?

[X] I have searched the existing issues

Describe the bug

After moving to .NET7, I found a problem, that occures, when YARP is part of the application. Since it seems to be somewhere between YARP, ASP.Net core and .NET7, I like to bring it to your attention here.

It makes using NET7 for use-cases with BFF driven by YARP "difficult" and ends up in lots of these:

AmbiguousMatchException: The request matched multiple endpoints. Matches:

/Index
/Index
/Index

Heres the Issue with repro https://github.com/microsoft/reverse-proxy/issues/1864

Expected Behavior

I'd expext YARP not to break by upgrading from .NET6 to .NET7

Steps To Reproduce

https://github.com/glatzert/YARPBug

Exceptions (if any)

No response

.NET Version

7.0.100-rc.1.22431.12

Anything else?

No response

Sep 20 '22 12:09 glatzert

There's no conflicting routes in the app. Yarp is on "api/{**catch-all}" and the razor pages are at the root.

@halter73 can you check what's going on with the host's implicit route builder and why adding yarp might trigger double razor registrations?

Sep 20 '22 16:09 Tratcher

I'll add, it's not only problematic with Razor pages, but controllers will also show the AmbigousMatchException behavior

Sep 20 '22 17:09 glatzert

@halter73 it seems like the CompositeEndpointDataSource changes are what's making it add duplicates. I'll make a minimal repro to show the issue. I understand it now.

Sep 20 '22 19:09 davidfowl

using Microsoft.Extensions.Primitives;

var builder = WebApplication.CreateBuilder(args);
var app = builder.Build();

((IEndpointRouteBuilder)app).DataSources.Add(new CustomDS());

app.MapGet("/", () => "Hello World!");


app.Run();

class CustomDS : EndpointDataSource
{
    private List<Endpoint>? _endpoints;
    private CancellationTokenSource _cts = new();

    public override IReadOnlyList<Endpoint> Endpoints
    {
        get
        {
            if (_endpoints is null)
            {
                _endpoints ??= new();

                var old = _cts;
                _cts = new();
                old.Cancel();
            }

            return _endpoints;
        }
    }

    public override IChangeToken GetChangeToken()
    {
        return new CancellationChangeToken(_cts.Token);
    }
}

Firing the change token after producing endpoints is causing the composite data source to re-evaluate the list while it's populating it (a side effect of accessing endpoints).

Sep 20 '22 19:09 davidfowl

This was partially mitigated by #43729. _endpoints is still null the first time GetChangeToken() fires, so we avoid stack diving the first time endpoints are resolved in this case.

However, if you update the sample to always trigger the change token in Endpoints and to resolve the root enpoints more than once, it can still stack overflow. I'm not sure how common this is, but it would be easy to avoid reloading endpoints if we're already in the middle of doing so on the same thread up stack.

using Microsoft.Extensions.Primitives;

var app = WebApplication.Create(args);

var customDS = new CustomDS();
((IEndpointRouteBuilder)app).DataSources.Add(customDS);
// Force evaluation of the composite data source endpoints. The same could be done with an HTTP request, but this is easy.
_ = app.Services.GetRequiredService<EndpointDataSource>().Endpoints;
// Now that _enpoints is initialized. Trigger the change token to induce a stack overflow.
customDS.Refresh();

// We don't get this far.
app.Run();

class CustomDS : EndpointDataSource
{
    private List<Endpoint>? _endpoints;
    private CancellationTokenSource _cts = new();

    public void Refresh()
    {
        var old = _cts;
        _cts = new();
        old.Cancel();
    }

    public override IReadOnlyList<Endpoint> Endpoints
    {
        get
        {
            _endpoints ??= new();

            var old = _cts;
            _cts = new();
            old.Cancel();

            return _endpoints;
        }
    }

    public override IChangeToken GetChangeToken()
    {
        return new CancellationChangeToken(_cts.Token);
    }
}

Previously, the CompositeEndpointDataSource would unregister from change tokens while evaluating endpoints, but if someone were to add another endpoint on another thread while this was happening, we would miss the update.

Sep 20 '22 20:09 halter73

I'm currently doing hackathon things, but I think this is our best bet to fully fix it without introducing race conditions where we might miss changes originating from another thread.

It uses a ~[ThreadStatic]~ ~Environment.CurrentManagedThreadId~ ThreadLocal<bool> to track if a child EndpointDataSource is triggering a change token inline when the CompositeDataSource is reading from the child Endpoints or resolving a new change token without invoking user change registration with a lock or risking missing changes originating in parallel on another thread. If we agree with this approach, I can add a test and open a PR.

Given that this is partially mitigated, do we still want to try to fix this for .NET 7? Or do we just want to tell developers to stop raising change events when endpoints are merely being accessed not changed?

Sep 20 '22 22:09 halter73

Who is raising the change event in this case? YARP? MVC?

Sep 20 '22 23:09 Tratcher

https://github.com/microsoft/reverse-proxy/blob/5a86eb4f18474bbe2f9e73f8457f4cbf13c00b81/src/ReverseProxy/Management/ProxyConfigManager.cs#L141

Sep 21 '22 00:09 davidfowl

We've moved this issue to the Backlog milestone. This means that it is not going to be worked on for the coming release. We will reassess the backlog following the current release and consider this item at that time. To learn more about our issue management process and to have better expectation regarding different types of issues you can read our Triage Process.

Sep 22 '22 19:09 ghost

@glatzert This may be fixed in rc2. Can you install a .NET rc2 nightly build and let us know if the issue is fixed on your end ? https://github.com/dotnet/installer

Sep 22 '22 19:09 rafikiassumani-msft

To be more specific, the middle column "Release/7.0.1xx" in this table with the "-rtm.*" suffixes should have the original issue fixed but not this one where the child EndpointDataSource always triggers its change token in its Endpoints getter.

That still stack overflows, but we think child data sources should not do this. If we wanted to try ignoring change events originating down stack on the same thread, we could use a ThreadLocal to do so as I demonstrate in https://github.com/dotnet/aspnetcore/compare/halter73/44085?expand=1&w=1.

Sep 22 '22 19:09 halter73

I installed 7.0.0-rtm.22466.6 and was not able to reproduce the bug anymore :)

Sep 23 '22 06:09 glatzert

I discovered that you can still reproduce this without cancelling the change token in EndpointDataSource.Endpoints like YARP does. Merely "uncancelling" the token too late can lead to a stack overflow which I think makes the severity of the issue much worse in my opinion.

public class DynamicEndpointDataSource : EndpointDataSource, IDisposable
{
    private CancellationTokenSource _cts = new();
    private Endpoint[] _endpoints = Array.Empty<Endpoint>();
    private PeriodicTimer _timer;
    private Task _timerTask;

    public DynamicEndpointDataSource()
    {
        _timer = new PeriodicTimer(TimeSpan.FromSeconds(1));
        _timerTask = TimerLoop();
    }

    public override IReadOnlyList<Endpoint> Endpoints => _endpoints;

    public async Task TimerLoop()
    {
        while (await _timer.WaitForNextTickAsync())
        {
            var oldLength = _endpoints.Length;

            var newEndpoints = new Endpoint[oldLength + 1];
            Array.Copy(_endpoints, 0, newEndpoints, 0, oldLength);

            newEndpoints[oldLength] = new RouteEndpoint(context => context.Response.WriteAsync($"Dynamic endpoint #{oldLength}"),
                RoutePatternFactory.Parse($"/dynamic/{oldLength}"), 0, null, null);

            _endpoints = newEndpoints;

            _cts.Cancel();
            _cts = new CancellationTokenSource();
        }
    }

    public override IChangeToken GetChangeToken()
    {
        return new CancellationChangeToken(_cts.Token);
    }

    public void Dispose()
    {
        _timer.Dispose();
        _timerTask.GetAwaiter().GetResult();
    }
}

Changing:

_cts.Cancel();
_cts = new CancellationTokenSource();

to:

var lastCts = _cts;
_cts = new CancellationTokenSource();
lastCts.Cancel();

does work around this issue, and it is generally better to cancel to old token in this way to event any potential stack overflows from consuming code. However, this did not stack overflow in .NET 6, and I think this is very easy to get wrong. I accidently wrote an EndpointDataSource the bad way initially while preparing for my Ignite talk and was surprised to see the stack overflow.

Fortunately, unlike the issue caused by cancelling the change token repeatedly in the EndpointDataSource.Endpoints, we don't need to resort to something like a ThreadLocal<bool> to fix this. CompositeEndpointDataSource can just dispatch its change handler to the ThreadPool to avoid the stack overflow in this case.

~It's also worth noting this is a regression from .NET 6.~ No it's not. See my next comment.

Oct 06 '22 00:10 halter73

This isn't actually a regression. I looked at grep.app and was surprised how many dynamic EndpointDataSource implementations were doing the right thing despite it being unnecessary before. Well, it was because it was unnecessary before. I think it still might be worthwhile to dispatch, but I don't think it's worth servicing. 😆

Oct 06 '22 01:10 halter73

Thanks for contacting us.

We're moving this issue to the .NET 8 Planning milestone for future evaluation / consideration. We would like to keep this around to collect more feedback, which can help us with prioritizing this work. We will re-evaluate this issue, during our next planning meeting(s). If we later determine, that the issue has no community involvement, or it's very rare and low-impact issue, we will close it - so that the team can focus on more important and high impact issues. To learn more about what to expect next and how this issue will be handled you can read more about our triage process here.

Oct 06 '22 01:10 ghost