razor icon indicating copy to clipboard operation
razor copied to clipboard

Razor LSP server appears to be able to run requests on wrong document version

Open gundermanc opened this issue 2 years ago • 4 comments

Describe the bug:

While investigating Semantic tokens can occasionally ask for wrong range of document after ctrl+Z, filed against VS LSP client, I observed what appears to be a potential race condition in the Razor LSP server that may be responsible for 1508587.

@NTaylorMullen, @ryanbrandenburg could this be the root cause of the infamous Disco Colors issue? I fixed all known VS LSP client repros of this bug in 17.1, but this customer is reporting that it still repros. The behavior I'm observing suggests that LSP requests in Razor could occasionally run against the wrong document version.

Version used:

e.g. VS2022 17.3 Preview 1

To reproduce:

  1. Create a new Blazor server app
  2. Open FetchData.razor
  3. Replace the content with:
@page "/fetchdata"

<PageTitle>Weather forecast</PageTitle>

@using BlazorApp70.Data
@inject WeatherForecastService ForecastService

<strong>
    @DateTime.Now.ToString()
</strong>


<div>
    <strong>
        @DateTime.Now.ToString()
    </strong>
</div>


@{
    var abc = true;

    void Bar()
    {
        
    }
}


<content>
    <div>
    </div>
</content>
<strong>
    @DateTime.Now.ToString()
</strong>

<h1>Weather forecast</h1>

<p>This component demonstrates fetching data from a service.</p>

@if (forecasts == null)
{
    <p><em>Loading...</em></p>
}
else
{
    <table class="table">
        <thead>
            <tr>
                <th>Date</th>
                <th>Temp. (C)</th>
                <th>Temp. (F)</th>
                <th>Summary</th>
            </tr>
        </thead>
        <tbody>
            @foreach (var forecast2 in forecasts)
            {
                <tr>
                    <td>@forecast2.Date.ToShortDateString()</td>
                    <td>@forecast2.TemperatureC</td>
                    <td>@forecast2.TemperatureF</td>
                    <td>@forecast2.Summary</td>
                </tr>
            }
        </tbody>
    </table>
}


@code {
    private WeatherForecast[]? forecasts;

    protected override async Task OnInitializedAsync()
    {
        forecasts = await ForecastService.GetForecastAsync(DateTime.Now);
    }
}

It should be 80 lines long.

  1. On line 30 (put your cursor after the<c` and hit enter.
  2. Hit ctrl + z (undo).
  3. Repeat several times. If done quickly enough, an ArgumentOutOfRangeException is thrown from the Razor semantic tokens handler.

Explanation

This was originally filed as a VS LSP client bug. Inspecting both components, I don't think this sort of bug should be possible in the LSP client.

  • We have one thread that dispatches LSP messages per client per document.
  • This thread is responsible for updating the text version via didChange
  • Semantic Tokens goes through that same thread
  • This thread keeps a record of the most recently sent ITextSnapshot.
  • Any requests sent have their Positions and Ranges translated to that snapshot.

This process automatically realigns spans with the bounds of the translated snapshot, and there's only one thread, so there shouldn't be many opportunities to race.

Background

  • LSP is a 'well-ordered' protocol - each message is assumed to apply to the state of the world after the in-order application of previous messages.
  • An implication of this is that a correctly behaved server must guarantee that any textDocument/didChange notifications are fully applied before executing any subsequent requests.
  • In other words, in didChange A => B, didChange B => C, didChange C => D, semanticTokens, the semantic tokens request should run on version D.
  • Inspecting Razor didChange handler, however, it appears that it could potentially run on version C, as seems to be happening in this scenario.

Existing Razor Support

Existing Razor support attempts to enforce the above requirements:

  • There is a single 'dispatcher' thread which is used to execute didChange notifications and document version info lookups.
  • The dispatcher thread is the right idea and is mostly effective, however, it is executing handlers as a series of disjoint continuations. It seems like these could potentially interleave, violating the second bullet from the 'Background' section.

Potential Root Cause

Razor code is yielding in the middle of application of a didChange notification.

image

It looks like this introduces the possibility that a semantic tokens request that arrives after a didChange could become interleaved with that didChange, causing it to get handled before the text version it is supposed to run on is recorded in Razor's project service.

image

Supporting Evidence

I added static variables to the didChange message handler in the first dispatcher continuation (captures the most recently received text version) and in the last dispatcher continuation (captures the most recently 'committed' text version that is visible for use by the semantic tokens handler.

image

I then added code in the Semantic Tokens handler that launches the debugger any time received and committed version numbers are different in the middle of the part of the semantic tokens handler that runs on the dispatcher thread, and it was hit immediately.

image

While this isn't definitive proof of a bug (perhaps the received version is from a didChange immediately following the erroring semantic tokens call), it is suspicious, as it demonstrates that there are cases where a semantic tokens request's version can be established on the dispatcher thread interleaved between continuations of the didChange handler.

gundermanc avatar Apr 04 '22 16:04 gundermanc

@NTaylorMullen, @ryanbrandenburg could this be the root cause of the infamous Disco Colors issue?

It seems totally plausible to me, but @NTaylorMullen is probably our expert on our Threading model and ensuring message order.

ryanbrandenburg avatar Apr 04 '22 17:04 ryanbrandenburg

@gundermanc I read through the bits but the one piece missing is that we have a custom scheduler that ensures that all serial (textDocument/didChange etc.) type requests always finish before running any other types of requests. I tried to dig a bit more to validate that nothing was going wrong there and here's what I did to verify:

In our textDocument/didChange handler I captured some state:

  • If handler was currently running
  • The last seen document version pre-async operations and post-async operations

In our semantic token handler I captured all this data locally from the didChange endpoint and looked to see if it changed anywhere during the lifetime of the semantic tokens endpoint. What I found was, when the semantic tokens endpoint ran the didChange handler did not run for the entirety yet things still exploded: image

Ultimately it was caused because we were being asked for semantic tokens for 0-80 lines inclusive whereas our document was only 80 lines count resulting in the off-by-one due to 0-indexing.

Totally possible I'm still doing something wrong but lets chat more tomorrow. If you're curious here's my branch

NTaylorMullen avatar Apr 04 '22 23:04 NTaylorMullen

always finish before running any other types of requests

Do you mean this function? It looks like it properly waits when executing a serial, then parallel request, but AFAICT there's still risk of interleaved execution when executing a parallel, then a serial, because the parallel requests are not awaited before starting subsequent serial requests and may enqueue continuations that happen in inconsistent order relative to the serial request's continuations.

My experience is that for the relative message order to be preserved, the task that records the document version would need to be synchronously enqueued by the handler before it yields the first time. That is at least how it works in StreamJsonRpc powered client and server components.

I don't understand how any scheduler can enforce that any disjoint continuations are complete prior to running another request, without awaiting that chain of tasks. Referring again to the didChange handler, it looks like it enqueues two separate continuations on the dispatcher thread...

image

... and a critical section of semantic tokens handler can be observed to run interleaved between them (they can have different values in the semantic tokens dispatcher thread task). image

gundermanc avatar Apr 05 '22 04:04 gundermanc

Ya it's funny, in the shower this morning I had the same thought. Going to dig this morning

NTaylorMullen avatar Apr 05 '22 16:04 NTaylorMullen

The Roslyn work for this is accomplished here https://github.com/dotnet/roslyn/pull/62924.

ryanbrandenburg avatar Sep 06 '22 23:09 ryanbrandenburg

FYI: Thanks to a repro from the great folks on the debugger team, I did end up finding a race condition on the LSP client side where in extremely obscure cases we can send request messages out of sequence.

I think the aforementioned Razor code is still suspect, so the work done to improve it is a worthwhile investment, but there are fixes coming to VS in 17.5 that should improve resilience of various features: https://devdiv.visualstudio.com/DevDiv/_git/VSLanguageServerClient/pullrequest/426457

gundermanc avatar Sep 29 '22 20:09 gundermanc

This is hypothetically resolved now that CLaSP is merged. I will close and we can re-open if the issue re-cures.

ryanbrandenburg avatar Oct 04 '22 23:10 ryanbrandenburg