fsharp icon indicating copy to clipboard operation
fsharp copied to clipboard

AsyncLocal part 2 - replace all ThreadStatics and remove NodeCode and node CE

Open majocha opened this issue 1 year ago • 35 comments

This supersedes or follows up on #16602

Introduction of Transparent Compiler revealed some instability w.r.t. DiagnosticsThreadStatics: https://github.com/dotnet/fsharp/pull/16576#issuecomment-1910321553 #16576 #16589

Turns out we used [<ThreadStatic>] to keep a compilation-global DiagnosticsLogger, and a custom NodeCode CE to flow it across threads during execution. AsyncLocal is a BCL class that provides exactly that functionality: flowing state along the async execution path.

This replaces the ThreadStatic diagnostics state and Cancellable state with AsyncLocal.

NodeCode<'T> is no longer needed, so we remove it.

However NodeCode had some methods with subtly different behavior than Async. Those are reproduced as Async type extensions in BuildGraph.fs.

  • NodeCode.Sequential ->Async.SequentialImmediate. (Because normal Async.Sequential queues tasks in the thread pool).
  • NodeCode.RunImmediateWithoutCancellation ->Async.RunImmediateWithoutCancellation.

Some NodeCode methods effectively created a compilation scope, restoring previous global DiagnosticsLogger on finish:

  • NodeCode.FromCancellable
  • NodeCode.AwaitAsync
  • NodeCode.AwaitTask

This doesn't seem to be covered by the test suite at the moment. (All the tests pass with or without) but the following extensions wrap the computation in use _ = new CompilationGlobalsScope(... preserving and restoring globals:

  • Async.FromCancellableWithScope
  • Async.CompilationScope

majocha avatar Feb 02 '24 00:02 majocha

:heavy_exclamation_mark: Release notes required


:white_check_mark: Found changes and release notes in following paths:

Change path Release notes path Description
src/Compiler docs/release-notes/.FSharp.Compiler.Service/8.0.300.md
vsintegration/src docs/release-notes/.VisualStudio/17.10.md

github-actions[bot] avatar Feb 02 '24 00:02 github-actions[bot]

Yes this is still stuck on the same problem in CI. I looked for performance regressions, even tried to rewrite the locking in BuildGraph (cause maybe just maybe it exhausts allowed handles when executing those 10000 simultanous GetOrComputeValue).

It looks like it freezes on a certain test in Linux / MacOS and something wierd also happens on Windows.

ETA: The problem was Async.Sequential queues work in threadpool and this somehow causes a deadlock (probably around tcImportsLock) in some of fsharpqa Import tests involving fsi and #r requiring a dll.

majocha avatar Feb 02 '24 09:02 majocha

Where can I find the test that outputs stuff like

Import (ReferenceArchAndPlatformArchMismatch: anycpu/anycpu) -- passed
Import (ReferenceArchAndPlatformArchMismatch: anycpu/x86) -- passed
Import (ReferenceArchAndPlatformArchMismatch: anycpu/x64) -- passed
Import (ReferenceArchAndPlatformArchMismatch: anycpu/Itanium) -- passed
Import (ReferenceArchAndPlatformArchMismatch: x86/anycpu) -- passed
Import (ReferenceArchAndPlatformArchMismatch: x86/x86) -- passed
Import (ReferenceArchAndPlatformArchMismatch: x86/x64) -- passed
Import (ReferenceArchAndPlatformArchMismatch: x86/Itanium) -- passed
Import (ReferenceArchAndPlatformArchMismatch: x64/anycpu) -- passed
Import (ReferenceArchAndPlatformArchMismatch: x64/x86) -- passed
Import (ReferenceArchAndPlatformArchMismatch: x64/x64) -- passed
Import (ReferenceArchAndPlatformArchMismatch: x64/Itanium) -- passed
Import (ReferenceArchAndPlatformArchMismatch: Itanium/anycpu) -- passed
Import (ReferenceArchAndPlatformArchMismatch: Itanium/x86) -- passed
Import (ReferenceArchAndPlatformArchMismatch: Itanium/x64) -- passed
Import (ReferenceArchAndPlatformArchMismatch: Itanium/Itanium) -- passed

?

OK I got it, it's in FsharpQA env.lst. Hmmm.

majocha avatar Feb 02 '24 11:02 majocha

@majocha if you'd like to, we can merge the first one, and then postpone this one a bit, I can go and try continue replacing with cold task-based one.

vzarytovskii avatar Feb 02 '24 14:02 vzarytovskii

@vzarytovskii I'm fine with it. I think as is, it still improves things with Transparent Compiler.

majocha avatar Feb 02 '24 15:02 majocha

OK, so this is green.

It appears all of the problems with this stemmed from my oversight.

NodeCode had subtly different behavior for functions like Sequential. SImply replacing it with Async.Sequential introduced some weird test failures that made me scratch my head for a week.

@vzarytovskii I think we can after all take this PR in favor of the #16602

I'll iron things out later. The diff is not small but reasonably easy to review.

majocha avatar Feb 04 '24 11:02 majocha

That looks great! I want to have more extensive testing for this though, since it affects all compilation paths now, and technically it can affect a bunch of compilation flows. I trust that we can cover majority of those by just building our proto compiler and running tests, but would love to hear more opinions on that.

Any thoughts on that @0101, @majocha ?

vzarytovskii avatar Feb 04 '24 13:02 vzarytovskii

Yes, I've seen first hand how the most subtle changes here were blowing up or freezing fsharpqa tests. It was eventually all caught by the tests, even if sometimes with little or no indication what's wrong.

So at the moment I have a lot of trust in the test suite. With the exception of the newest stuff:

For example the sporadic NRE in VS when using Transparent Compiler: https://github.com/dotnet/fsharp/pull/16576#issuecomment-1910321553

I'm pretty convinced this PR fixes it, but we really should have a repro and a test for this.

majocha avatar Feb 04 '24 13:02 majocha

I'm a big fan of getting this in. I'll try to think of some more tests. I'd like to come up with some fuzzing test that would reveal the NRE we were seeing.

Also, should we try to run some benchmarks on this change to see if performance has changed? There's now less wrapping but also AsyncLocal could have some costs.

0101 avatar Feb 05 '24 09:02 0101

Yes for seriously benchmarking this. I've run only FCSSourceFiles from time to time and the final version seems to run the same as main.

majocha avatar Feb 05 '24 10:02 majocha

I wonder if uncommenting these asserts would reveal some potential problem points, anything not caught by the test suite. This todo is there probably since 2017: https://github.com/dotnet/fsharp/blob/589d350ac2831cb6d52049eeb4118a1af35bf9e1/src/Compiler/Facilities/DiagnosticsLogger.fs#L345-L350

I guess the original idea here was that AssertFalseDiagnosticsLogger would signal we are pushing diagnostics but have not initialized any DiagnosticsLogger.

assert works only in debug builds?

majocha avatar Feb 05 '24 10:02 majocha

Yeah that might be a good idea. We can also just put failwith there.

0101 avatar Feb 05 '24 11:02 0101

Yeah that might be a good idea. We can also just put failwith there.

Good idea. I already see some tests failing.

majocha avatar Feb 05 '24 12:02 majocha

@0101 I put the failwith in and adjusted two tests that acted differently with Transparent Compiler. Let's see if this is still green.

majocha avatar Feb 05 '24 14:02 majocha

Yeah, the only problem is Transparent Compiler triggers that exception also in VS, not just in the test.

System.AggregateException : One or more errors occurred. ---> DiagnosticsAsyncState.DiagnosticsLogger not set. at <StartupCode$FSharp-Compiler-Service>.$AsyncMemoize.catchHandler@1[TKey,TVersion,TValue](AsyncMemoize3 _,KeyData2 key,CachingDiagnosticsLogger cachingLogger,Exception _arg15) at <StartupCode$FSharp-Compiler-Service>[email protected](Exception x) at Microsoft.FSharp.Control.AsyncPrimitives.CallFilterThenInvoke[T](AsyncActivation1 ctxt,FSharpFunc2 filterFunction,ExceptionDispatchInfo edi) at Microsoft.FSharp.Control.Trampoline.Execute(FSharpFunc2 firstAction) at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() at async Microsoft.VisualStudio.FSharp.Editor.WorkspaceExtensions.CheckerExtensions.FSharpChecker-ParseAndCheckDocumentUsingTransparentCompiler@355(<Unknown Parameters>) at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() at async Microsoft.VisualStudio.FSharp.Editor.WorkspaceExtensions.CheckerExtensions.FSharpChecker-ParseAndCheckDocument@448(<Unknown Parameters>) at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() at async Microsoft.VisualStudio.FSharp.Editor.WorkspaceExtensions.Document-GetFSharpParseAndCheckResultsAsync@538(<Unknown Parameters>) at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() at async StartupCode$FSharp-Editor(<Unknown Parameters>) at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() at async Microsoft.CodeAnalysis.CodeRefactorings.CodeRefactoringService.<>c__DisplayClass12_0.<GetRefactoringFromProviderAsync>b__0(<Unknown Parameters>) at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() at async Microsoft.CodeAnalysis.Extensions.IExtensionManagerExtensions.PerformFunctionAsync[T](<Unknown Parameters>) ---> (Inner Exception #0) System.Exception : DiagnosticsAsyncState.DiagnosticsLogger not set. at FSharp.Compiler.DiagnosticsLogger.UninitializedDiagnosticsLogger@347.DiagnosticSink(PhasedDiagnostic diagnostic,FSharpDiagnosticSeverity severity) at Internal.Utilities.Collections.CachingDiagnosticsLogger.DiagnosticSink(PhasedDiagnostic diagnostic,FSharpDiagnosticSeverity severity) at FSharp.Compiler.ParseAndCheckInputs.CheckOneInputWithCallback@1588-13.Invoke(TcState tcState) at <StartupCode$FSharp-Compiler-Service>[email protected](TcInfo tcInfo) at FSharp.Compiler.CodeAnalysis.TransparentCompiler.TypeCheckingGraphProcessing.processTypeCheckingGraph@253-6.Invoke(Tuple2 tupledArg) at Microsoft.FSharp.Collections.ArrayModule.Fold[T,TState](FSharpFunc2 folder,TState state,T[] array) at FSharp.Compiler.CodeAnalysis.TransparentCompiler.TypeCheckingGraphProcessing.processTypeCheckingGraph@246-4.Invoke(Tuple2[] results) at Microsoft.FSharp.Control.AsyncPrimitives.CallThenInvokeNoHijackCheck[a,b](AsyncActivation1 ctxt,b result1,FSharpFunc2 userCode) at Microsoft.FSharp.Control.Trampoline.Execute(FSharpFunc`2 firstAction) <---

It looks like AsyncMemoize is trying to replay the diagnostics in the top level Get, where there is nothing to replay it on.

majocha avatar Feb 05 '24 17:02 majocha

Good catch. This needs to be fixed probably in both places:

  • Don't try to replay diagnostics to a Null Logger (though it would be nice to send this to telemetry somehow)
  • Don't run (request) a computation that produces diagnostics without having set up a logger - wonder where we do that

0101 avatar Feb 05 '24 23:02 0101

  • Don't run (request) a computation that produces diagnostics without having set up a logger - wonder where we do that

There may be a difference in behavior between the old builder and Transparent Compiler. Possibly there are places where the old one sets a global logger and doesn't bother to restore previous one later.

majocha avatar Feb 06 '24 09:02 majocha

I think I may have also overlooked another thing here. NodeCode.FromCancellable and NodeCode.AwaitTask effectively wrapped the computation in a compilation diagnostics scope (restoring the global logger and build phase on finish). Async.AwaitNodeCode did the same thing. There may be subtle changes in behavior here that pass the test suite anyway.

majocha avatar Feb 06 '24 09:02 majocha

  • Don't try to replay diagnostics to a Null Logger (though it would be nice to send this to telemetry somehow)

I included a provisional fix for this in the commit above.

majocha avatar Feb 06 '24 09:02 majocha

There may be a difference in behavior between the old builder and Transparent Compiler. Possibly there are places where the old one sets a global logger and doesn't bother to restore previous one later.

Yeah it's quite different. But there are not that many places where we call AsyncMemoize.Get. So we can still check if some of them can produce diagnostics and there's no logger above.

0101 avatar Feb 06 '24 09:02 0101

OK, with the above changes I have one failure in fsharpqa locally.

  • fail001 SOURCE="MissingEndToken01.fs" # MissingEndToken01.fs

majocha avatar Feb 06 '24 10:02 majocha

Looks like it's still good. I installed the above fix in VS and Transparent Compiler works.

majocha avatar Feb 06 '24 10:02 majocha

Another diagnostics push to null, find usages with Transparent Compiler in VS:

at FSharp.Compiler.DiagnosticsLogger.UninitializedDiagnosticsLogger@347.DiagnosticSink(PhasedDiagnostic diagnostic, FSharpDiagnosticSeverity severity) in E:\repos\fsharp\src\Compiler\Facilities\DiagnosticsLogger.fs:line 349 at FSharp.Compiler.CompilerOptions.reportDeprecatedOption@367-1.Invoke(FSharpOption1 errOpt) in E:\repos\fsharp\src\Compiler\Driver\CompilerOptions.fs:line 369 at [email protected](FSharpList1 l) in E:\repos\fsharp\src\Compiler\Driver\CompilerOptions.fs:line 379 at [email protected](FSharpList1 args) in E:\repos\fsharp\src\Compiler\Driver\CompilerOptions.fs:line 503 at FSharp.Compiler.CompilerOptions.ParseCompilerOptions(FSharpFunc2 collectOtherArgument, FSharpList1 blocks, FSharpList1 args) in E:\repos\fsharp\src\Compiler\Driver\CompilerOptions.fs:line 506 at FSharp.Compiler.CompilerOptions.ApplyCommandLineArgs(TcConfigBuilder tcConfigB, FSharpList1 sourceFiles, FSharpList1 argv) in E:\repos\fsharp\src\Compiler\Driver\CompilerOptions.fs:line 2317

I think there's no DiagnosticsScope here: https://github.com/dotnet/fsharp/blob/3ac064e3f838f66cdf8c2929aac129b981c68a23/src/Compiler/Service/TransparentCompiler.fs#L641-L643

majocha avatar Feb 06 '24 12:02 majocha

Another diagnostics push to null, find usages with Transparent Compiler in VS:

Hmm, it's a question what should we do in that case.

I suppose it would be nice when you search for usages, that triggers a check of some file which produces diagnostics, for them to show up in the IDE. I don't even know if we can do that in current VS extension. But eventually this could go to LSP push diagnostics.

For now we can probably just add a note about it and rely on your fix with discard error logger.

0101 avatar Feb 06 '24 15:02 0101

For now we can probably just add a note about it and rely on your fix with discard error logger.

Hmm. This one doesn't seem to go through the cache.

majocha avatar Feb 06 '24 15:02 majocha

Oh yeah, here where it says we might need to deal with exceptions? 😅 I guess we do...

https://github.com/dotnet/fsharp/blob/3ac064e3f838f66cdf8c2929aac129b981c68a23/src/Compiler/Service/TransparentCompiler.fs#L1801-L1802

0101 avatar Feb 06 '24 16:02 0101

We can do for now:

            // TODO: might need to deal with exceptions here:
            let tcConfigB, sourceFileNames, _ =
                DiagnosticsLogger.suppressErrorReporting <| fun () ->
                    ComputeTcConfigBuilder projectSnapshot

majocha avatar Feb 06 '24 16:02 majocha

Probably these could be just utility functions in DiagnosticsLogger.fs:

  • Async.FromCancellableWithScope
  • Async.CompilationScope

As they have nothing to do with Async in general.

Also, naming is hard.

majocha avatar Feb 08 '24 09:02 majocha

A quick TransparentCompilerBenchmark shows the performance is mostly unchanged.

asynclocal
| Method          | UseTransparentCompiler | ProjectType     | Mean       | Error     | StdDev    | Gen0       | Completed Work Items | Lock Contentions | Gen1      | Allocated |
|---------------- |----------------------- |---------------- |-----------:|----------:|----------:|-----------:|---------------------:|-----------------:|----------:|----------:|
| ExampleWorkflow | False                  | DependencyChain | 3,045.4 ms |  66.11 ms |  10.23 ms | 19000.0000 |             157.0000 |                - | 2000.0000 |   5.62 GB |
| ExampleWorkflow | False                  | DependentGroups | 3,050.4 ms |  43.11 ms |   6.67 ms | 19000.0000 |             150.0000 |                - | 2000.0000 |   5.58 GB |
| ExampleWorkflow | False                  | ParallelGroups  | 3,286.8 ms |  55.72 ms |   8.62 ms | 20000.0000 |             151.0000 |                - | 3000.0000 |   5.79 GB |
| ExampleWorkflow | True                   | DependencyChain | 2,287.5 ms | 190.75 ms |  10.46 ms |  2000.0000 |             862.0000 |                - | 1000.0000 |   2.84 GB |
| ExampleWorkflow | True                   | DependentGroups |   977.4 ms | 704.55 ms | 109.03 ms |  2000.0000 |             824.0000 |                - | 1000.0000 |   2.75 GB |
| ExampleWorkflow | True                   | ParallelGroups  |   909.6 ms | 660.61 ms | 102.23 ms |  2000.0000 |             859.0000 |                - | 1000.0000 |   2.98 GB |

main
| Method          | UseTransparentCompiler | ProjectType     | Mean       | Error    | StdDev    | Gen0       | Completed Work Items | Lock Contentions | Gen1      | Allocated |
|---------------- |----------------------- |---------------- |-----------:|---------:|----------:|-----------:|---------------------:|-----------------:|----------:|----------:|
| ExampleWorkflow | False                  | DependencyChain | 3,496.0 ms | 851.2 ms | 131.73 ms | 19000.0000 |             156.0000 |                - | 2000.0000 |   5.61 GB |
| ExampleWorkflow | False                  | DependentGroups | 3,330.6 ms | 196.5 ms |  30.40 ms | 19000.0000 |             151.0000 |                - | 2000.0000 |   5.58 GB |
| ExampleWorkflow | False                  | ParallelGroups  | 3,379.3 ms | 410.1 ms |  63.46 ms | 20000.0000 |             151.0000 |                - | 4000.0000 |   5.78 GB |
| ExampleWorkflow | True                   | DependencyChain | 2,328.7 ms | 493.9 ms |  27.07 ms |  2000.0000 |             839.0000 |                - | 1000.0000 |   2.84 GB |
| ExampleWorkflow | True                   | DependentGroups |   934.5 ms | 583.5 ms |  90.30 ms |  2000.0000 |             868.0000 |                - | 1000.0000 |   2.74 GB |
| ExampleWorkflow | True                   | ParallelGroups  |   951.4 ms | 799.7 ms | 123.76 ms |  2000.0000 |             901.0000 |                - | 1000.0000 |   2.98 GB |

majocha avatar Feb 10 '24 17:02 majocha

We should probably figure out some more testing scenarios before we merge this.

0101 avatar Feb 12 '24 18:02 0101