Buildalyzer
Buildalyzer copied to clipboard
Number of syntax trees in project sometimes zero
I am often and unpredictably finding no syntax trees in a project. The solution has 10 projects and on around half the executions, 2 or 3 of the projects (different each time) will return no classes.
public static async Task<string> ParseInfo(string solutionFile)
{
var manager = new AnalyzerManager(solutionFile);
var workspace = manager.GetWorkspace();
foreach (var project in workspace
.CurrentSolution
.Projects)
{
var compilation = await project.GetCompilationAsync();
var count = 0;
foreach (var tree in compilation.SyntaxTrees)
{
count +=
tree.GetRoot()
.DescendantNodes()
.OfType<ClassDeclarationSyntax>()
.Count();
}
Console.WriteLine($"Project: {project.Name} Class Count: {count}");
}
}
An example sequence of analyses might find this many total classes depending on which projects failed to produce any syntax trees:
Run | Class count |
---|---|
0 | 854 |
1 | 854 |
2 | 854 |
3 | 854 |
4 | 727 |
5 | 831 |
6 | 693 |
7 | 854 |
8 | 854 |
9 | 853 |
10 | 840 |
11 | 854 |
12 | 854 |
13 | 854 |
14 | 854 |
15 | 831 |
16 | 854 |
17 | 854 |
18 | 854 |
19 | 854 |
- BuildAlyzer 2.6.0
- Ubuntu 18.04
- .NET Core SDK (3.1.202)
Ok looks like the same thread safety issue as #134. It's fixed by disabling parallel builds.
I have identified one issue so far. AnalyzerResults
is updated from multiple threads and has no concurrency support. I have updated it from Dictionary
to ConcurrentDictionary
but this has not solved all issues.
@daveaglick
The cause is that AnalyzerResult.ProcessCscCommandLine
is never called because EventProcessor.MessageRaised
is never called.
I believe there is a race condition between the AnonymousPipeLoggerServer
reading from the buffer and it's CancellationTokenSource
being triggered when the build process exits. I think that the buffer reading is cancelled before it has completed.
It appears that it isn't actually necessary to trigger CancellationTokenSource
at all. The buffer reading will terminate on it's own when the pipe is closed so the Process.Exited
event can just be removed.
private IAnalyzerResults BuildTargets(BuildEnvironment buildEnvironment, string targetFramework, string[] targetsToBuild, AnalyzerResults results)
{
using (CancellationTokenSource cancellation = new CancellationTokenSource())
{
using (AnonymousPipeLoggerServer pipeLogger = new AnonymousPipeLoggerServer(cancellation.Token))
{
using (EventProcessor eventProcessor = new EventProcessor(Manager, this, BuildLoggers, pipeLogger, results != null))
{
// Run MSBuild
int exitCode;
string fileName = GetCommand(buildEnvironment, targetFramework, targetsToBuild, pipeLogger.GetClientHandle(), out string arguments);
using (ProcessRunner processRunner = new ProcessRunner(fileName, arguments, Path.GetDirectoryName(ProjectFile.Path), GetEffectiveEnvironmentVariables(buildEnvironment), Manager.LoggerFactory))
{
processRunner.Exited = () => cancellation.Cancel(); // <--- RACE CONDITION
processRunner.Start();
pipeLogger.ReadAll(); // <----
processRunner.WaitForExit();
exitCode = processRunner.ExitCode;
}
// Collect the results
results?.Add(eventProcessor.Results, exitCode == 0 && eventProcessor.OverallSuccess);
}
}
}
return results;
}
When I comment out the cancellation, I now see reliable results.
When I comment out the cancellation, I now see reliable results.
Awesome! Thanks for tracking this down to a root cause - when I initially tried to reproduce I couldn't seem to get the same results so I put it on hold while working on other stuff and intended to come back later. Glad you beat me to it :)
I'll try my best to apply this as a fix and get out a new version this weekend or early next week.
Thanks @daveaglick
i) also see #141 for a fix for that one.
ii) Changing AnalyzerResults
to use a ConcurrentDictionary
might fix the missing projects seen in #134.
I initially tried to reproduce I couldn't seem to get the same results
iii) To help replicate this, I generated a liltte script to create big solutions with lots of tiny projects and I was able to see lots of failures. I only need a dozen or so projects:
var sln = $"bigtest";
Console.WriteLine($"dotnet new sln -n {sln}");
for (int i = 0; i < 50; i++)
{
Console.WriteLine($"dotnet new console -n Proj{i}");
Console.WriteLine($"dotnet sln ./{sln}.sln add ./Proj{i}/Proj{i}.csproj");
}
I'm going to add a test solution using your script to the test suite and it will be awesome :)
This test is now consistently passing for me with a 50 project solution:
Hopefully that's an indication this is resolved. Thanks again for the fantastic work narrowing down the multiple concurrency issues here @duncanawoods.
Buildalyzer 3.0.1 is now rolling out to NuGet and should (hopefully) resolve this issue. When you get a chance can you please verify that this problem is resolved? Thanks!
Hi @daveaglick
I have had a chance to try it out. When noodling around, I was seeing it around 5% of the time. I have also seen some cases where a typically 15s operation took 10 minutes to complete.
I created a batch test with better data collection to share some hard stats with you but this didn't even appear (!) and I saw some different errors.
Test setup
- a process that ran an analysis of a 14 project solution with 863 classes
- Ubuntu 18.04
- run 20 processes at a time
- repeat 20 times
Result | Number | % |
---|---|---|
Success | 385 | 96.25% |
Could not find build environment | 15 | 3.75% |
Missing classes | 0 | 0 |
Observations
- I don't know what environmental conditions might increase/decrease the likelihood of this specific issue
- Even though they are separate processes, the tests within a batch do not seem entirely independent. The "Could not find build environment" appeared in groups of 5 or so.
- Trying to do 100 process batches led to lots of errors about "closed pipes". Must be some system resource thing but I am surprised on a 12C/24T, 64GB RAM
Conclusions
I think that while we have improved things, there are still gremlins in the concurrency code and given their rarity we are going to have a terrible time replicating let alone fixing them.
In this sort of situation my approach would be to take a clean-slate approach the concurrency design. Right now pretty much every form of parallelism is used: sub-processes, threads, async/await, TPL loops and concurrent data-structures. One concern is that peppering code with concurrent data-structures is insufficient for operations that touch multiple data-structures. What needs to happen is ensure that entire operations are isolated from each other instead of just fine-grained access to specific data. To make concurrency comprehensible and checkable for errors, I would aim to have the majority of code unaware of concurrency concerns and push all the orchestration up to the top level with crystal clear guarantees about who is accessing what, when things are started/terminated.