msbuild
msbuild copied to clipboard
SdkResolverService throws "System.InvalidOperationException: Collection was modified" occasionally
I'm occasionally seeing the following exception:
System.InvalidOperationException: Collection was modified; enumeration operation may not execute.
at System.Collections.Generic.List`1.Enumerator.MoveNextRare()
at Microsoft.Build.BackEnd.SdkResolution.SdkResolverService.GetResolvers(IList`1 resolversManifests, LoggingContext loggingContext, ElementLocation sdkReferenceLocation)
at Microsoft.Build.BackEnd.SdkResolution.SdkResolverService.ResolveSdkUsingResolversWithPatternsFirst(Int32 submissionId, SdkReference sdk, LoggingContext loggingContext, ElementLocation sdkReferenceLocation, String solutionPath, String projectPath, Boolean interactive, Boolean isRunningInVisualStudio)
at Microsoft.Build.BackEnd.SdkResolution.SdkResolverService.ResolveSdk(Int32 submissionId, SdkReference sdk, LoggingContext loggingContext, ElementLocation sdkReferenceLocation, String solutionPath, String projectPath, Boolean interactive, Boolean isRunningInVisualStudio)
at Microsoft.Build.BackEnd.SdkResolution.CachingSdkResolverService.<>n__0(Int32 submissionId, SdkReference sdk, LoggingContext loggingContext, ElementLocation sdkReferenceLocation, String solutionPath, String projectPath, Boolean interactive, Boolean isRunningInVisualStudio)
at Microsoft.Build.BackEnd.SdkResolution.CachingSdkResolverService.<>c__DisplayClass3_0.<ResolveSdk>b__2()
at System.Lazy`1.ViaFactory(LazyThreadSafetyMode mode)
--- End of stack trace from previous location ---
at System.Lazy`1.CreateValue()
at Microsoft.Build.BackEnd.SdkResolution.CachingSdkResolverService.ResolveSdk(Int32 submissionId, SdkReference sdk, LoggingContext loggingContext, ElementLocation sdkReferenceLocation, String solutionPath, String projectPath, Boolean interactive, Boolean isRunningInVisualStudio)
at Microsoft.Build.Evaluation.Evaluator`4.ExpandAndLoadImportsFromUnescapedImportExpressionConditioned(String directoryOfImportingFile, ProjectImportElement importElement, List`1& projects, SdkResult& sdkResult, Boolean throwOnFileNotExistsError)
at Microsoft.Build.Evaluation.Evaluator`4.ExpandAndLoadImports(String directoryOfImportingFile, ProjectImportElement importElement, SdkResult& sdkResult)
at Microsoft.Build.Evaluation.Evaluator`4.EvaluateImportElement(String directoryOfImportingFile, ProjectImportElement importElement)
at Microsoft.Build.Evaluation.Evaluator`4.PerformDepthFirstPass(ProjectRootElement currentProjectOrImport)
at Microsoft.Build.Evaluation.Evaluator`4.EvaluateImportElement(String directoryOfImportingFile, ProjectImportElement importElement)
at Microsoft.Build.Evaluation.Evaluator`4.PerformDepthFirstPass(ProjectRootElement currentProjectOrImport)
at Microsoft.Build.Evaluation.Evaluator`4.Evaluate()
at Microsoft.Build.Evaluation.Project.ProjectImpl.Reevaluate(ILoggingService loggingServiceForEvaluation, ProjectLoadSettings loadSettings, EvaluationContext evaluationContext)
at Microsoft.Build.Evaluation.Project.ProjectImpl.ReevaluateIfNecessary(ILoggingService loggingServiceForEvaluation, ProjectLoadSettings loadSettings, EvaluationContext evaluationContext)
at Microsoft.Build.Evaluation.Project.ProjectImpl.ReevaluateIfNecessary(ILoggingService loggingServiceForEvaluation, EvaluationContext evaluationContext)
at Microsoft.Build.Evaluation.Project.ProjectImpl.ReevaluateIfNecessary(EvaluationContext evaluationContext)
at Microsoft.Build.Evaluation.Project.ProjectImpl.Initialize(IDictionary`2 globalProperties, String toolsVersion, String subToolsetVersion, ProjectLoadSettings loadSettings, EvaluationContext evaluationContext)
at Microsoft.Build.Evaluation.Project..ctor(ProjectRootElement xml, IDictionary`2 globalProperties, String toolsVersion, String subToolsetVersion, ProjectCollection projectCollection, ProjectLoadSettings loadSettings, EvaluationContext evaluationContext, IDirectoryCacheFactory directoryCacheFactory)
at Microsoft.Build.Evaluation.Project.FromProjectRootElement(ProjectRootElement rootElement, ProjectOptions options)
Seems like there is a race condition in SdkResolverService.
MSBuild version: 17.4.0-preview-22416-02 (pretty close to head of main as of this writing)
FYI @dsplaisted
My guess based on inspection the code is that this is the race:
- Multiple threads get to
ResolveSdkUsingResolversWithPatternsFirstat the same time. - Both thread 1 and thread 2 see
_generalResolversManifestsRegistryis null and enterRegisterResolversManifests - Thread 1 populates
_generalResolversManifestsRegistryinsideRegisterResolversManifests. Note that this is under a lock. - Thread 1 returns from
RegisterResolversManifests - Thread 2 obtains the lock and creates
_generalResolversManifestsRegistryagain and starts to add to it. - Thread 1 enteres the 2nd
GetResolverscall which passes_generalResolversManifestsRegistry. Note! This is not under a lock! - Thread 1 in
GetResolversstarts to iterate the list passed in (_generalResolversManifestsRegistry) - Thread 2 adds to
_generalResolversManifestsRegistry - Thread 1 continues to iterate the original
_generalResolversManifestsRegistryand throws.
So ultimately, the bug is that _generalResolversManifestsRegistry is only locked when writing, but not when reading.
Code with the bug was introduced in #7597 by @AR-May
@dfederm thank you for reporting the bug and for analysis. The scenario above should not be possible. The _generalResolversManifestsRegistry is double-checked on creation: the RegisterResolversManifests has a check after a lock to prevent the second creation. So, thread 2 should just return from the function without rewriting the collection. After creation this collection was not supposed to be modified, so I thought it should be ok not to lock it on reading, since it happens only after a creation. I would dig into this.
I saw this on MSBuild version = "17.8.3+195e7f5a3" today
error MSB4014: The build stopped unexpectedly because of an internal failure.
System.InvalidOperationException: Collection was modified; enumeration operation may not execute.
at System.ThrowHelper.ThrowInvalidOperationException(ExceptionResource resource)
at System.Collections.Generic.List`1.Enumerator.MoveNextRare()
at Microsoft.Build.BackEnd.SdkResolution.SdkResolverService.ResolveSdkUsingResolversWithPatternsFirst(Int32 submissionId, SdkReference sdk, LoggingContext loggingContext, ElementLocation sdkReferenceLocation, String solutionPath, String projectPath, Boolean interactive, Boolean isRunningInVisualStudio, Boolean failOnUnresolvedSdk)
at Microsoft.Build.BackEnd.SdkResolution.SdkResolverService.ResolveSdk(Int32 submissionId, SdkReference sdk, LoggingContext loggingContext, ElementLocation sdkReferenceLocation, String solutionPath, String projectPath, Boolean interactive, Boolean isRunningInVisualStudio, Boolean failOnUnresolvedSdk)
at Microsoft.Build.BackEnd.SdkResolution.CachingSdkResolverService.<>c__DisplayClass3_0.<ResolveSdk>b__2()
at System.Lazy`1.CreateValue()
at System.Lazy`1.LazyInitValue()
at Microsoft.Build.BackEnd.SdkResolution.CachingSdkResolverService.ResolveSdk(Int32 submissionId, SdkReference sdk, LoggingContext loggingContext, ElementLocation sdkReferenceLocation, String solutionPath, String projectPath, Boolean interactive, Boolean isRunningInVisualStudio, Boolean failOnUnresolvedSdk)
at Microsoft.Build.BackEnd.SdkResolution.MainNodeSdkResolverService.PacketReceived(Int32 node, INodePacket packet)
and
error MSB4014: The build stopped unexpectedly because of an internal failure.
System.InvalidOperationException: Collection was modified; enumeration operation may not execute.
at System.ThrowHelper.ThrowInvalidOperationException(ExceptionResource resource)
at System.Collections.Generic.List`1.Enumerator.MoveNextRare()
at Microsoft.Build.BackEnd.SdkResolution.SdkResolverService.ResolveSdkUsingResolversWithPatternsFirst(Int32 submissionId, SdkReference sdk, LoggingContext loggingContext, ElementLocation sdkReferenceLocation, String solutionPath, String projectPath, Boolean interactive, Boolean isRunningInVisualStudio, Boolean failOnUnresolvedSdk)
at Microsoft.Build.BackEnd.SdkResolution.SdkResolverService.ResolveSdk(Int32 submissionId, SdkReference sdk, LoggingContext loggingContext, ElementLocation sdkReferenceLocation, String solutionPath, String projectPath, Boolean interactive, Boolean isRunningInVisualStudio, Boolean failOnUnresolvedSdk)
at Microsoft.Build.BackEnd.SdkResolution.CachingSdkResolverService.<>c__DisplayClass3_0.<ResolveSdk>b__2()
at System.Lazy`1.CreateValue()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Lazy`1.get_Value()
at Microsoft.Build.BackEnd.SdkResolution.CachingSdkResolverService.ResolveSdk(Int32 submissionId, SdkReference sdk, LoggingContext loggingContext, ElementLocation sdkReferenceLocation, String solutionPath, String projectPath, Boolean interactive, Boolean isRunningInVisualStudio, Boolean failOnUnresolvedSdk)
at Microsoft.Build.BackEnd.SdkResolution.MainNodeSdkResolverService.PacketReceived(Int32 node, INodePacket packet)
at System.Lazy`1.get_Value()
at Microsoft.Build.BackEnd.SdkResolution.CachingSdkResolverService.ResolveSdk(Int32 submissionId, SdkReference sdk, LoggingContext loggingContext, ElementLocation sdkReferenceLocation, String solutionPath, String projectPath, Boolean interactive, Boolean isRunningInVisualStudio, Boolean failOnUnresolvedSdk)
at Microsoft.Build.BackEnd.SdkResolution.MainNodeSdkResolverService.PacketReceived(Int32 node, INodePacket packet)
From the source code, it looks like this could happen:
- on thread A, ResolveSdkUsingResolversWithPatternsFirst checks_generalResolversManifestsRegistry == null, gets true, and calls RegisterResolversManifests
- on thread A, RegisterResolversManifests locks _lockObject
- on thread A, RegisterResolversManifests checks _generalResolversManifestsRegistry != null, gets false, and proceeds
- on thread A, RegisterResolversManifests does other stuff and then assigns _generalResolversManifestsRegistry = new List<SdkResolverManifest>();
- on thread B, ResolveSdkUsingResolversWithPatternsFirst checks_generalResolversManifestsRegistry == null, and does not call RegisterResolversManifests, and does not lock _lockObject
- on thread B, ResolveSdkUsingResolversWithPatternsFirst does other stuff and then calls GetResolvers(_generalResolversManifestsRegistry, loggingContext, sdkReferenceLocation)
- on thread B, GetResolvers enters the foreach (var resolverManifest in resolversManifests) loop and gets an enumerator
- on thread A, RegisterResolversManifests calls _generalResolversManifestsRegistry.Add(manifest), which invalidates the enumerator
- on thread B, GetResolvers attempts to use the enumerator, which is now invalid and throws
In this scenario, the problem is that, although RegisterResolversManifests locks _lockObject to prevent other threads from initializing the manifest registries in parallel, it assigns the list to _generalResolversManifestsRegistry before it has finished adding the manifests to it. Other threads can then read the list reference from _generalResolversManifestsRegistry, assume that the list will no longer be modified, and attempt to enumerate it. To fix this, RegisterResolversManifests should store the lists to local variables first, populate them there, and assign to fields only just before unlocking _lockObject. I'm not sure whether these assignments would need to be volatile according to the memory model.
just a fyi, I ran into this again on a build today