msbuild
msbuild copied to clipboard
Build Issue: Failing Unit Tests on macOS ARM64
Issue Description
On macOS with Apple Silicon (ARM64) unit tests fail. The issues starts with the commit for PR #7550 on April 19.
On macOS with Intel the issue doesn't occur which is why the PR checks passed.
Steps to Reproduce
Clone the repo to a host running macOS on an Apple Silicon processor. Run
./build.sh --test
The build will fail. The log shows 8 unit tests are failing in Microsoft.Build.Engine.UnitTests.dll.
Expected Behavior
All unit tests are expected to pass. And if the main branch is reverted to the commit before the PR commit, all unit tests do pass
Actual Behavior
8 unit tests fail in Microsoft.Build.Engine.UnitTests.dll
Analysis
The issues starts with the commit for PR #7550 on April 19.
Versions & Configurations
% dotnet --version
6.0.202
Build with HEAD set to 67deba37
% dotnet ./artifacts/bin/bootstrap/net6.0/MSBuild/MSBuild.dll --version
Microsoft (R) Build Engine version 17.3.0-dev-22225-01+67deba370 for .NET
Copyright (C) Microsoft Corporation. All rights reserved.
17.3.0.22501
Build with HEAD set to cff0b1f2 (current as of creating this issue)
% dotnet ./artifacts/bin/bootstrap/net6.0/MSBuild/MSBuild.dll --version
Microsoft (R) Build Engine version 17.3.0-dev-22226-01+cff0b1f26 for .NET
Copyright (C) Microsoft Corporation. All rights reserved.
17.3.0.22601
Summary of Failed Unit Tests
- Microsoft.Build.Engine.UnitTests.BackEnd.TaskHostFactory_Tests.TaskNodesDieAfterBuild
- Shouldly.ShouldAssertException : False\n should be\nTrue\n but was not
- Microsoft.Build.UnitTests.BackEnd.BuildManager_Tests.TaskInputLoggingIsExposedToTasks(taskFactory: "TaskHostFactory", taskInputLoggingEnabled: False)
- Assert.Equal() Failure\nExpected: Success\nActual: Failure
- Microsoft.Build.UnitTests.BackEnd.BuildManager_Tests.TaskInputLoggingIsExposedToTasks(taskFactory: "TaskHostFactory", taskInputLoggingEnabled: True)
- Assert.Equal() Failure\nExpected: Success\nActual: Failure
- Microsoft.Build.UnitTests.EscapingInProjects_Tests.SimpleScenarios.EscapedWildcardsShouldNotBeExpanded_InTaskHost
- Shouldly.ShouldAssertException : False\n should be\nTrue\n but was not\n\nAdditional Info:\n Build failed. See test output (Attachments in Azure Pipelines) for details
- Microsoft.Build.UnitTests.EscapingInProjects_Tests.SimpleScenarios.ItemTransformContainingSemicolon_InTaskHost
- Assert.True() Failure\nExpected: True\nActual: False
- Microsoft.Build.UnitTests.EscapingInProjects_Tests.SimpleScenarios.SemicolonInPropertyPassedIntoStringParam_UsingTaskHost
- Assert.True() Failure\nExpected: True\nActual: False
- Microsoft.Build.UnitTests.XmakeAttributesTest.TestArchitectureValuesMatch
- Assert.True() Failure\nExpected: True\nActual: False
- Microsoft.Build.UnitTests.XmakeAttributesTest.TestMergeArchitectureValues
- Assert.Equal() Failure\n ↓ (pos 0)\nExpected: x64\nActual: arm64\n ↑ (pos 0)
Better buy me a fancy new Mac so I can get this fixed, boss . . . I've got room on my desk for an Ultra 😜
@jrdodds thanks for the report! There might be more details in the test XML output, which should be in artifacts/TestResults/Debug/Microsoft.Build.Engine.UnitTests_net6.0_x64.xml
(though the x64
part might be different?) after a test run. Would you mind sharing that? I think some of these we could solve without hardware access but some I don't understand by looking at just the failed assertion, so I'd like to have more clues.
Attached a .zip with both the html and xml files. Microsoft.Build.Engine.UnitTests_net6.0_x64.zip
I think comm traces would be useful in figuring this out. Any chance you could grab some from one of the tests, preferably EscapedWildcardsShouldNotBeExpanded_InTaskHost? That would mean setting MSBUILDDEBUGCOMM
to 1 and MSBUILDDEBUGPATH
to somewhere you can find at the start of the test. If you do it with using (TestEnvironment...), make sure to break before it exits so the traces are still there.
@Forgind Yes I can try to capture traces for this issue.
@Forgind I tried to capture traces but it didn't seem to work and I haven't had an opportunity to investigate.
I found the gist Debugging Node Communication in MSBuild but I think I need more detail. Is there another article or tutorial or document that explains how to use MSBUILDDEBUGCOMM
and MSBUILDDEBUGPATH
?
Thanks
@BenVillalobos, that gist could do with some expansion 😉
Just for context, in several methods involved in node communication such as NodeProviderOutOfProcBase.GetNodes or NodeEndpointOutOfProcBase.PacketPumpProc, we call CommunicationUtilities.Trace. That method checks MSBUILDDEBUGCOMM (as cached in Traits) and chooses to log if it is 1. It logs it to MSBUILDDEBUGPATH if it is set and otherwise follows a series of fallbacks. Easier to avoid that, so it's useful to set MSBUILDDEBUGPATH and make sure it's somewhere you can write to.
The best structure for temporarily setting an environment variable in our unit tests is with a TestEnvironment, which looks like this:
using (TestEnvironment env = TestEnvironment.Create());
env.SetEnvironmentVariable(name, value);
...
This automatically unsets the environment variable after the test. One potential issue is that Traits are initialized early: when the process starts. Fortunately, we have an exception in tests where it makes a new Traits object every time it's accessed.
I mentioned that TestEnvironments unset the environment variable. They also unset other state. That includes that (I think) they would delete the trace files after it exits scope. For that reason, if this were me, I'd break on one of the last things in a test (some ShouldBe-type statement) and look at the path you'd set with MSBUILDDEBUGPATH. There should be a number of traces there.
Does that help?
Is this still an issue you're hitting? Were you ever able to retrieve logs?
I just retested with ./build.sh -test
and this is still an issue.
I will work on retrieving the logs. I haven't retried with the more detailed instructions.
@AR-May accidentally told me that it was possible to test in this environment, so assigning over :)
I saw PR #8073 had been merged. I tested and can confirm that two tests that were failing are now passing: Microsoft.Build.UnitTests.XmakeAttributesTest.TestArchitectureValuesMatch
and Microsoft.Build.UnitTests.XmakeAttributesTest.TestMergeArchitectureValues
.
I also see a new failing test Microsoft.Build.Engine.UnitTests.BackEnd.TaskHostFactory_Tests.VariousParameterTypesCanBeTransmittedToAndRecievedFromTaskHost
which reports "Shouldly.ShouldAssertException : False\n should be\nTrue\n but was not"
That test uses a task host explicitly, which likely indicates the code for starting up or (more likely?) connecting to a task host doesn't work on arm. I suspect there's something that explicitly looks for an x86/x64 process and doesn't find it or advertises itself as an x86/x64 process when it isn't.
Yes, I have the same idea that something is wrong when we try to connect to the task host, indeed. Unfortunately, I cannot see the issue in the code at this moment and I was not able to debug it step-by-step yet. I am trying to get there.
Also, btw, there are some tests failing in windows for me as well. It seems like behavior in CI and behavior on my windows machine are different.
That test uses a task host explicitly, which likely indicates the code for starting up or (more likely?) connecting to a task host doesn't work on arm.
I suspect it would be closer to the logic of actually transmitting the data
Some places that might be relevant: https://github.com/dotnet/msbuild/blob/main/src/Shared/TaskHostConfiguration.cs https://github.com/dotnet/msbuild/blob/main/src/MSBuild/OutOfProcTaskHostNode.cs
Something went wrong during my attempt to set up the debugging and now msbuild repo fails to build on my machine. I had to switch to other issues, unfortunately. I might return to this issue later, when I have more time to fix builds on my arm64 machine.
Retested locally and confirmed all unit tests complete successfully.