msbuild
msbuild copied to clipboard
Building solution with forced out-of-proc nodes fails
📝 I'm not sure we're going to be very motivated to fix this, but it cost me a lot of time debugging bootstrapped-build failures in #3365, so I'm filing it.
Steps to reproduce
Build a solution using the MSBuild task
<Project>
<Target Name="BuildSln">
<MSBuild Projects="some.sln" BuildInParallel="true" />
</Target>
</Project>
While using MSBUILDNOINPROCNODE
, this results in an error
S:\msbuild\some.sln(2,1): error MSB4025: The project file could not be loaded. Data at the root level is invalid. Line 2, position 1.
Why?
Solution metaprojects are built based on an in-memory instance which isn't transferable. See
https://github.com/Microsoft/msbuild/blob/2a012e653766eb261b09b29b3106f4eb57a7f61d/src/Build/BackEnd/Components/Scheduler/Scheduler.cs#L1254-L1260
So when forced to out-of-proc nodes, the solution build fails.
Environment data
msbuild /version
output: 15.8.139-preview+g5951330944 for .NET Framework
(also on Core)
Late-joining this thread just to give an additional headsup: Getting the MSB4025 error on your "head-less" build servers doesn't necessarily mean that you are having issues with
MSBUILDNOINPROCNODE=1
There's yet another, far simpler reason due to which this exact error on the .sln might occur:
Your .sln file might have indeed become malformed due to a bad merge on it. Check this out for a description of the problem and a solution:
https://stackoverflow.com/a/52932050/863651
Just my 2c.
I just hit this, and it looks like you don't have to use the MSBuild
task. I get it when I run msbuild /restore /nr:false
in a folder with a .sln file and I have MSBUILDNOINPROCNODE
set to 1.
Setting MSBUILD_PROJECTINSTANCE_TRANSLATION_MODE
to full
works around this issue as it makes sure that the ProjectInstance is correctly moved to the out-of-proc node. And it looks like a lead to a reasonable fix as well.
- Make
ProjectInstance.TranslateEntireState
return true for meta projects. - Remove the assert.
@rokonec,
I tried setting MSBUILDNOINPROCNODE=1 and running build.cmd, and it failed with a recent MSBuild. @BenVillalobos confirmed he had the same result, so I think this is not fully resolved, unfortunately.
@cdmihai says this reproduces in 16.11 but not 17.0: https://github.com/dotnet/msbuild/issues/6818#issuecomment-914542505
Do we know what fixed it?
It gets weirder and weirder. When I set the translation mode env var to full, I get... an OOM from the node process??
16.11.0+0538acc04
That wasn't my experience. It failed with 17.0:
I would guess that #6385 made this substantially better for most cases (or maybe just empty .slns? .slns with 0-1 projects?) but not all cases.
It should be using
I can reproduce this issue on VS2019 16.11.3 (MSBuild 16.11.0+0538acc04) with a recent update to MSBuildProjectCreation as discovered during analysis of https://github.com/jeffkl/MSBuildProjectCreator/issues/128
Hmm, I can still repro this on 17.5.1+f6fdcf537 for .NET Framework, by following the exact repro steps at the top.
I thought Roman's change should have fixed it? Is there another bug lurking here?
From what I currently understand, the solution metaproject information is not properly transferred between nodes and possibly cannot be transferred between nodes. @mattaquinn commented in https://github.com/dotnet/msbuild/issues/8184#issuecomment-1339629311 that adding MSBUILDEMITSOLUTION=1 resolved the problem but also leaves extra solution metaproject files lying around, as should be expected with that escape hatch.
As I recall, when we try to load the solution metaproject file in a node, we first check whether it's in our in-memory cache, and if that fails, we go to disk. (Even metaprojects not written to disk have a pretend path.) My hunch is that if we have an in-proc node, with rokonec's change, we translate the full state and send it to all the other nodes. If not, there's no node to do that properly, since there's no central scheduler node. Writing it to disk via MSBUILDEMITSOLUTION means that although our in-memory cache check still fails, we actually find it on disk, so we can load it in each worker node and use it.
The ideal solution from a good engineering perspective might be to focus on the translation. If the solution metaproject is not created, parse the solution file properly and send that information to the other worker nodes, but there's no node that would be the clear one to take priority and do that.
I propose having MSBUILDNOINPROCNODE implicitly turn on MSBUILDEMITSOLUTION, but, at the end of the build, we then make sure to clean up, i.e., delete the emitted metaproject again.
What do people think of that as a solution for this? Note that it's been open since 2018, so I'm not overly optimistic about a complex fix actually being resolved 😉
If you set MSBUILDEMITSOLUTION implicitly, then would it be feasible to emit the metaproject files to a temporary directory rather than the solution directory? To prevent conflicts in parallel MSBuild invocations, or errors if the directory is not writable. OTOH, perhaps that would cause MSBuild to look for solution customisations in the wrong directory.
I don't think we should have to worry about parallel MSBuild invocations, as this is for building .sln files—the sln itself isn't mutlitargeted, and if you try to build it multiple times in parallel, we may have intermediate outputs that overlap and lead to contention anyway. What are the solution customizations you're thinking of? Something related to the project cache?
I meant building a solution with a build directory tree that is separate from the source directory tree, which could then be read-only. This would likely require the caller of MSBuild to set properties to specify the directory paths. Such builds done in parallel could involve different target platforms or other compiler options.
With "solution customizations", I meant files such as Directory.Solution.props and Directory.Solution.targets, which should still be read from the directory of the solution file even if the generated metaproj files are instead saved to a temporary directory.
Hmm...that is an excellent point. MSBuild looks for quite a few things right next to it and uses relative paths for more, so moving it to another directory would be tricky, but I can see why writing an actual file to the source directory might fail.
We could theoretically try to copy everything into a temp directory, run the build, and copy everything back, but I don't like that solution at all. Doing a partial copy feels like it might be a bug farm to me. I'd like to emit the metaproj to a temp directory then tell MSBuild it's in the real directory, but that would defeat the point of really emitting it in the first place. It sounds like there are real, functional reasons to implement a full cross-node serialization scheme, but I'm not sure that's happening any time soon.
I'm tempted to go back to my original proposal and just accept that there will be some cases in which it will still fail, but it would be nice to know how common those cases are, and I truly have no idea on that point.