Build fails due to locked file which it locks itself (concurrent build problem)

llehn opened this issue 5 years ago • 36 comments

We moved from .NET Core 2.0 to 2.1. As part of the move, we retargeted several projects from netstandard2.0 to multi-target netcoreapp2.1 and netstandard2.0. Now builds sometimes fail, both on CI servers and on dev machines.

Question: I'd like to provide as much information as I can. I could try collecting diagnostic build logs and see whether I can capture one from a failed build. Would that help?

Steps to reproduce

No stable repro. Start a build using: dotnet publish -c Release -r win-x64 -f netcoreapp2.1

Expected behavior

Build works

Actual behavior

Sometimes the build fails with an error like this:

build 11-Jul-2018 22:23:55 C:\Program Files\dotnet\sdk\2.1.301\Microsoft.Common.CurrentVersion.targets(4364,5): error MSB3371: The file "D:\Atlassian\Bamboo\xml-data\build-dir\TIC-AC-JOB1\test\Integration.Kardex\obj\Release\netcoreapp2.1\win-x64\Integration.Kardex.csproj.CopyComplete" cannot be created. The process cannot access the file 'D:\Atlassian\Bamboo\xml-data\build-dir\TIC-AC-JOB1\test\Integration.Kardex\obj\Release\netcoreapp2.1\win-x64\Integration.Kardex.csproj.CopyComplete' because it is being used by another process. [D:\Atlassian\Bamboo\xml-data\build-dir\TIC-AC-JOB1\test\Integration.Kardex\Integration.Kardex.csproj]

Environment data

dotnet --info output:

.NET Core SDK (reflecting any global.json):
 Version:   2.1.301
 Commit:    59524873d6

Runtime Environment:
 OS Name:     Windows
 OS Version:  6.3.9600
 OS Platform: Windows
 RID:         win81-x64
 Base Path:   C:\Program Files\dotnet\sdk\2.1.301\

Host (useful for support):
  Version: 2.1.1
  Commit:  6985b9f684

.NET Core SDKs installed:
  1.0.4 [C:\Program Files\dotnet\sdk]
  1.1.0 [C:\Program Files\dotnet\sdk]
  2.0.0 [C:\Program Files\dotnet\sdk]
  2.0.2 [C:\Program Files\dotnet\sdk]
  2.0.3 [C:\Program Files\dotnet\sdk]
  2.1.101 [C:\Program Files\dotnet\sdk]
  2.1.201 [C:\Program Files\dotnet\sdk]
  2.1.301 [C:\Program Files\dotnet\sdk]

.NET Core runtimes installed:
  Microsoft.AspNetCore.All 2.1.1 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.All]
  Microsoft.AspNetCore.App 2.1.1 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App]
  Microsoft.NETCore.App 1.0.5 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
  Microsoft.NETCore.App 1.1.2 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
  Microsoft.NETCore.App 2.0.0 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
  Microsoft.NETCore.App 2.0.3 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
  Microsoft.NETCore.App 2.0.6 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
  Microsoft.NETCore.App 2.0.7 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
  Microsoft.NETCore.App 2.1.1 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]

To install additional .NET Core runtimes or SDKs:
  https://aka.ms/dotnet-download

llehn, Jul 11 '18 21:07

Maybe @rainersigwald can help here.

In the meantime, it would really help if you could get a build log (the -bl parameter creates an msbuild.binlog file). This file also contains the contents of your project files (.csproj, not(!) source code or other file contents), so if they contain any secret information you may want to mail it only to Microsoft employees or trustworthy individuals who are willing to help. Do you have VS or any other tooling open during the publish operation? (I haven't heard of this being a problem, but just checking.)
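Because there is no stable repro, one way to act on this is to rerun the same command until it fails and keep a separate binary log for each attempt, so the failing run's log survives. Below is a minimal Python sketch; the `capture_failing_binlog` helper, the attempt limit, and the `attempt-N.binlog` file names are hypothetical, and only the `-bl:<file>` switch is MSBuild's own. The runner is injectable so the loop can be tried without invoking `dotnet`.

```python
import subprocess
from typing import Callable, List, Optional, Sequence


def _run(argv: List[str]) -> int:
    # Runs the command and returns its exit code (0 means success).
    return subprocess.run(argv).returncode


def capture_failing_binlog(
    base_cmd: Sequence[str],
    attempts: int = 10,
    run: Callable[[List[str]], int] = _run,
) -> Optional[str]:
    """Rerun a flaky build until it fails, writing one binlog per attempt.

    Returns the binlog name of the first failing attempt, or None if every
    attempt succeeded. Logs from earlier, successful attempts are kept too.
    """
    for i in range(1, attempts + 1):
        binlog = f"attempt-{i}.binlog"
        # -bl:<file> tells MSBuild (also when invoked via the dotnet CLI)
        # where to write the binary log for this attempt.
        exit_code = run(list(base_cmd) + [f"-bl:{binlog}"])
        if exit_code != 0:
            return binlog
    return None
```

For example, `capture_failing_binlog(["dotnet", "publish", "-c", "Release", "-r", "win-x64", "-f", "netcoreapp2.1"])` would rerun the publish command from the report up to ten times. Every binlog it writes contains the project files, so the same sharing caveat applies.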

dasMulli, Jul 11 '18 23:07

Yes, a binary log (or a diagnostic log) would be helpful here. It would probably be helpful even from a build that did not fail with this problem--I suspect that there is a race condition in your build such that some projects are building twice and sometimes they're concurrent.

Also good to know: when it occurs, is the error always in the same project (Integration.Kardex.csproj)? Or is it sometimes in a different project? And is it always that .CopyComplete file, or sometimes something else?

rainersigwald, Jul 12 '18 14:07

msbuild.zip

Here's the binlog from a failed build. So far the problem seems to occur only with Integration.Kardex.csproj, but with different files (in the attached log it is Integration.Kardex.csprojAssemblyReference.cache). On the CI machine the error happens much more frequently (50% of the time) than on my dev laptop.

There is no other tooling running; the CI machine doesn't even have VS installed.

llehn, Jul 14 '18 11:07

I guess I'll go back to having one project that collects all the dependencies and copies them to a common folder. Why oh why is msbuild so dumb about these things?

llehn, Jul 16 '18 22:07

From that log:

547257:Project "D:\Atlassian\Bamboo\xml-data\build-dir\TIC-CD106-LIN\Tic.sln" (1) is building "D:\Atlassian\Bamboo\xml-data\build-dir\TIC-CD106-LIN\test\Integration.Kardex\Integration.Kardex.csproj" (35) on node 1 (Publish target(s)).
637706:Project "D:\Atlassian\Bamboo\xml-data\build-dir\TIC-CD106-LIN\test\Integration.Kardex.Wamp.AspNetCore\Integration.Kardex.Wamp.AspNetCore.csproj" (37) is building "D:\Atlassian\Bamboo\xml-data\build-dir\TIC-CD106-LIN\test\Integration.Kardex\Integration.Kardex.csproj" (35:2) on node 5 (GetTargetFrameworks target(s)).
3075143:Project "D:\Atlassian\Bamboo\xml-data\build-dir\TIC-CD106-LIN\test\Integration.Kardex.Wamp.AspNetCore\Integration.Kardex.Wamp.AspNetCore.csproj" (37) is building "D:\Atlassian\Bamboo\xml-data\build-dir\TIC-CD106-LIN\test\Integration.Kardex\Integration.Kardex.csproj" (35:3) on node 2 (default targets).

That indicates that Integration.Kardex.csproj is building with 3 separate sets of global properties. The call to GetTargetFrameworks should be harmless--it should only return information about the project, not do any actual work with files on disk. That leaves the solution reference and the ProjectReference from Integration.Kardex.Wamp.AspNetCore.csproj.
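Lines like the three quoted above can be pulled out of a text log mechanically. A minimal Python sketch, assuming the log lines follow exactly the `Project "<parent>" (<id>) is building "<child>" (<id>) on node <n> (<targets>).` shape shown here; `group_builds` and `entered_more_than_once` are hypothetical helpers. They only show how often, and from where, a project is entered; working out which global properties differ still needs the binary or diagnostic log.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Matches text-log lines such as:
#   Project "A.sln" (1) is building "B.csproj" (35:2) on node 5 (GetTargetFrameworks target(s)).
# The greedy <targets> group keeps nested parentheses like "target(s)".
BUILDING = re.compile(
    r'Project "(?P<parent>[^"]+)" \((?P<parent_id>[^)]+)\) is building '
    r'"(?P<child>[^"]+)" \((?P<child_id>[^)]+)\) on node (?P<node>\d+) '
    r'\((?P<targets>.*)\)\.?\s*$'
)

Entry = Tuple[str, str, str, str]  # (parent project, child id, node, targets)


def group_builds(lines: Iterable[str]) -> Dict[str, List[Entry]]:
    """Map each built project to every 'is building' entry that targets it."""
    builds: Dict[str, List[Entry]] = defaultdict(list)
    for line in lines:
        m = BUILDING.search(line)  # search, so leading "547257:" prefixes are fine
        if m:
            builds[m["child"]].append((m["parent"], m["child_id"], m["node"], m["targets"]))
    return dict(builds)


def entered_more_than_once(lines: Iterable[str]) -> Dict[str, List[Entry]]:
    """Keep only projects that are the target of more than one 'is building' line."""
    return {proj: entries for proj, entries in group_builds(lines).items() if len(entries) > 1}
```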

[Image: diagram of the racing Integration.Kardex.csproj builds]

As shown in that image, both of the racing instances of the project actually have the same TF and RID, which is where I expected to see the global properties differ. Unfortunately, the binary log wasn't captured with an MSBuild that includes https://github.com/Microsoft/msbuild/pull/3252 (fixed in 15.8, which is not yet released), so it's pretty hard to figure out which properties are different.

Based on testing with a simplified project (one MVC project referencing one netcoreapp2.1 class library), I think there's a race between the project as built by the solution (inheriting the RID and TF passed to the solution by dotnet publish) and the project as built by the reference (with RID and TF explicitly unset, because there's only one TF in the file).
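The race described here can be modeled in a few lines. A simplified Python sketch under these assumptions, none of which are taken from MSBuild's source: a project instance is keyed by its path plus its exact global properties; the intermediate folder is `obj/<Configuration>/<TargetFramework>/<RuntimeIdentifier>`; and the referenced project's file itself sets the TF and RID, so an instance built with those properties removed ends up with the same values. Two different keys that map to the same folder are two builds that can write the same files concurrently. All function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Tuple

Props = Dict[str, str]


def instance_key(project: str, global_props: Props) -> Tuple[str, FrozenSet[Tuple[str, str]]]:
    # Assumed identity of a build instance: project path + exact global properties.
    return (project, frozenset(global_props.items()))


def intermediate_dir(global_props: Props, file_defaults: Props) -> str:
    # Global properties override values from the project file (simplified obj layout).
    props = {**file_defaults, **global_props}
    parts = ["obj", props["Configuration"], props["TargetFramework"]]
    if props.get("RuntimeIdentifier"):
        parts.append(props["RuntimeIdentifier"])
    return "/".join(parts)


def shared_intermediate_dirs(project: str, file_defaults: Props, prop_sets: List[Props]) -> Dict[str, int]:
    """Return obj folders used by more than one distinct instance of `project`."""
    keys_by_dir = defaultdict(set)
    for props in prop_sets:
        keys_by_dir[intermediate_dir(props, file_defaults)].add(instance_key(project, props))
    return {d: len(keys) for d, keys in keys_by_dir.items() if len(keys) > 1}


# Hypothetical values mirroring the scenario above.
defaults = {"Configuration": "Debug", "TargetFramework": "netcoreapp2.1", "RuntimeIdentifier": "win-x64"}
from_solution = {"Configuration": "Release", "TargetFramework": "netcoreapp2.1", "RuntimeIdentifier": "win-x64"}
from_reference = {"Configuration": "Release"}  # TF and RID removed by the ProjectReference
print(shared_intermediate_dirs("Integration.Kardex.csproj", defaults, [from_solution, from_reference]))
# {'obj/Release/netcoreapp2.1/win-x64': 2}
```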

But that doesn't seem to be what's happening here, because the query-TFs call didn't collapse into the build-single-TF call. I can't tell why from the logging; I'm getting lost in the various different properties set in different projects.

There's definitely a problem here. At the moment I would suggest running publish only on individual projects and not at the solution level. That will help ensure that TargetFramework and RID flow correctly down to referenced projects.
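Following that advice means turning one solution-level command into several project-level ones. A minimal Python sketch that reads the `Project("{type-guid}") = "Name", "path", "{project-guid}"` lines of a .sln file and builds one `dotnet publish` argument list per chosen project; `solution_projects` and `publish_commands` are hypothetical helpers, and solution folders are skipped because their "path" is not a .csproj. Which projects to publish (usually just the applications) is left to the caller.

```python
import re
from typing import List, Sequence

# Matches .sln entries such as:
#   Project("{type-guid}") = "Name", "relative\path\Name.csproj", "{project-guid}"
PROJECT_LINE = re.compile(
    r'^Project\("\{[^}]+\}"\)\s*=\s*"(?P<name>[^"]+)",\s*"(?P<path>[^"]+)"'
)


def solution_projects(sln_text: str) -> List[str]:
    """Return the .csproj paths listed in a .sln file (solution folders are skipped)."""
    paths = []
    for line in sln_text.splitlines():
        m = PROJECT_LINE.match(line.strip())
        if m and m["path"].lower().endswith(".csproj"):
            paths.append(m["path"])
    return paths


def publish_commands(projects: Sequence[str], extra_args: Sequence[str]) -> List[List[str]]:
    """Build one `dotnet publish <project> <extra_args...>` argument list per project."""
    return [["dotnet", "publish", p, *extra_args] for p in projects]
```

Each argument list can then be run with `subprocess.run(cmd, check=True)`. The paths stay relative to the solution directory (with backslashes, as .sln files store them), so run the commands from there.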

rainersigwald, Jul 17 '18 16:07

This is the second issue in about a week about users running into problems with dotnet publish on solutions (in the other one, all libraries, including the test projects with their test host, landed in the same output directory when using -o).

Does it seem reasonable if dotnet publish tries to determine if it is running on a solution and emit a warning? cc @KathleenDollard @nguerrera @livarcocc @dsplaisted @wli3 @peterhuene

The set of scenarios where dotnet publish on a solution is actually a good idea is pretty limited (e.g. only applications in the solution, each using the default per-project publish directory).
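A check like the one proposed could be quite small. The Python sketch below is purely hypothetical and does not describe any real dotnet CLI behavior: `solution_publish_warning` flags any publish invocation whose arguments name a .sln file, with a stronger message when `-o`/`--output` sends every project to one folder, which is the case the limited-scenario comment excludes. It only inspects exact argument tokens.

```python
from typing import Optional, Sequence

OUTPUT_FLAGS = {"-o", "--output"}


def solution_publish_warning(args: Sequence[str]) -> Optional[str]:
    """Hypothetical pre-flight check for `dotnet publish` arguments."""
    if not any(a.lower().endswith(".sln") for a in args):
        return None  # publishing a single project: nothing to flag
    if any(a in OUTPUT_FLAGS for a in args):
        return ("warning: publishing a solution into one output folder; "
                "projects may overwrite or lock each other's files")
    return ("warning: publishing a solution is only reliable when every project "
            "is an application using its default publish folder")
```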

dasMulli, Jul 17 '18 16:07

(I continue to be amazed by the work @rainersigwald puts into explaining build issues. Now it's full-blown diagrams; I still remember some hand-drawn diagrams about RID conflicts ^^)

dasMulli, Jul 17 '18 16:07

I believe the problem with slns was first clearly identified with those famous diagrams. :) Yes, we should warn. In 3.0, we should either disallow it or actually make it work somehow.

nguerrera, Jul 17 '18 17:07

Thanks for the suggestions! Is this unreleased version available somewhere? I could try it to see whether I can get better logs for you to find the cause.

And I may add that, from an outside perspective, this sounds reeeally scary.

After being told in the past that you can't ship what "dotnet build" produces, I'm now looking at "dotnet publish" not working for solutions, and I see lots of question marks. I'm building a product that consists of many (C#) projects. Having to write a nontrivial script that publishes all of them while still using my 16 cores sounds like... writing what a modern build system should do flawlessly.

Maybe it's time to rearchitect msbuild to make it smart. Something like bazel. ;)

llehn, Jul 17 '18 20:07

@rainersigwald The problem started appearing during a normal build when specifying -f netcoreapp2.1. So just dotnet build -f netcoreapp2.1 sometimes fails (1 out of every 5-10 runs). No publish involved.

Two logs attached: one from a failed build, one from a successful build.

Can I get the 15.8 version somewhere so I can get better logs?

msbuild-fail1.zip msbuild-success.zip

llehn, Jul 29 '18 00:07

I second the vote for "make it work somehow". Telling people they have to manually build every project separately is ridiculous. People expect to just build a solution and for the build engine to work.

jez9999, Jan 17 '19 11:01

So... is there a solution to this? It disrupts our automated build processes.

reto-hal9k, Apr 01 '19 13:04

Any updates on this?

mayurjurani, May 07 '19 00:05

I have encountered this issue in our build/CI environment; it started when I upgraded the VM that runs the builds from 1 CPU core to 2. Since working builds are generally more useful than faster builds, I've worked around it by downgrading the VM back to a single core so that it doesn't try to build projects in parallel. That's one "solution", at least where you have that much control over your build environment. I never get this problem on my local machine when building for debug/test, but I never try to publish locally either.

The builds I have trouble with are being run by TeamCity's ".NET CLI (dotnet)" build step, building .NET core 2.2 solutions.

Having read this, I may attempt to change this to build projects rather than solutions, but this seems a bit painful to have to do...

A proper fix would be appreciated.

Moobylicious, Aug 14 '19 09:08

So, I can reproduce this parallel build issue very simply; publish isn't needed, dotnet restore is enough:

Reproduction:

  • Create a new solution
  • Add an ASP.NET Core app
  • Add a .NET Standard 2.0 library
  • Run dotnet restore TheNameYouChose.sln
  • Run Get-Process -Name "dotnet" in PowerShell
  • See the leftover dotnet process

It probably reproduces just as easily with other setups. Building the individual .csproj files works just fine, of course.

λ  dotnet --info
.NET Core SDK (reflecting any global.json):
 Version:   2.2.402
 Commit:    c7f2f96116

Runtime Environment:
 OS Name:     Windows
 OS Version:  10.0.18362
 OS Platform: Windows
 RID:         win10-x64
 Base Path:   C:\Program Files\dotnet\sdk\2.2.402\

Host (useful for support):
  Version: 3.0.0-preview8-28405-07
  Commit:  d01b2fb7bc

.NET Core SDKs installed:
  2.1.500 [C:\Program Files\dotnet\sdk]
  2.1.801 [C:\Program Files\dotnet\sdk]
  2.2.402 [C:\Program Files\dotnet\sdk]

.NET Core runtimes installed:
  Microsoft.AspNetCore.All 2.1.6 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.All]
  Microsoft.AspNetCore.All 2.1.12 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.All]
  Microsoft.AspNetCore.All 2.2.7 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.All]
  Microsoft.AspNetCore.App 2.1.6 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App]
  Microsoft.AspNetCore.App 2.1.12 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App]
  Microsoft.AspNetCore.App 2.2.7 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App]
  Microsoft.AspNetCore.App 3.0.0-preview8.19405.7 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App]
  Microsoft.NETCore.App 2.1.6 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
  Microsoft.NETCore.App 2.1.12 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
  Microsoft.NETCore.App 2.2.7 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
  Microsoft.NETCore.App 3.0.0-preview8-28405-07 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
  Microsoft.WindowsDesktop.App 3.0.0-preview8-28405-07 [C:\Program Files\dotnet\shared\Microsoft.WindowsDesktop.App]

To install additional .NET Core runtimes or SDKs:
  https://aka.ms/dotnet-download

Structed, Sep 20 '19 15:09

@Structed The leftover process is expected. You can shut down such processes with dotnet build-server shutdown.

If the leftover process is locking files causing subsequent builds to fail, then that is an issue.

dsplaisted, Sep 20 '19 16:09

Yes, exactly this is the case: files are being locked. Can't currently provide logs/repro project. If you need, let me know and I can provide them Monday.

Structed, Sep 20 '19 16:09

@Structed Yes, please provide the logs and repro project if possible.

dsplaisted, Sep 30 '19 15:09

Repo: https://github.com/burnasheva/mstest_dotnet3.git

Cmd: dotnet.exe build MSTestCore.sln --framework netcoreapp3.0

dotnet --info

Error:

CSC : error CS2012: Cannot open 'C:\BuildAgentJDK_Branch\work\bd8628495f4400ff\PrimeService\obj\Debug\netcoreapp3.0\PrimeService.dll' for writing -- 'The process cannot access the file 'C:\BuildAgentJDK_Branch\work\bd8628495f4400ff\PrimeService\obj\Debug\netcoreapp3.0\PrimeService.dll' because it is being used by another process.'

NikolayPianikov, Oct 09 '19 10:10

As a side note: @rainersigwald mentioned that references were built multiple times. We had exactly the same issue and found that one of our .NET Core project files had the TargetFramework element instead of TargetFrameworks. Once we changed it in the .csproj to TargetFrameworks (even though there was only one target framework), the extra builds of the same project and the resulting errors were gone.
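The change described here is a one-element rename in the project file. A minimal Python sketch of that rewrite, done as plain text so comments and indentation survive; `pluralize_target_framework` is a hypothetical helper, and it deliberately skips `<TargetFramework>` elements that carry attributes such as a `Condition`. That the rename removes the duplicate builds is what this comment reports; the sketch does not verify it.

```python
import re

# Only the bare element form; <TargetFrameworks> is not matched because
# "<TargetFramework>" requires ">" right after the name.
SINGLE_TF = re.compile(r"<TargetFramework>(?P<tf>[^<]*)</TargetFramework>")


def pluralize_target_framework(csproj_text: str) -> str:
    """Rewrite <TargetFramework>x</TargetFramework> as <TargetFrameworks>x</TargetFrameworks>."""
    return SINGLE_TF.sub(r"<TargetFrameworks>\g<tf></TargetFrameworks>", csproj_text)
```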

smaktacular, Oct 16 '19 10:10

@smaktacular's recommendation fixes the build of the actual project, but (for me) it breaks the build of the related Docker/docker-compose project.

Error for the project containing the Dockerfile:

error MSB4057: The target "DockerResolveAppType" does not exist in the project.

Workaround: change all projects in the solution to TargetFrameworks except the one containing the Dockerfile, which should keep TargetFramework.
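Applying that workaround across a large solution is mostly a matter of finding the right files. A Python sketch, assuming (hypothetically) that the Docker project is the one whose .csproj sits in the same directory as a `Dockerfile`; `projects_to_convert` is an illustrative helper that only lists candidates and does not modify anything.

```python
import os
from typing import List


def projects_to_convert(root: str) -> List[str]:
    """List .csproj files under `root` that still use <TargetFramework>,
    skipping any directory that also contains a Dockerfile."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if "Dockerfile" in filenames:
            continue  # assumed Docker project: keep the singular element
        for name in filenames:
            if name.endswith(".csproj"):
                path = os.path.join(dirpath, name)
                # utf-8-sig tolerates the byte-order mark some project files start with
                with open(path, encoding="utf-8-sig") as f:
                    if "<TargetFramework>" in f.read():
                        hits.append(path)
    return sorted(hits)
```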

msdeibel, Nov 28 '19 10:11

Has anything ever come out of this? I have a group of about 200 assemblies. Running DOTNET on each project is WAY too slow so I put them into a solution and it takes minutes (as opposed to hours individually). But I get a lot of these Copy/retries and sometimes they run out of retries and the build fails.

If I could pass in a list of projects (like when building the solution), that might help, even if I have to list them in build order. Or possibly tell it to ignore locked files (because these files/packages ARE there).

CWoodruffRES, Sep 16 '20 06:09

How are you building the solution? Are you passing any command line arguments?

I ask because issues like this can be caused by things like specifying the output path on the command line, and then multiple projects try to build to the same output path and can get file conflicts.
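The failure mode described here can be checked for before building. A minimal Python sketch, assuming you already know which file names each project copies into the shared output folder (that mapping is hypothetical input, for example gathered from earlier per-project outputs); `shared_output_files` reports every name that more than one project would write.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def shared_output_files(outputs: Dict[str, Iterable[str]]) -> Dict[str, List[str]]:
    """Given project -> file names it copies into one common output folder,
    return each file name written by more than one project, with its writers."""
    writers: Dict[str, List[str]] = defaultdict(list)
    for project, files in outputs.items():
        for name in set(files):  # count each project once per file name
            writers[name].append(project)
    return {name: sorted(projs) for name, projs in writers.items() if len(projs) > 1}
```

A repeated name is not necessarily a problem when the copies are identical, but those are exactly the files two concurrent copy steps can fight over.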

dsplaisted, Sep 16 '20 06:09

Yep! I am specifying the output path because I want everything in a single folder when I'm done. I suppose I could let each project publish to its own folder and then copy the files over with a script or something. This is actually running in a DevOps pipeline. I tried doing a build and then a publish --no-build, thinking the publish might just be a glorified copy. This is inside a Dockerfile; I'm running something like:

dotnet publish mysolution.sln --output /app
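The script idea mentioned above (publish each project to its own folder, then copy everything into one) could look like the following Python sketch. `merge_publish_dirs` and its conflict policy are hypothetical choices: identical files are copied once, and a file that already exists with different content is left untouched and reported instead of being silently overwritten.

```python
import filecmp
import os
import shutil
from typing import List, Sequence


def merge_publish_dirs(sources: Sequence[str], dest: str) -> List[str]:
    """Copy several per-project publish folders into one folder.

    Returns the relative paths that exist in more than one source with
    different content; those are not overwritten and need a human decision.
    """
    conflicts = []
    for src in sources:
        for dirpath, _dirs, files in os.walk(src):
            rel_dir = os.path.relpath(dirpath, src)
            target_dir = os.path.normpath(os.path.join(dest, rel_dir))
            os.makedirs(target_dir, exist_ok=True)
            for name in files:
                s = os.path.join(dirpath, name)
                d = os.path.join(target_dir, name)
                if not os.path.exists(d):
                    shutil.copy2(s, d)  # first copy wins
                elif not filecmp.cmp(s, d, shallow=False):  # byte-by-byte comparison
                    conflicts.append(os.path.normpath(os.path.join(rel_dir, name)))
    return conflicts
```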

CWoodruffRES, Sep 16 '20 06:09

So my VERY temporary workaround... I am doing a regular BUILD to the output folder... then I am individually PUBLISHING a handful of (core) components that have references to all the required nuget packages. This is working for me - since I met that deadline I can do something better.

Thanks, dsplaisted, your comment was helpful!

CWoodruffRES, Sep 16 '20 20:09

This is still an issue with .NET 5... is this ever going to get fixed?

rtaylor72, Jan 05 '21 17:01

My workaround for this is to disable concurrent build like this:

dotnet publish -maxcpucount:1

But a proper fix would certainly be nice :)

reto-hal9k, Jan 06 '21 09:01

Still an issue here with .NET 5 in November 2021... any updates?

southrivertech, Nov 18 '21 19:11

@southrivertech It looks to me like the issue here has to do with running dotnet publish (and possibly also dotnet build) on a solution while trying to specify the TargetFramework.

Is that what you're encountering?

If so, what do you expect the --framework argument to do when you use it on a solution? Should it apply that TargetFramework to all projects regardless of whether they would normally target the specified framework? Or try to do some filtering to just projects that already have that target listed as one of their TargetFrameworks?
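The second option (filtering) can be sketched in a few lines of Python. This is a hypothetical illustration of that option only: `projects_targeting` takes each project's raw `TargetFramework`/`TargetFrameworks` value (semicolon-separated) and keeps only the projects that already list the requested framework.

```python
from typing import Dict, List


def projects_targeting(framework: str, project_tfms: Dict[str, str]) -> List[str]:
    """Return projects whose TargetFramework(s) value lists `framework`.

    project_tfms maps a project path to its raw property value,
    e.g. "netcoreapp2.1;netstandard2.0".
    """
    selected = []
    for project, raw in project_tfms.items():
        tfms = {t.strip() for t in raw.split(";") if t.strip()}
        if framework in tfms:
            selected.append(project)
    return sorted(selected)
```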

dsplaisted, Nov 18 '21 23:11

Regarding @reto-hal9k's workaround (disabling concurrent builds with dotnet publish -maxcpucount:1):

That doesn't make sense to me, because the documentation says:

Specifies the maximum number of concurrent processes to use when building. If you don't include this switch, the default value is 1.

For us the fix was to disable antivirus.

kaburkett, Feb 25 '22 19:02