Shadow copy directory in inconsistent state after reboot
(Reported by @richardd2 here: https://github.com/dotnet/AspNetCore.Docs/issues/23733#issuecomment-1309996000)
We have a Windows Server 2019 VM hosting (currently) 15 instances of an API, all based on identical code. They're all configured with separate AppPools, all set to "no managed code", and all configured for shadow-copying.
<?xml version="1.0" encoding="utf-8"?>
<configuration>
<location path="." inheritInChildApplications="false">
<system.webServer>
<handlers>
<add name="aspNetCore" path="*" verb="*" modules="AspNetCoreModuleV2" resourceType="Unspecified" />
</handlers>
<aspNetCore processPath=".\Assembly.exe" stdoutLogEnabled="false" stdoutLogFile=".\logs\stdout" hostingModel="inprocess">
<handlerSettings>
<handlerSetting name="experimentalEnableShadowCopy" value="true" />
<handlerSetting name="shadowCopyDirectory" value="../shadow/api/" />
<!-- Only enable handler logging if you encounter issues-->
<!--<handlerSetting name="debugFile" value=".\logs\aspnetcore-debug.log" />-->
<!--<handlerSetting name="debugLevel" value="FILE,TRACE" />-->
</handlerSettings>
</aspNetCore>
</system.webServer>
</location>
</configuration>
ANCM version is 16.0.22232.9.
Mostly this works fine. But for the last two months, when the server reboots to install updates, one AppPool seemingly chosen at random ends up with its shadow copy being corrupted. Specifically, the .deps.json file is filled with nulls, and the main assembly .dll is 0kb in size. There are three errors logged in the application event log by the ANCM:
1027: Unable to locate application dependencies. Ensure that the versions of Microsoft.NetCore.App and Microsoft.AspNetCore.App targeted by the application are installed.
1027: Could not find 'aspnetcorev2_inprocess.dll'. Exception message: A JSON parsing exception occurred in C:\inetpub\Instance\shadow\api\99\Assembly.deps.json, offset 0 (line 1, column 1): The document is empty. Error initializing the dependency resolver: An error occurred while parsing: C:\inetpub\Instance\shadow\api\99\Assembly.deps.json
1010: Failed to start application '/LM/W3SVC/9/ROOT/api', ErrorCode '0x8000ffff'.
After stopping the affected AppPool, deleting the shadow folder, and starting the AppPool again, everything springs back into life.
There don't appear to be any other relevant events in the event log around the time of the restart.
I don't know whether this is a bug in the .NET 6 shadow copying option, or a problem with the VM, or whether something else is interfering when the server restarts.
Has anyone else seen this, or have any suggestions on how to diagnose and/or fix it? (The debug file won't help, since the issue occurs before the application has a chance to start.)
Originally posted by @RichardD2 in https://github.com/dotnet/AspNetCore.Docs/issues/23733#issuecomment-1309996000
@richardd2 Does this happen after every reboot? Can you reproduce the issue with a simple Hello World type app?
@HaoK Any thoughts about what could cause this?
@RichardD2 so there's always one and only one app pool that is corrupted? Can you think of any other changes that happened around the time this started happening 2 months ago)?
Since you don't see any messages around the error I'm guessing this is something that the file watcher caused issues with, it looks like the file watcher calling CopyToDirectoryInner which looks at the last modified time of the file before updating it, I wonder if these files are locked so it copies over munged / null files. https://github.com/dotnet/aspnetcore/blob/547de595414d6ebb9ddeaa4a231815449c9f3c60/src/Servers/IIS/AspNetCoreModuleV2/CommonLib/Environment.cpp#L166
One potential issue I see is that we don't check whether CopyFile (which appears to be the CopyFileW function https://learn.microsoft.com/en-us/windows/win32/api/winbase/nf-winbase-copyfilew) succeeded, at a minimum we should probably check and log an error here: https://github.com/dotnet/aspnetcore/blob/547de595414d6ebb9ddeaa4a231815449c9f3c60/src/Servers/IIS/AspNetCoreModuleV2/CommonLib/Environment.cpp#L192
Hi @adityamandaleeka. We have added the "Needs: Author Feedback" label to this issue, which indicates that we have an open question for you before we can take further action. This issue will be closed automatically in 7 days if we do not hear back from you by then - please feel free to re-open it if you come back to this issue after that time.
@HaoK It always seems to affect only one app pool; but it's been a different one each time.
It looks like there was a gap in the Windows Updates due to an issue with the hosting company's WSUS server, so October would have been the first month of updates since we moved this application to .NET 6. The October update took it from 6.0.6 to 6.0.9. I don't know whether that would update the ANCM at the same time.
@adityamandaleeka So far it's happened twice, immediately after installing Windows updates and restarting. It's a live server, so I haven't tried rebooting without pending updates. I also haven't tried a simple "hello world" app - I'm not sure how much use that would be, since the affected app pool seems to be random, so the chances of it affecting a specific target app pool are quite low.
This issue has been automatically marked as stale because it has been marked as requiring author feedback but has not had any activity for 4 days. It will be closed if no further activity occurs within 3 days of this comment. If it is closed, feel free to comment when you are able to provide the additional information and we will re-investigate.
See our Issue Management Policies for more information.
What further information is required?
This issue has been automatically marked as stale because it has been marked as requiring author feedback but has not had any activity for 4 days. It will be closed if no further activity occurs within 3 days of this comment. If it is closed, feel free to comment when you are able to provide the additional information and we will re-investigate.
See our Issue Management Policies for more information.
Again, what further information is required?
Is this bot just going to keep threatening to close the issue unless I post a pointless comment every few days?!
@RichardD2 I just removed the tag that was causing that.
Thanks for contacting us.
We're moving this issue to the .NET 8 Planning milestone for future evaluation / consideration. We would like to keep this around to collect more feedback, which can help us with prioritizing this work. We will re-evaluate this issue, during our next planning meeting(s).
If we later determine, that the issue has no community involvement, or it's very rare and low-impact issue, we will close it - so that the team can focus on more important and high impact issues.
To learn more about what to expect next and how this issue will be handled you can read more about our triage process here.
Moving to .NET 8 Planning to address @HaoK's comment above: https://github.com/dotnet/aspnetcore/issues/45017#issuecomment-1312103130
Just had a repeat of this issue - this time it happened almost immediately after installing .NET 6.0.13 and 7.0.1 on the server, without waiting for a reboot.
Log Name: Application Source: IIS AspNetCore Module V2 Date: 16/01/2023 12:15:18 Event ID: 1027 Level: Error Description: Unable to locate application dependencies. Ensure that the versions of Microsoft.NetCore.App and Microsoft.AspNetCore.App targeted by the application are installed.
Followed by:
Log Name: Application Source: IIS AspNetCore Module V2 Date: 16/01/2023 12:15:18 Event ID: 1027 Level: Error Description: Could not find 'aspnetcorev2_inprocess.dll'. Exception message: A JSON parsing exception occurred in [C:\inetpub..._shadow\api\502\OurApp.deps.json], offset 0 (line 1, column 1): The document is empty. Error initializing the dependency resolver: An error occurred while parsing: C:\inetpub..._shadow\api\502\OurApp.deps.json
And then:
Log Name: Application Source: IIS AspNetCore Module V2 Date: 16/01/2023 12:15:18 Event ID: 1010 Level: Error Description: Failed to start application '/LM/W3SVC/8/ROOT/api', ErrorCode '0x8000ffff'.
The updates were installed at around 12:07. The error started on the first request to this specific AppPool after the update was installed. The other 14 AppPools running the same code were not affected.
We're still seeing this issue on a semi-regular basis. Should we try installing the .NET 8 ANCM, or would we need to upgrade the application itself to .NET 8 as well?