Restart-ServiceFabricDeployedCodePackage generates errors in the console for stateless singleton applications

Open alxoldman opened this issue 6 years ago • 13 comments

Hi. Our project has a Service Fabric cluster hosting several apps. Some of the apps are stateful and run on several partitions; others are stateless singletons. From time to time we have to restart the applications, and to do this remotely we chose PowerShell. The Restart-ServiceFabricDeployedCodePackage cmdlet works perfectly for the multi-partition applications. I have been using it in the following way:

Restart-ServiceFabricDeployedCodePackage -ApplicationName <AppName> -ServiceName <ServiceName> -PartitionId <PartitionId> -CommandCompletionMode Verify

But for the single-partition apps the cmdlet generates an error in the PowerShell console:

Restart-ServiceFabricDeployedCodePackage : Did not find deployed code package for fabric:/CustomAp:Code on node NodeName1

I've tried adding more parameters to the cmdlet call so that deployed code packages are restarted one by one. Eventually, the invocation became the following:

Restart-ServiceFabricDeployedCodePackage -NodeName <NodeName> -ApplicationName <AppName> -ServiceManifestName <ServiceManifestName> -CodePackageName <CodePackageName> -ServicePackageActivationId <ServicePackageActivationId> -CommandCompletionMode Verify

But even after that, I still see the error from time to time. It does not occur on every invocation, but if the restarted app has three instances (and, correspondingly, three code packages), at least one of the invocations generates the error. The interesting thing is that the apps are in fact restarted regardless of whether the error appears. The apps were restarted even when I restarted the whole partition without referring to a specific code package.

Can anybody suggest how to get rid of these errors? I found an already closed ticket: https://github.com/Azure/service-fabric-issues/issues/1106. It contains no solution, so I still don't know what to do. If it helps, all the apps have "." in their names and are ExclusiveProcess.

alxoldman avatar Nov 08 '19 06:11 alxoldman

Hello everybody! The issue is still valid for me. I would appreciate any help in solving it.

alxoldman avatar Dec 02 '19 06:12 alxoldman

I have the same problem and have tried the same workarounds as you, with no success.

LarsKemmann avatar Dec 17 '19 02:12 LarsKemmann

There's also this SO post with the same issue and the same weird behavior being described.

LarsKemmann avatar Dec 17 '19 02:12 LarsKemmann

I am also getting the same error, even though the code package is restarted.

dhruvmodi13 avatar Dec 19 '19 10:12 dhruvmodi13

It seems the issue is not unique. But still no comments or solution from Microsoft :(

alxoldman avatar Dec 19 '19 11:12 alxoldman

At least as it was reported in the GitHub issue, the error described in the solution is a benign/expected race condition. I bet you don't see it if you omit -Verify.

As to whether we can do better and whether all those parameters should be necessary, adding a few folks.

1106 was a legit bug where certain package names were not handled correctly.

masnider avatar Dec 19 '19 19:12 masnider

Thanks for your help, @masnider. Regarding the -Verify flag: I would prefer not to remove the -CommandCompletionMode parameter. I don't want to restart a code package without confidence that the previous one has already launched.
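One possible way to keep that confidence without `-CommandCompletionMode Verify` is to confirm the restart by hand. The sketch below is only an illustration and has not been run against a cluster: `Test-CodePackageRestarted` and `Wait-CodePackageRestart` are hypothetical helper names, and the check is a heuristic that assumes a restarted code package shows up with a new `CodePackageInstanceId` and an `Active` status in `Get-ServiceFabricDeployedCodePackage`.

```powershell
# Hypothetical helper: decide from two snapshots whether the code package restarted.
# Heuristic (an assumption, not documented behavior): the old CodePackageInstanceId
# is gone and at least one matching code package reports status Active.
function Test-CodePackageRestarted {
    param(
        [Parameter(Mandatory)] $Before,   # snapshot taken before the restart request
        [object[]] $Current = @()         # snapshots taken after the restart request
    )
    $live = @($Current | Where-Object { $null -ne $_ })
    $oldStillThere = @($live | Where-Object {
        $_.CodePackageInstanceId -eq $Before.CodePackageInstanceId }).Count -gt 0
    $anyActive = @($live | Where-Object {
        "$($_.DeployedCodePackageStatus)" -eq 'Active' }).Count -gt 0
    $restarted = (-not $oldStillThere) -and $anyActive
    return $restarted
}

# Hypothetical helper: poll the node until the restart is visible or the timeout expires.
function Wait-CodePackageRestart {
    param(
        [Parameter(Mandatory)] [string] $NodeName,
        [Parameter(Mandatory)] [uri] $ApplicationName,
        [Parameter(Mandatory)] $Before,
        [int] $TimeoutSeconds = 120,
        [int] $PollSeconds = 2
    )
    $query = @{
        NodeName            = $NodeName
        ApplicationName     = $ApplicationName
        ServiceManifestName = $Before.ServiceManifestName
        CodePackageName     = $Before.CodePackageName
    }
    $deadline = (Get-Date).AddSeconds($TimeoutSeconds)
    do {
        $current = @(Get-ServiceFabricDeployedCodePackage @query -ErrorAction SilentlyContinue)
        if (Test-CodePackageRestarted -Before $Before -Current $current) { return $true }
        Start-Sleep -Seconds $PollSeconds
    } while ((Get-Date) -lt $deadline)
    return $false
}

# Usage (requires Connect-ServiceFabricCluster first):
#   $before = Get-ServiceFabricDeployedCodePackage -NodeName <NodeName> -ApplicationName <AppName> |
#       Select-Object -First 1
#   Restart-ServiceFabricDeployedCodePackage -NodeName <NodeName> -ApplicationName <AppName> `
#       -ServiceManifestName $before.ServiceManifestName -CodePackageName $before.CodePackageName `
#       -CommandCompletionMode DoNotVerify
#   Wait-CodePackageRestart -NodeName <NodeName> -ApplicationName <AppName> -Before $before
```

With several ExclusiveProcess activations of the same code package on one node, the `Active` check could be satisfied by a different activation, so filtering `$current` by `ServicePackageActivationId` may be needed in that case.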

alxoldman avatar Dec 20 '19 12:12 alxoldman

@gkhanna79 A year and 3 months later the issue is still there. Is it such a huge deal to fix?

dmytro-gokun avatar Feb 02 '21 09:02 dmytro-gokun

I am also running into this issue. At this point it seems like the only way to restart a service is to start downing nodes one-by-one or remove/add/upgrade. That is a bit of an issue on a production cluster.

dvankurentxp avatar Jan 19 '23 15:01 dvankurentxp

I still see the issue in my environment. Did anyone find a solution?

mrpatil08 avatar Dec 15 '23 12:12 mrpatil08

What is the reproduction scenario? We have used the Restart-ServiceFabricDeployedCodePackage cmdlet quite often over the years and have never experienced this issue.

mfmadsen avatar Dec 15 '23 16:12 mfmadsen

I am trying to restart the API node by node. Below is the scenario (we are using an on-prem Service Fabric cluster):

PS C:\WINDOWS\system32> # Set variables for application and code package
$nodeName = "vm0"
$applicationName = "fabric:/API.Configuration"
$codePackageName = "Code"
$serviceManifestName = "API.ConfigurationAPIPkg"

#Get-ServiceFabricDeployedServicePackage -NodeName "vm0" -ApplicationName fabric:/API.Configuration -ServiceManifestName "API.ConfigurationAPIPkg"
#Get-ServiceFabricNode

#Get-ServiceFabricDeployedCodePackage -NodeName $nodeName -ApplicationName $applicationName

# Get the service fabric application
$application = Get-ServiceFabricApplication -ApplicationName $applicationName

# Check if the application exists
if ($application -eq $null) {
    Write-Host "Application not found: $applicationName"
} else {
    # Restart the specified code package on the specified node
    Restart-ServiceFabricDeployedCodePackage -NodeName $nodeName -ApplicationName $applicationName -CodePackageName $codePackageName -ServiceManifestName $serviceManifestName -CommandCompletionMode Verify

    Write-Host "Restarting code package $codePackageName on node $nodeName for application $applicationName"
}


Output:

Restart-ServiceFabricDeployedCodePackage : Did not find deployed code package for fabric:/API.Configuration:Code on node vm0
At line:22 char:5
+     Restart-ServiceFabricDeployedCodePackage -NodeName $nodeName -App ...
+     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidOperation: (Microsoft.Servi...usterConnection:ClusterConnection) [Restart-ServiceFabricDeployedCodePackage], FabricException
    + FullyQualifiedErrorId : CodePackageOperationErrorId,Microsoft.ServiceFabric.Powershell.RestartDeployedCodePackage

Below are the code package details:

PS C:\WINDOWS\system32> Get-ServiceFabricDeployedCodePackage -NodeName $nodeName -ApplicationName $applicationName

CodePackageName                  : Code
CodePackageVersion               : 1.0.0
ServiceManifestName              : API.ConfigurationAPIPkg
ServicePackageActivationId       : 71ad20d7-40db-4001-a050-e3fa2b28bb77
HostType                         : ExeHost
HostIsolationMode                : None
DeployedCodePackageStatus        : Active
RunFrequencyInterval             : 0
EntryPoint                       :
EntryPointStatus                 : Started
CodePackageInstanceId            : 133471161872726796
EntryPointLocation               : C:\FabCluster\ProgramData\SF\vm0\Fabric\work\Applications\API.ConfigurationType_App1633\API.ConfigurationAPIPkg.Code.1.0.0\API.ConfigurationAPI.exe
ProcessId                        : 28096
ContainerId                      :
RunAsUserName                    : DomainGMSA
ActivationCount                  : 1
ActivationFailureCount           : 0
ContinuousActivationFailureCount : 0
ContinuousExitFailureCount       : 0
ExitCount                        : 0
ExitFailureCount                 : 0
LastActivationUtc                : 12/15/2023 3:11:40 PM
LastExitCode                     : 0
LastExitUtc                      : 1/1/0001 12:00:00 AM
LastSuccessfulActivationUtc      : 12/15/2023 3:11:40 PM
LastSuccessfulExitUtc            : 1/1/0001 12:00:00 AM
SetupEntryPoint                  :
CodePackageUsageStatistics       :
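The details above report a non-empty ServicePackageActivationId, while the restart call in the script passes none. Whether that explains the error is not established in this thread (the original poster passed the activation ID and still saw the error intermittently), but it may be worth ruling out. A minimal sketch, assuming the objects returned by `Get-ServiceFabricDeployedCodePackage` expose the properties printed above; `New-RestartParameterSet` is a hypothetical helper name, and nothing here has been run against a cluster.

```powershell
# Hypothetical helper: build the parameters for Restart-ServiceFabricDeployedCodePackage
# from one object returned by Get-ServiceFabricDeployedCodePackage.
function New-RestartParameterSet {
    param(
        [Parameter(Mandatory)] [string] $NodeName,
        [Parameter(Mandatory)] [uri] $ApplicationName,
        [Parameter(Mandatory)] $DeployedCodePackage
    )
    $parameters = @{
        NodeName              = $NodeName
        ApplicationName       = $ApplicationName
        ServiceManifestName   = $DeployedCodePackage.ServiceManifestName
        CodePackageName       = $DeployedCodePackage.CodePackageName
        CommandCompletionMode = 'Verify'
    }
    # Pass the activation ID only when the package reports one
    # (assumption: it is empty when no exclusive activation is in use).
    if (-not [string]::IsNullOrEmpty("$($DeployedCodePackage.ServicePackageActivationId)")) {
        $parameters.ServicePackageActivationId = "$($DeployedCodePackage.ServicePackageActivationId)"
    }
    return $parameters
}

# Usage (requires Connect-ServiceFabricCluster first):
#   Get-ServiceFabricDeployedCodePackage -NodeName $nodeName -ApplicationName $applicationName |
#       ForEach-Object {
#           $p = New-RestartParameterSet -NodeName $nodeName -ApplicationName $applicationName -DeployedCodePackage $_
#           Restart-ServiceFabricDeployedCodePackage @p
#       }
```

Reading the values from the deployed package rather than hard-coding them keeps the manifest name, code package name, and activation ID in step with what the node actually reports.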

mrpatil08 avatar Dec 15 '23 16:12 mrpatil08

Even with -CommandCompletionMode Verify this works fine for us, but we are not on an on-premises cluster; we are on an Azure-hosted cluster.

mfmadsen avatar Dec 18 '23 19:12 mfmadsen