service-fabric
Connect-ServiceFabricCluster : No cluster endpoint is reachable, please check if there is connectivity/firewall/DNS issue.
I've downloaded the Service Fabric SDK for VS 2017 from here: http://www.microsoft.com/web/handlers/webpi.ashx?command=getinstallerredirect&appid=MicrosoftAzure-ServiceFabric-CoreSDK
The initial install on my Windows 10 v1709 workstation (fully patched) completes successfully. The problem manifests when I try to set up a cluster:
C:\Program Files\Microsoft SDKs\Service Fabric\ClusterSetup
λ .\DevClusterSetup.ps1
Using Cluster Data Root: C:\SfDevCluster\Data
Using Cluster Log Root: C:\SfDevCluster\Log
The generated json path is C:\Users\kthompson\AppData\Local\Temp\tmp3B1A.tmp.json
Processing and validating cluster config.
Create node configuration succeeded
Starting service FabricHostSvc. This may take a few minutes...
Waiting for Service Fabric Cluster to be ready. This may take a few minutes...
Local Cluster ready status: 4% completed.
Local Cluster ready status: 8% completed.
Local Cluster ready status: 12% completed.
Local Cluster ready status: 17% completed.
Local Cluster ready status: 21% completed.
Local Cluster ready status: 25% completed.
Local Cluster ready status: 29% completed.
Local Cluster ready status: 33% completed.
Local Cluster ready status: 38% completed.
Local Cluster ready status: 42% completed.
Local Cluster ready status: 46% completed.
Local Cluster ready status: 50% completed.
Local Cluster ready status: 54% completed.
Local Cluster ready status: 58% completed.
Local Cluster ready status: 62% completed.
Local Cluster ready status: 67% completed.
Local Cluster ready status: 71% completed.
Local Cluster ready status: 75% completed.
Local Cluster ready status: 79% completed.
Local Cluster ready status: 83% completed.
Local Cluster ready status: 88% completed.
Local Cluster ready status: 92% completed.
Local Cluster ready status: 96% completed.
Local Cluster ready status: 100% completed.
WARNING: Service Fabric Cluster is taking longer than expected to connect.
Waiting for Naming Service to be ready. This may take a few minutes...
No cluster endpoint is reachable, please check if there is connectivity/firewall/DNS issue.
Connect-ServiceFabricCluster : No cluster endpoint is reachable, please check if there is connectivity/firewall/DNS
issue.
At C:\Program Files\Microsoft SDKs\Service Fabric\Tools\Scripts\ClusterSetupUtilities.psm1:620 char:12
+ [void](Connect-ServiceFabricCluster @connParams)
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : InvalidOperation: (:) [Connect-ServiceFabricCluster], FabricException
+ FullyQualifiedErrorId : TestClusterConnectionErrorId,Microsoft.ServiceFabric.Powershell.ConnectCluster
I've been pulling my hair out over this for the last couple of days. Here's what I've tried:
- Firewall exception on port 19000.
- Uninstall and re-install of Service Fabric SDK.
- Repair of vcredist.
- Executed CleanCluster.ps1 before running DevClusterSetup.ps1
- Uninstall and re-install of Visual Studio 2017.
- Excluded applicable folders and processes from antivirus.
- Ensured the PowerShell execution policy is set to Unrestricted.
Can you share the generated json template? C:\Users\kthompson\AppData\Local\Temp\tmp3B1A.tmp.json
{
"name": "DevCluster",
"clusterConfigurationVersion": "1.0.0",
"apiVersion": "10-2017",
"nodes": [
{
"nodeName": "_Node_0",
"iPAddress": "ComputerFullName",
"nodeTypeRef": "NodeType0",
"faultDomain": "fd:/0",
"upgradeDomain": "0"
}
],
"properties": {
"diagnosticsStore": {
"metadata": "Please replace the diagnostics file share with an actual file share accessible from all cluster machines.",
"dataDeletionAgeInDays": "3",
"storeType": "FileShare",
"connectionstring": "%systemdrive%\\ProgramData\\SF\\DiagnosticsStore"
},
"nodeTypes": [
{
"name": "NodeType0",
"clientConnectionEndpointPort": "19000",
"clusterConnectionEndpointPort": "19002",
"leaseDriverEndpointPort": "19001",
"serviceConnectionEndpointPort": "19006",
"httpGatewayEndpointPort": "19080",
"reverseProxyEndpointPort": "19081",
"applicationPorts": {
"startPort": "30001",
"endPort": "31000"
},
"isPrimary": true
}
],
"fabricSettings": [
{
"name": "Setup",
"parameters": [
{
"name": "FabricDataRoot",
"value": "C:\\SfDevCluster\\Data"
},
{
"name": "FabricLogRoot",
"value": "C:\\SfDevCluster\\Log"
},
{
"value": "true",
"name": "IsDevCluster"
}
]
},
{
"name": "Diagnostics",
"parameters": [
{
"name": "ProducerInstances",
"value": "ServiceFabricEtlFile,ServiceFabricPerfCtrFolder"
},
{
"name": "MaxDiskQuotaInMB",
"value": "10240"
},
{
"name": "EnableCircularTraceSession",
"value": "true"
}
]
},
{
"name": "FabricClient",
"parameters": [
{
"name": "HealthReportSendInterval",
"value": "0"
}
]
},
{
"name": "Failover",
"parameters": [
{
"name": "SendToFMTimeout",
"value": "1"
},
{
"name": "NodeUpRetryInterval",
"value": "1"
}
]
},
{
"name": "Federation",
"parameters": [
{
"name": "NodeIdGeneratorVersion",
"value": "V4"
},
{
"name": "UnresponsiveDuration",
"value": "0"
},
{
"name": "ProcessAssertExitTimeout",
"value": "86400"
}
]
},
{
"name": "Hosting",
"parameters": [
{
"name": "EndpointProviderEnabled",
"value": "true"
},
{
"name": "RunAsPolicyEnabled",
"value": "true"
},
{
"name": "EnableProcessDebugging",
"value": "true"
},
{
"name": "DeactivationScanInterval",
"value": "600"
},
{
"name": "DeactivationGraceInterval",
"value": "2"
},
{
"name": "ServiceTypeRegistrationTimeout",
"value": "20"
},
{
"name": "CacheCleanupScanInterval",
"value": "300"
},
{
"name": "DeploymentRetryBackoffInterval",
"value": "1"
}
]
},
{
"name": "Management",
"parameters": [
{
"name": "ImageStoreConnectionString",
"value": "ImageStoreConnectionStringPlaceHolder"
},
{
"name": "ImageCachingEnabled",
"value": "false"
},
{
"name": "EnableDeploymentAtDataRoot",
"value": "true"
},
{
"name": "DisableChecksumValidation",
"value": "true"
}
]
},
{
"name": "PlacementAndLoadBalancing",
"parameters": [
{
"name": "MinLoadBalancingInterval",
"value": "300"
},
{
"name": "TraceCRMReasons",
"value": "false"
}
]
},
{
"name": "ReconfigurationAgent",
"parameters": [
{
"name": "IsDeactivationInfoEnabled",
"value": "true"
},
{
"name": "ServiceApiHealthDuration",
"value": "20"
},
{
"name": "ServiceReconfigurationApiHealthDuration",
"value": "20"
},
{
"name": "LocalHealthReportingTimerInterval",
"value": "5"
},
{
"name": "RAUpgradeProgressCheckInterval",
"value": "3"
},
{
"name": "RAPMessageRetryInterval",
"value": "0.5"
},
{
"name": "MinimumIntervalBetweenRAPMessageRetry",
"value": "0.5"
}
]
},
{
"name": "ServiceFabricEtlFile",
"parameters": [
{
"name": "DataDeletionAgeInDays",
"value": "3"
},
{
"name": "IsEnabled",
"value": "true"
},
{
"name": "ProducerType",
"value": "EtlFileProducer"
},
{
"name": "EtlReadIntervalInMinutes",
"value": "5"
}
]
},
{
"name": "ServiceFabricPerfCtrFolder",
"parameters": [
{
"name": "DataDeletionAgeInDays",
"value": "3"
},
{
"name": "IsEnabled",
"value": "true"
},
{
"name": "ProducerType",
"value": "FolderProducer"
},
{
"name": "FolderType",
"value": "ServiceFabricPerformanceCounters"
}
]
},
{
"name": "Trace/Etw",
"parameters": [
{
"name": "Level",
"value": "4"
}
]
},
{
"name": "TransactionalReplicator",
"parameters": [
{
"name": "CheckpointThresholdInMB",
"value": "64"
}
]
}
],
"addOnFeatures": [
"DnsService"
]
}
}
@maburlik - I don't see anything obvious from the manifest.
I wonder if you are seeing the same issue as reported in microsoft/service-fabric-issues#1056. Would you mind checking:
- Whether the Fabric.exe process is running or not.
- If not running, the presence of the following errors in the Event Log:
Spot on. I see the following in my logs:
Fabric Node open failed with error code = E_ACCESSDENIED
Also seeing:
HostedService: _Node_0 on node id bf865279ba277deb864a976fbf4c200e terminated unexpectedly with code 7167 and process name Fabric.exe
HostedServiceInstance:HostedService/_Node_0_Fabric terminated with exitcode 7167
client-localhost:19000/127.0.0.1:19000: error = 2147943625, failureCount=93. Filter by (type~Transport.St && ~"(?i)localhost:19000") to get listener lifecycle. Connect failure is expected if listener was never started, or listener/its process was stopped before/during connecting.
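A note for readers decoding the transport line above: the value 2147943625 is an unsigned 32-bit HRESULT. The following is a minimal Python sketch that splits it into the standard HRESULT bit fields; the helper name `decode_hresult` is made up for illustration and is not part of any Service Fabric tooling.

```python
def decode_hresult(value: int) -> dict:
    """Split an unsigned 32-bit HRESULT into its standard bit fields."""
    value &= 0xFFFFFFFF
    return {
        "hex": f"0x{value:08X}",
        "failure": bool(value >> 31),       # bit 31: severity (1 = failure)
        "facility": (value >> 16) & 0x7FF,  # bits 16-26: facility (7 = Win32)
        "code": value & 0xFFFF,             # bits 0-15: facility-specific code
    }


# Value taken from the transport log line in this thread.
print(decode_hresult(2147943625))
# {'hex': '0x800704C9', 'failure': True, 'facility': 7, 'code': 1225}
```

Facility 7 is FACILITY_WIN32 and Win32 code 1225 is ERROR_CONNECTION_REFUSED, which fits the log's own hint that the listener on port 19000 was never started because Fabric.exe had already exited.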
One of our primary use cases in evaluating Service Fabric is to use it for containers. Is there documentation on how to configure a dev cluster for containers using self signed tls certs?
Thanks @knizkar - let's track this on microsoft/service-fabric-issues#1056.
@MisterPuffyPants - Regarding setting up a dev cluster with containers, a doc will be posted in the next few days, as this is only officially supported in 6.2. The main thing is to make sure that the Docker service is started when creating the cluster; that will enable the support in Service Fabric.
Exactly same issue here. Any updates?
Had the same issue, the only thing that helped - going back to 6.2.283/3.1.283
Any updates? Still see it in the newest version
@EvilAvenger: Catching up on this issue, have you gone through the solutions proposed in this issue? https://github.com/Azure/service-fabric-issues/issues/1056
@MikkelHegn
Yes, I did, and it does not work. Currently the issue is occurring on our deployment machine, so I can't properly test it (as that would block my team).
The only thing that really helps is installing 6.2.283.9494. (Installing a prior version and then copying the files from 6.2.283 to "C:\Program Files\Microsoft SDKs\Service Fabric" helps as well.)
None of the other versions work, so the issue may have been introduced somewhere in *.301.
What I've tried:
- Checked that my WinFirewall service is working and is not blocking ports;
- Checked that "everyone" has write permissions;
- Checked with netstat that nothing is listening on port 19000;
- Execution policy is set to Bypass;
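As a complement to the netstat check in the list above, here is a small Python sketch that tests whether anything accepts TCP connections on the client connection port. The function name `port_is_listening` is hypothetical; port 19000 comes from the cluster configuration posted earlier in this thread.

```python
import socket


def port_is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and name resolution failures.
        return False


if __name__ == "__main__":
    # 19000 is the clientConnectionEndpointPort of the dev cluster.
    print("19000 reachable:", port_is_listening("localhost", 19000))
```

A False result while FabricHostSvc is running would suggest that Fabric.exe never opened its listener, rather than a firewall problem between client and cluster.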
Event log issues: I can't provide the full event log right now because I've reinstalled the service, but I saw several records in the event log:
- FileChangeMonitor failed with E_ACCESSDENIED
- FolderACLManager::Install failed with error E_INVALIDARG
- GetFileAttributesEx failed with the following error 5
Thanks for your patience on this one @EvilAvenger. @maburlik for the diagnostics info above, do you have any ideas what might be causing this?
Also blocked by this now, @MikkelHegn. Is anyone any closer to figuring out what is going on? I have tried all the workarounds and it's no use.
Folks, if the workaround mentioned in microsoft/service-fabric-issues#1056 isn't working for you, can you please share full setup logs from the environment? Maybe you are running into something else here.
(Assuming Windows) The reg key HKLM\SOFTWARE\Microsoft\ServiceFabric\FabricLogRoot should point to the location of the logs. Zip the directory and attach the file here; you can also zip and email it to us (raunakp, or mikhegn at microsoft dot com) if you want.
Just to give my two cents on this issue: I was also having the same problem with Windows 10 and the latest SDK. I had checked the Windows firewall, removed Webroot AV, reinstalled the SDK multiple times, reverted to older SDKs, checked the folder permissions, changed to the Network Service account, and tried every other solution proposed in this issue: https://github.com/Azure/service-fabric-issues/issues/1056
The fix for me was quite simple: @JayRidge95 noticed the hostname was being chopped in the event logs. My computer name was longer than the 15-character NetBIOS name limit. So we changed my computer name to be shorter than 15 characters, reinstalled the SDK, and it worked fine.
Bit of an odd one but it took me about 3 days to get to that point so this might save some people time.
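To check quickly whether a machine is affected by the name-length problem described above, a sketch like the following may help. The function name `exceeds_netbios_limit` is made up for illustration. It uses `socket.gethostname()` rather than the `COMPUTERNAME` environment variable because, as far as I know, the latter already holds the truncated NetBIOS name on Windows.

```python
import socket

NETBIOS_MAX_LEN = 15  # NetBIOS computer names are limited to 15 characters


def exceeds_netbios_limit(name: str) -> bool:
    """Return True if the host name would be truncated as a NetBIOS name."""
    return len(name) > NETBIOS_MAX_LEN


if __name__ == "__main__":
    # Use only the first label in case a fully qualified name is returned.
    host = socket.gethostname().split(".")[0]
    if exceeds_netbios_limit(host):
        print(f"{host!r} has {len(host)} characters; consider a shorter computer name.")
    else:
        print(f"{host!r} fits within the NetBIOS limit.")
```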
@tjackadams this works like a charm. I just shortened the computer name. I had been stuck on this issue for the last 4 days.
@tjackadams thanks. It worked. Dear SF team, can you fix this issue, or at least provide a better error message so that the problem and its solution can be identified quickly?
This workaround did not work for me. :( It's still not working.
@raunakpandya is there any update on this?
@andrewcoll +1 Not working for me either
@andrewcoll - Have you tried the workaround of setting FabricContainerAppsEnabled to false? If not, can you try editing the ClusterManifestTemplate.json file under %programfiles%\Microsoft SDKs\Service Fabric\ClusterSetup (there is one such file for each type of one-box cluster you can bring up)?
Add the following entry under the Hosting section:
{
"name": "FabricContainerAppsEnabled",
"value": "false"
}
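For anyone who prefers to script this change, here is a minimal Python sketch. It assumes the template uses the same layout as the generated JSON posted earlier in this thread (`properties.fabricSettings[]` entries, each with a `name` and a `parameters` list); the function name `disable_container_apps` is hypothetical, and the example works on an in-memory sample instead of touching a real file.

```python
import json


def disable_container_apps(manifest: dict) -> dict:
    """Set FabricContainerAppsEnabled to "false" in the Hosting section (in place)."""
    settings = manifest.setdefault("properties", {}).setdefault("fabricSettings", [])
    hosting = next((s for s in settings if s.get("name") == "Hosting"), None)
    if hosting is None:
        # Assumption: create the section if the template lacks one.
        hosting = {"name": "Hosting", "parameters": []}
        settings.append(hosting)
    params = hosting.setdefault("parameters", [])
    for param in params:
        if param.get("name") == "FabricContainerAppsEnabled":
            param["value"] = "false"
            break
    else:
        params.append({"name": "FabricContainerAppsEnabled", "value": "false"})
    return manifest


# Sample trimmed from the JSON shared earlier in the thread.
sample = {
    "properties": {
        "fabricSettings": [
            {"name": "Hosting",
             "parameters": [{"name": "EndpointProviderEnabled", "value": "true"}]}
        ]
    }
}
print(json.dumps(disable_container_apps(sample), indent=2))
```

To apply it to a real template you would load the file with `json.load`, call the function, and write the result back with `json.dump`, ideally after making a backup copy.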
@raunakpandya yes, I tried that, it didn't work either. I attached my logs in a previous comment.
Yes, I did look at the logs. Strange. Which JSON file did you modify? Can you attach it? Also, which one-box mode are you trying to bring up (secure/unsecure, 1 box/5 box)?
@raunakpandya's answer worked for me. Thanks!!!
@tjackadams your solution worked for me. Shorten computer name (was longer than 15 characters). Thank you!
@raunakpandya could you please explain why disabling the FabricContainerAppsEnabled setting solves this issue?
@Kassoul - This has the details: https://github.com/Azure/service-fabric-issues/issues/1056#issuecomment-400413031
By disabling that, the self-signed certificate is no longer created.
I have seen the same error when trying to start up my local cluster. In my case, I noticed from the error message 'HostService: <Node> on node id terminated unexpectedly with code 3221225781 and process name Fabric.exe' that Fabric.exe was missing a DLL. For me, the issue was that some of the VC++ DLLs had gone missing, and it can be fixed by reinstalling "C:\Program Files\Microsoft Service Fabric\bin\Fabric\Fabric.Code\vcredist_x64.exe".
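The exit code in that message can be read as an NTSTATUS value once it is printed in hexadecimal. A small Python sketch, with a hypothetical helper `describe_exit_code` and a lookup table that covers only a few well-known codes:

```python
# A few well-known NTSTATUS values; this table is deliberately incomplete.
KNOWN_NTSTATUS = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC000007B: "STATUS_INVALID_IMAGE_FORMAT",
    0xC0000135: "STATUS_DLL_NOT_FOUND",
}


def describe_exit_code(code: int) -> str:
    """Format a process exit code as hex and name it if it is a known NTSTATUS."""
    code &= 0xFFFFFFFF
    name = KNOWN_NTSTATUS.get(code, "unknown")
    return f"0x{code:08X} ({name})"


# Exit code reported for Fabric.exe in the message above.
print(describe_exit_code(3221225781))  # 0xC0000135 (STATUS_DLL_NOT_FOUND)
```

STATUS_DLL_NOT_FOUND is consistent with the missing VC++ runtime DLLs described above.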
This fixes the issue for me!