FailoverClusterDsc
Unable to add a Node to an existing Cluster
When adding another node to the Cluster, the Set-TargetResource function uses "Add-ClusterNode $env:COMPUTERNAME -Cluster $Name" on line 129. This fails with the error message:
Add-ClusterNode : Check the spelling of the cluster name. Otherwise, there might be a problem with your network. Make sure the cluster nodes are turned on and connected to the network or contact your network administrator. The RPC server is unavailable
At line:1 char:1
+ Add-ClusterNode $env:COMPUTERNAME -Cluster $Name
    + CategoryInfo          : ConnectionError: (:) [Add-ClusterNode], ClusterCmdletException
    + FullyQualifiedErrorId : ClusterRpcConnection,Microsoft.FailoverClusters.PowerShell.AddClusterNodeCommand
When running "Get-Cluster -Name $Name -Domain $ComputerInfo.Domain", this does work and return the Cluster as expected.
When running: Add-ClusterNode $env:COMPUTERNAME -Cluster "<NAME OF AN EXISTING CLUSTER NODE>" this does work and add the new node successfully.
"Get-ClusterNode -Cluster $Name" does not work from a node that is not yet in the Cluster.
This essentially prevents adding Nodes to the Cluster using this DSC Resource.
Environment tested is on 2 Azure VMs.
@matthitchcock do you still have this issue? I have been attempting to recreate it, but I can add nodes to an existing cluster without any issue.
I haven't tried again recently to be honest. I'll see if I can get some time this week, otherwise if it can't be reproduced then we can close it and see if someone else runs into it
Closing per comments.
I am running into this same issue on Azure VMs. I am unable to run the Get/Add-ClusterNode cmdlets from the remote server which is not yet a part of the cluster. Perhaps an Azure specific issue?
@shawntierney Did you install the RSAT-Clustering-PowerShell feature first, using DSC?
I did... and the cmdlets work locally. The only way I have been able to get it to work from the remote node is to add the functionality to enable CredSSP and use Invoke-Command for all node-level cmdlets. Still working on it...
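A minimal sketch of that CredSSP approach, assuming a placeholder existing node name ('node1.contoso.com') and a $domainCredential variable (this is not the resource's code, just the shape of the workaround):

# On the node being joined (CredSSP client), allow delegation to an existing cluster node.
Enable-WSManCredSSP -Role Client -DelegateComputer 'node1.contoso.com' -Force

# On the existing cluster node (CredSSP server).
Enable-WSManCredSSP -Role Server -Force

# Run the node-level cluster cmdlets on a node that is already a cluster member.
$newNode = $env:COMPUTERNAME
Invoke-Command -ComputerName 'node1.contoso.com' -Authentication Credssp -Credential $domainCredential -ScriptBlock {
    Add-ClusterNode -Name $using:newNode
}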
cc @kwirkykat @mbreakey3
This is an Azure limitation at the moment. When used in Azure, the *-cluster* commands which target the CNO will fail. This is because the CNO is not reachable from any node which doesn't own the CNO role. If you're familiar with networking it's similar to dynamic arp inspection (DAI) and dhcp snooping preventing access to a statically assigned IP.
I tried a few things, including building a load balancer using a probe port (similar to the current method of deploying SQL AlwaysOn in Azure) and then load balancing RPC. This starts to work properly, but then fails when it comes to negotiating a dynamic high port. So you could technically load balance all TCP and UDP ports for the CNO and this would work, but I haven't tried it and I'm sure it wouldn't be supported if something went pear-shaped.
The workaround I've been provided at the moment is to target the commands directly at the node, i.e.:
get-cluster node1
instead of
get-cluster cluster1
Apparently there's a big announcement due in the next few weeks around clustering in Azure which is meant to address this.
This makes sense and is exactly the behavior I experienced. I ended up using that exact workaround after trial and error but it's good to know that there is a limitation. I left the current code to ensure compatibility when this is no longer a limitation. To address the limitation, I utilize get-clustergroup to test the current code, and if the result is null, I replace the $Name parameter with the owner node name to ensure functionality. Additionally, I added logic to move the cluster group to the primary server if the owner node is not the 'primary' server. I encountered the scenario where the owner node has changed to the 'secondary' server, which causes the script to fail.
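A rough sketch of that fallback logic (illustrative only; $Name and $primaryNode would come from the configuration, and this is not the resource's actual code):

$clusterTarget = $Name
if (-not (Get-ClusterGroup -Cluster $Name -ErrorAction SilentlyContinue))
{
    # The CNO is not reachable from this node, so target a known node name instead.
    $clusterTarget = $primaryNode
}

# Move the core cluster group back to the 'primary' node if another node currently owns it.
$coreGroup = Get-ClusterGroup -Cluster $clusterTarget -Name 'Cluster Group'
if ($coreGroup.OwnerNode.Name -ne $primaryNode)
{
    Move-ClusterGroup -Cluster $clusterTarget -Name 'Cluster Group' -Node $primaryNode | Out-Null
}

Add-ClusterNode -Name $env:COMPUTERNAME -Cluster $clusterTarget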
Just an FYI, I encountered a similar issue with the xSQLAvailabilityGroup resource. I only mention it here because the xSQLAvailabilityGroup resource follows the xCluster resource when creating a SQL Availability Group. Similar to the Azure cluster owner node issue, availability group creation will fail if the 'primary' server (01) where the code is executed is not the cluster node owner. The move-clustergroup logic added to xCluster seems to mitigate this issue. However, I did encounter one issue where the AG configuration still failed due to the 'secondary' server (02) being set as the Primary in SQL. My assumption is that this was set during prior testing when an Availability Group was configured on the 'secondary' node and therefore will not be a common occurrence. Nonetheless, it's worth noting.
+1 to @gladier and @shawntierney's findings. I was able to join a cluster/availability group via Node B when referencing the cluster as Node A, rather than the CNO.
@glennmate what is the code you used to join node b to the cluster?
Same issue: can't join node 2 to the cluster from node 1. Error: RPC server not available.
I have tested everything now, and I get different errors every time I try something different. When I assigned the security policy to Everyone on the cluster's DNS record, I was able to connect to the cluster from node 2; before that I could not connect by specifying the cluster name, nor by specifying the cluster name together with its domain name. When I re-ran the script only for node 2, I got an error saying the node is already joined to the cluster, even though it was not joined at all, which confuses me.
Same issue here... Please help how to resolve it.
Powershell Cmdlet failed: Check the spelling of the cluster name. Otherwise, there might be a problem with your network. Make sure the cluster nodes are turned on and connected to the network or contact your network administrator.
    + CategoryInfo          : ConnectionError: (:) [], CimException
    + FullyQualifiedErrorId : ClusterRpcConnection,Microsoft.FailoverClusters.PowerShell.GetNodeCommand
    + PSComputerName        : localhost

The PowerShell DSC resource '[xCluster]DirectResourceAccess' with SourceInfo '' threw one or more non-terminating errors while running the Test-TargetResource functionality. These errors are logged to the ETW channel called Microsoft-Windows-DSC/Operational. Refer to this channel for more details.
    + CategoryInfo          : InvalidOperation: (root/Microsoft/...gurationManager:String) [], CimException
    + FullyQualifiedErrorId : NonTerminatingErrorFromProvider
    + PSComputerName        : localhost

[2017-11-01T12:51:05+00:00] FATAL: Chef::Exceptions::PowershellCmdletException: dsc_resource[test-cluster] (SqlServer::Create_Cluster line 7) had an error: Chef::Exceptions::PowershellCmdletException: Powershell Cmdlet failed: Check the spelling of the cluster name. Otherwise, there might be a problem with your network. Make sure the cluster nodes are turned on and connected to the network or contact your network administrator.
ERROR: Failed to execute command on return code 1
ERROR: Bootstrap command returned 1
Is it possible to use xFailoverCluster (1.8.0.0) when invoking the DSC actions from the authoring node, or should the configuration be run locally on the nodes you are going to configure? I still get errors like the ones described in this thread. I have not touched CredSSP at all so far. Is Invoke-Command still the only workaround?
I am having the same problem. Instead of using xCluster to join nodes to the cluster, I have a Script resource:
Script JoinExistingCluster
{
    GetScript = {
        return @{ 'Result' = $true }
    }
    SetScript = {
        $targetNodeName = $env:COMPUTERNAME
        Add-ClusterNode -Name $targetNodeName -Cluster $using:ClusterOwnerNode
    }
    TestScript = {
        $targetNodeName = $env:COMPUTERNAME
        $(Get-ClusterNode -Cluster $using:ClusterOwnerNode).Name -contains $targetNodeName
    }
    DependsOn = "[xWaitForCluster]WaitForCluster"
    PsDscRunAsCredential = $DomainCreds
}
That seems to work, but it is a bit of a hack.
For creating the cluster (New-Cluster ...), did you use the Script block as well? How does it look? Thanks.
@mkokoy2 I used the xCluster resource to create the cluster. You can find my DSC script here:
https://github.com/hansenms/iac/blob/master/sql-alwayson/DSC/PrepareSQLServer.ps1
For those of you who haven't seen - Azure has a new(ish) internal load balancer configuration called HA Ports. If you configure a HA Ports rule on an internal load balancer with a probe port similar to below this should work. HA ports load balancers forward all ports which then does away with the limitations above.
This is probably not officially supported since HA ports were targeted towards firewall products so YMMV.
Script CNOProbe {
    # This is used for the Azure Load Balancer probe; setting this on premises has no effect.
    GetScript = {
        # GetScript must return a hashtable.
        return @{ 'Result' = $true }
    }
    SetScript = {
        Import-Module FailoverClusters -Verbose:$false
        $Resource = Get-ClusterResource "Cluster IP Address"
        Set-ClusterParameter -InputObject $Resource -Name "ProbePort" -Value "59999"
        # Restart the IP address resource so the new probe port takes effect.
        Stop-ClusterResource $Resource
        Start-ClusterResource $Resource
    }
    TestScript = {
        Import-Module FailoverClusters -Verbose:$false
        $Resource = Get-ClusterResource "Cluster IP Address"
        $ProbePortParams = Get-ClusterParameter -InputObject $Resource -Name ProbePort
        Write-Verbose "Found ProbePort $($ProbePortParams.Value) for $($Resource.Name)"
        if ($ProbePortParams.Value -eq 59999) {
            Write-Verbose "Parameters OK"
            return $true
        }
        else {
            Write-Verbose "Bad Parameters"
            return $false
        }
    }
}
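For completeness, creating the matching HA Ports rule on an existing internal load balancer might look roughly like the sketch below, assuming the current Az.Network cmdlets; the resource group and load balancer names are placeholders, and it assumes a single frontend IP configuration and backend pool:

$lb = Get-AzLoadBalancer -ResourceGroupName 'rg-cluster' -Name 'ilb-cluster'

# Health probe answering on the ProbePort configured on the cluster IP resource.
$lb = $lb | Add-AzLoadBalancerProbeConfig -Name 'ClusterProbe' -Protocol Tcp -Port 59999 -IntervalInSeconds 5 -ProbeCount 2

# HA Ports rule: Protocol 'All' with frontend/backend port 0 forwards every TCP and UDP port.
$lb = $lb | Add-AzLoadBalancerRuleConfig -Name 'HAPortsRule' `
    -FrontendIpConfiguration $lb.FrontendIpConfigurations[0] `
    -BackendAddressPool $lb.BackendAddressPools[0] `
    -Probe ($lb | Get-AzLoadBalancerProbeConfig -Name 'ClusterProbe') `
    -Protocol All -FrontendPort 0 -BackendPort 0 -EnableFloatingIP

$lb | Set-AzLoadBalancer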
I think @gladier is referencing this one: Configure High Availability Ports for an internal load balancer.
@gladier we should add the probe property support to a resource too, what resource does it fit in best (existing or new)?
@johlju - That's the config i was referring to.
While we could add it to the xCluster DSC resource, that doesn't make it very reusable for other cluster resources, for example SQL Availability Groups.
Looking at the resource parameters I update on a regular basis I would propose something like:
xWaitforClusterResource MyClusterResource # Used to wait for a particular resource to be created (e.g. SQL Availability Group Listeners)
{
    Name = [String]
    ResourceName = [String]
}

xClusterResourceParameter MyResourceParameter # Used to actually set the parameter
{
    Name = [String]
    ResourceName = [String]
    [HostRecordTTL = [Uint32]]
    [RegisterAllProvidersIP = [bool]]
    [ProbePort = [Uint32]]
    [DependsOn = [string[]]]
}
However this is probably not the best approach, as not all parameters apply to all resources. For example, Cluster IP resources hold the ProbePort parameter, and Cluster Name resources hold the RegisterAllProvidersIP and HostRecordTTL parameters.
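To make that split concrete, a quick check with the existing FailoverClusters cmdlets (run on a cluster node; the resource names below are the defaults created for the core cluster group):

$ipResource   = Get-ClusterResource -Name 'Cluster IP Address'
$nameResource = Get-ClusterResource -Name 'Cluster Name'

# ProbePort lives on the cluster IP address resource.
Get-ClusterParameter -InputObject $ipResource -Name ProbePort

# HostRecordTTL and RegisterAllProvidersIP live on the network name resource.
Get-ClusterParameter -InputObject $nameResource -Name HostRecordTTL, RegisterAllProvidersIP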
This issue also duplicates/resolves a number of other open issues - #29 #173 and probably #186.
I like your proposal, but if a property targets different scopes within a cluster, I think we should have a resource per scope. 🤔
Also, instead of having xWaitForClusterResource, maybe we could make one resource able to wait for different types of artifacts, like cluster, cluster group, cluster resource, etc., making the existing xWaitForCluster obsolete. Or maybe we can leverage WaitForAll, WaitForAny and WaitForSome.
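A minimal sketch of that last idea, using the built-in WaitForAll resource from PSDesiredStateConfiguration (WMF 5.0+); the node and resource names are placeholders:

Configuration JoinSecondNode
{
    Import-DscResource -ModuleName PSDesiredStateConfiguration

    Node 'node2.contoso.com'
    {
        # Wait until node1's configuration reports the cluster resource as applied.
        WaitForAll ClusterCreated
        {
            ResourceName     = '[xCluster]CreateCluster'
            NodeName         = 'node1.contoso.com'
            RetryIntervalSec = 30
            RetryCount       = 20
        }

        # The resource that joins this node would go here, with
        # DependsOn = '[WaitForAll]ClusterCreated'
    }
}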
Did this issue ever get resolved? I'm having the same problem joining the second node to the cluster. I haven't tried the script block provided by @hansenms yet but that may be my next steps.
Hi, when I’m using start-clusterresource, dsc says “Undefined DSC resource start-clusterresource” (https://stackoverflow.com/questions/64068439/dsc-error-failing-to-recognise-a-cmdlet)
Any ideas??
EDIT: This is fixed - no idea how, it just decided to work!
First of all, Thank You for Your code!
This bug is still with us after 7 years, LOL.
FailoverClusterDsc 2.1.0
Windows Server 2016 Standard
It is impossible to add a node to the cluster because some cmdlets cannot find the cluster when only its name, without the domain, is specified. For example, this cmdlet works correctly because the domain is specified:
https://github.com/dsccommunity/FailoverClusterDsc/blob/faa9aa398ec9211f4104cd30dc4b8889db22c4f2/source/DSCResources/DSC_Cluster/DSC_Cluster.psm1#L177
while all of these work incorrectly:
https://github.com/dsccommunity/FailoverClusterDsc/blob/faa9aa398ec9211f4104cd30dc4b8889db22c4f2/source/DSCResources/DSC_Cluster/DSC_Cluster.psm1#L83
https://github.com/dsccommunity/FailoverClusterDsc/blob/faa9aa398ec9211f4104cd30dc4b8889db22c4f2/source/DSCResources/DSC_Cluster/DSC_Cluster.psm1#L243
https://github.com/dsccommunity/FailoverClusterDsc/blob/faa9aa398ec9211f4104cd30dc4b8889db22c4f2/source/DSCResources/DSC_Cluster/DSC_Cluster.psm1#L265
https://github.com/dsccommunity/FailoverClusterDsc/blob/faa9aa398ec9211f4104cd30dc4b8889db22c4f2/source/DSCResources/DSC_Cluster/DSC_Cluster.psm1#L379
throwing an error that the cluster is not found:
"msg": "Failed to invoke DSC Test method: Check the spelling of the cluster name. Otherwise, there might be a problem with your network. Make sure the cluster nodes are turned on and connected to the network or contact your network administrator.",
As a workaround, while configuring the cluster and before adding the node, it is possible to replace the file "C:\Program Files\WindowsPowerShell\Modules\FailoverClusterDsc\2.1.0\DSCResources\DSC_Cluster\DSC_Cluster.psm1" with a patched one where, in the places mentioned above, the $Name variable is replaced with the exact cluster name including the domain name. Also, after replacing $Name, the Test-TargetResource function returns a strange value that is neither True nor False, so I removed its whole body and just return False when it is called while adding a node.
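For illustration, the kind of change described above might look like the following sketch; this is not the module's actual code, and the FQDN construction mirrors the Win32_ComputerSystem domain lookup the resource already performs for Get-Cluster:

# Build the fully qualified cluster name from the local computer's domain.
$computerInformation = Get-CimInstance -ClassName Win32_ComputerSystem
$clusterFqdn = '{0}.{1}' -f $Name, $computerInformation.Domain

# Use the FQDN wherever the bare $Name cannot be resolved from the joining node.
Get-ClusterNode -Cluster $clusterFqdn
Add-ClusterNode -Name $env:COMPUTERNAME -Cluster $clusterFqdn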