Unable to add a Node to an existing Cluster

Open matthitchcock opened this issue 9 years ago • 27 comments

When adding another node to the Cluster, the Set-TargetResource function uses "Add-ClusterNode $env:COMPUTERNAME -Cluster $Name" on line 129. This fails with the error message:

    Add-ClusterNode : Check the spelling of the cluster name. Otherwise, there might be a problem with your network. Make sure the cluster nodes are turned on and connected to the network or contact your network administrator.
        The RPC server is unavailable
    At line:1 char:1
    + Add-ClusterNode $env:COMPUTERNAME -Cluster $Name
        + CategoryInfo          : ConnectionError: (:) [Add-ClusterNode], ClusterCmdletException
        + FullyQualifiedErrorId : ClusterRpcConnection,Microsoft.FailoverClusters.PowerShell.AddClusterNodeCommand

Running "Get-Cluster -Name $Name -Domain $ComputerInfo.Domain" does work and returns the Cluster as expected.

Running Add-ClusterNode $env:COMPUTERNAME -Cluster "<NAME OF AN EXISTING CLUSTER NODE>" also works and adds the new node successfully.

"Get-ClusterNode -Cluster $Name" does not work from a node that is not yet in the Cluster.

This essentially prevents adding Nodes to the Cluster using this DSC Resource.

The environment tested is two Azure VMs.

matthitchcock avatar Dec 07 '15 12:12 matthitchcock

@matthitchcock do you still have this issue? I have been attempting to recreate it, but I can add nodes to an existing cluster without any issue.

DdenBraver avatar May 03 '16 09:05 DdenBraver

I haven't tried again recently, to be honest. I'll see if I can get some time this week; otherwise, if it can't be reproduced, we can close it and see if someone else runs into it.

matthitchcock avatar May 03 '16 09:05 matthitchcock

Closing per comments.

TravisEz13 avatar May 08 '16 21:05 TravisEz13

I am running into this same issue on Azure VMs. I am unable to run the Get/Add-ClusterNode cmdlets from the remote server which is not yet a part of the cluster. Perhaps an Azure specific issue?

shawntierney avatar Apr 20 '17 14:04 shawntierney

@shawntierney Did you install the RSAT-Clustering-PowerShell feature first, using DSC?

DdenBraver avatar Apr 20 '17 15:04 DdenBraver

I did, and the cmdlets work locally. The only way I have been able to get it to work from the remote node is to enable CredSSP and use Invoke-Command for all node-level cmdlets. Still working on it...

shawntierney avatar Apr 20 '17 19:04 shawntierney

cc @kwirkykat @mbreakey3

TravisEz13 avatar Apr 23 '17 15:04 TravisEz13

This is an Azure limitation at the moment. When used in Azure, the *-Cluster* commands which target the CNO will fail. This is because the CNO is not reachable from any node which doesn't own the CNO role. If you're familiar with networking, it's similar to dynamic ARP inspection (DAI) and DHCP snooping preventing access to a statically assigned IP.

I tried a few things, including building a load balancer using a probe port (similar to the current method of deploying SQL AlwaysOn in Azure) and then load balancing RPC. This starts to work properly, but then fails when it comes to negotiating a dynamic high port. So you could technically load balance all TCP and UDP ports for the CNO and this would work, but I haven't tried it and I'm sure it wouldn't be supported if something went pear-shaped.

The workaround I've been provided at the moment is to target the commands directly at the node, i.e. Get-Cluster node1 instead of Get-Cluster cluster1.
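
As a rough illustration of that workaround (a sketch only; 'cluster1' and 'node1' are placeholder names for the CNO and for a node that is already a cluster member):

```powershell
Import-Module FailoverClusters

# Placeholder names, not taken from a real environment:
#   'cluster1' = the cluster name object (CNO)
#   'node1'    = a node that is already a member of the cluster

# Targeting the CNO is the call that fails in Azure from a non-owner node:
# Get-Cluster -Name 'cluster1'

# Targeting an existing member node directly instead:
Get-Cluster -Name 'node1'
Get-ClusterNode -Cluster 'node1'

# Joining the local machine through that member node:
Add-ClusterNode -Name $env:COMPUTERNAME -Cluster 'node1'
```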

Apparently there's a big announcement due in the next few weeks around clustering in Azure which is meant to address this.

gladier avatar Apr 25 '17 07:04 gladier

This makes sense and is exactly the behavior I experienced. I ended up using that exact workaround after trial and error, but it's good to know that there is a limitation. I left the current code in place to ensure compatibility when this is no longer a limitation. To address the limitation, I use Get-ClusterGroup to test the current code, and if the result is null, I replace the $Name parameter with the owner node name to ensure functionality. Additionally, I added logic to move the cluster group to the primary server if the owner node is not the 'primary' server. I encountered a scenario where the owner node had changed to the 'secondary' server, which caused the script to fail.
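
A minimal sketch of that fallback logic, not the actual change to xCluster. The names 'cluster1' and 'node1' are placeholders, and 'Cluster Group' is assumed to be the default name of the core cluster group:

```powershell
Import-Module FailoverClusters

# Placeholder values, not taken from the xCluster source.
$Name        = 'cluster1'   # cluster name (CNO)
$PrimaryNode = 'node1'      # node expected to own the core cluster group

# Try the cluster name first; fall back to a member node if the CNO is unreachable.
$target = $Name
try {
    $coreGroup = Get-ClusterGroup -Cluster $target -Name 'Cluster Group' -ErrorAction Stop
}
catch {
    $coreGroup = $null
}

if ($null -eq $coreGroup) {
    $target    = $PrimaryNode
    $coreGroup = Get-ClusterGroup -Cluster $target -Name 'Cluster Group' -ErrorAction Stop
}

# Move the core group back if another node has taken ownership.
if ($coreGroup.OwnerNode.Name -ne $PrimaryNode) {
    Move-ClusterGroup -Cluster $target -Name 'Cluster Group' -Node $PrimaryNode
}
```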

shawntierney avatar Apr 26 '17 16:04 shawntierney

Just an FYI, I encountered a similar issue with the xSQLAvailabilityGroup resource. I only mention it here because the xSQLAvailabilityGroup resource follows the xCluster resource when creating a SQL Availability Group. Similar to the Azure cluster owner node issue, availability group creation will fail if the 'primary' server (01) where the code is executed is not the cluster owner node. The Move-ClusterGroup logic added to xCluster seems to mitigate this issue. However, I did encounter one case where the AG configuration still failed because the 'secondary' server (02) was set as the Primary in SQL. My assumption is that this was set during prior testing, when an Availability Group was configured on the 'secondary' node, and therefore will not be a common occurrence. Nonetheless, it's worth noting.

shawntierney avatar Apr 26 '17 16:04 shawntierney

+1 to @gladier and @shawntierney's findings. I was able to join a cluster/availability group via Node B when referencing the cluster as Node A, rather than the CNO.

glennmate avatar May 15 '17 02:05 glennmate

@glennmate what is the code you used to join node b to the cluster?

TraGicCode avatar Jul 17 '17 02:07 TraGicCode

Same issue: I can't join node 2 to the cluster node1. Error: RPC server not available.

dead8171 avatar Oct 04 '17 09:10 dead8171

I have tested everything now, and I get a different error every time I try something different. After I granted Everyone permissions in the security settings of the cluster DNS record, I was able to connect to the cluster from node2. Before that, I could not connect to the cluster by specifying the cluster name, nor by specifying the cluster name and its domain name. When I re-ran the script only for node 2, I got an error saying the node was already joined to the cluster, even though it was not joined to any cluster. This confuses me.

dead8171 avatar Oct 04 '17 12:10 dead8171

Same issue here... Please help how to resolve it.

    Powershell Cmdlet failed: Check the spelling of the cluster name. Otherwise, there might be a problem with your network. Make sure the cluster nodes are turned on and connected to the network or contact your network administrator.
        + CategoryInfo          : ConnectionError: (:) [], CimException
        + FullyQualifiedErrorId : ClusterRpcConnection,Microsoft.FailoverClusters.PowerShell.GetNodeCommand
        + PSComputerName        : localhost

    The PowerShell DSC resource '[xCluster]DirectResourceAccess' with SourceInfo '' threw one or more non-terminating errors while running the Test-TargetResource functionality. These errors are logged to the ETW channel called Microsoft-Windows-DSC/Operational. Refer to this channel for more details.
        + CategoryInfo          : InvalidOperation: (root/Microsoft/...gurationManager:String) [], CimException
        + FullyQualifiedErrorId : NonTerminatingErrorFromProvider
        + PSComputerName        : localhost

    [2017-11-01T12:51:05+00:00] FATAL: Chef::Exceptions::PowershellCmdletException: dsc_resource[test-cluster] (SqlServer::Create_Cluster line 7) had an error: Chef::Exceptions::PowershellCmdletException: Powershell Cmdlet failed: Check the spelling of the cluster name. Otherwise, there might be a problem with your network. Make sure the cluster nodes are turned on and connected to the network or contact your network administrator.
    ERROR: Failed to execute command on return code 1
    ERROR: Bootstrap command returned 1

mohamednazar avatar Nov 01 '17 12:11 mohamednazar

Is it possible to use xFailoverCluster (1.8.0.0) when invoking the DSC actions from the authoring node, or should the configuration be run locally on the nodes that you are going to configure? I'm still getting errors like the one described in this thread. I have not touched CredSSP at all so far. Is Invoke-Command still the only workaround?

makeitcloudy avatar Nov 06 '17 12:11 makeitcloudy

I am having the same problem. Instead of using xCluster to join nodes to the cluster, I have a Script resource:

            Script JoinExistingCluster
            {
                GetScript = { 
                    return @{ 'Result' = $true }
                }
                SetScript = {
                    $targetNodeName = $env:COMPUTERNAME
                    Add-ClusterNode -Name $targetNodeName -Cluster $using:ClusterOwnerNode
                }
                TestScript = {
                    $targetNodeName = $env:COMPUTERNAME
                    $(Get-ClusterNode -Cluster $using:ClusterOwnerNode).Name -contains $targetNodeName
                }
                DependsOn = "[xWaitForCluster]WaitForCluster"
                PsDscRunAsCredential = $DomainCreds
            }

That seems to work, but it is a bit of a hack.

hansenms avatar Dec 18 '17 16:12 hansenms

For creating the cluster (New-Cluster ...), did you use a Script block as well? What does it look like? Thanks.

mkokoy2 avatar Mar 05 '18 23:03 mkokoy2

@mkokoy2 I used the xCluster resource to create the cluster. You can find my DSC script here:

https://github.com/hansenms/iac/blob/master/sql-alwayson/DSC/PrepareSQLServer.ps1

hansenms avatar Mar 09 '18 01:03 hansenms

For those of you who haven't seen it: Azure has a new(ish) internal load balancer configuration called HA Ports. If you configure an HA Ports rule on an internal load balancer with a probe port similar to the one below, this should work. HA Ports load balancers forward all ports, which does away with the limitations above.

This is probably not officially supported, since HA Ports was targeted at firewall products, so YMMV.

Script CNOProbe {
    # This is used for the Azure Load Balancer probe; setting this on-premises has no effect.
    GetScript   = {
        # GetScript must return a hashtable, so report the current probe port.
        Import-Module FailoverClusters -Verbose:$false
        $Resource = Get-ClusterResource "Cluster IP Address"
        $ProbePortParams = Get-ClusterParameter -InputObject $Resource -Name ProbePort
        return @{ Result = [string]$ProbePortParams.Value }
    }
    SetScript   = {
        Import-Module FailoverClusters -Verbose:$false
        $Resource = Get-ClusterResource "Cluster IP Address"
        Set-ClusterParameter -InputObject $Resource -Name "ProbePort" -Value "59999"
        # Restart the resource so the new ProbePort takes effect.
        Stop-ClusterResource $Resource
        Start-ClusterResource $Resource
    }
    TestScript  = {
        Import-Module FailoverClusters -Verbose:$false
        $Resource = Get-ClusterResource "Cluster IP Address"
        $ProbePortParams = Get-ClusterParameter -InputObject $Resource -Name ProbePort
        Write-Verbose "Found ProbePort $($ProbePortParams.Value) for $($Resource.Name)"
        if ($ProbePortParams.Value -eq 59999) {
            Write-Verbose "Parameters OK"
            return $true
        }
        else {
            Write-Verbose "Bad Parameters"
            return $false
        }
    }
}

gladier avatar Sep 17 '18 14:09 gladier

I think @gladier is referring to this article: "Configure High Availability Ports for an internal load balancer".

@gladier we should add the probe property support to a resource too. Which resource does it fit in best (existing or new)?

johlju avatar Sep 18 '18 08:09 johlju

@johlju - That's the config I was referring to.

While we could add it to the xCluster DSC resource, this wouldn't make it very reusable for other cluster resources, for example SQL Availability Groups.

Looking at the resource parameters I update on a regular basis I would propose something like:

xWaitforClusterResource  MyClusterResource #Used to wait for a particular resource to be created (e.g. SQL Availability Group Listeners)
{
 Name = [String]
 ResourceName = [String]
}
xClusterResourceParameter MyResourceParameter #Used to actually set the parameter
{
 Name = [String]
 ResourceName = [String]
 [HostRecordTTL = [Uint32]]
 [RegisterAllProvidersIP = [bool]]
 [ProbePort = [Uint32]]
 [DependsOn = [string[]]]
}

However, this is probably not the best approach, as not all parameters apply to all resources. For example, Cluster IP resources hold the ProbePort parameter, while Cluster Name resources hold the RegisterAllProvidersIP and HostRecordTTL parameters.

This issue also duplicates/resolves a number of other open issues: #29, #173 and probably #186.

gladier avatar Sep 18 '18 09:09 gladier

I like your proposal, but if a property targets different scopes within a cluster, I think we should have a resource per scope. 🤔

Also, instead of having xWaitForClusterResource, maybe we could make one resource able to wait for different types of artifacts, like cluster, cluster group, cluster resource, etc., making the existing xWaitForCluster obsolete. Or maybe we can leverage WaitForAll, WaitForAny and WaitForSome.
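
For reference, a rough sketch of what leveraging the built-in WaitForAll resource could look like. The node names and the '[xCluster]CreateCluster' resource name are hypothetical, and cross-node waiting assumes the nodes can reach each other over WinRM:

```powershell
Configuration WaitForClusterOnFirstNode
{
    Import-DscResource -ModuleName PSDesiredStateConfiguration

    # 'node2' is a placeholder for the node that should join later.
    Node 'node2'
    {
        # Wait until the (hypothetical) xCluster resource named 'CreateCluster'
        # is in the desired state on the placeholder node 'node1'.
        WaitForAll ClusterCreated
        {
            ResourceName     = '[xCluster]CreateCluster'
            NodeName         = 'node1'
            RetryIntervalSec = 30
            RetryCount       = 20
        }
    }
}
```

Any resource that joins the cluster would then declare DependsOn = '[WaitForAll]ClusterCreated'.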

johlju avatar Sep 21 '18 14:09 johlju

Did this issue ever get resolved? I'm having the same problem joining the second node to the cluster. I haven't tried the script block provided by @hansenms yet, but that may be my next step.

dsmithcloud avatar May 06 '19 23:05 dsmithcloud

Hi, when I'm using Start-ClusterResource, DSC says "Undefined DSC resource start-clusterresource" (https://stackoverflow.com/questions/64068439/dsc-error-failing-to-recognise-a-cmdlet)

Any ideas?

EDIT: This is fixed. No idea how, it just decided to work!

RyanBoud avatar Sep 26 '20 08:09 RyanBoud

First of all, thank you for your code! This bug is still present, 7 years later, LOL. Environment: FailoverClusterDsc 2.1.0, Windows Server 2016 Standard.

It is impossible to add a node to a cluster, because some cmdlets cannot find the cluster when its name is specified without the domain. For example, this call works correctly because the domain is specified:

https://github.com/dsccommunity/FailoverClusterDsc/blob/faa9aa398ec9211f4104cd30dc4b8889db22c4f2/source/DSCResources/DSC_Cluster/DSC_Cluster.psm1#L177

and all of these fail:

https://github.com/dsccommunity/FailoverClusterDsc/blob/faa9aa398ec9211f4104cd30dc4b8889db22c4f2/source/DSCResources/DSC_Cluster/DSC_Cluster.psm1#L83
https://github.com/dsccommunity/FailoverClusterDsc/blob/faa9aa398ec9211f4104cd30dc4b8889db22c4f2/source/DSCResources/DSC_Cluster/DSC_Cluster.psm1#L243
https://github.com/dsccommunity/FailoverClusterDsc/blob/faa9aa398ec9211f4104cd30dc4b8889db22c4f2/source/DSCResources/DSC_Cluster/DSC_Cluster.psm1#L265
https://github.com/dsccommunity/FailoverClusterDsc/blob/faa9aa398ec9211f4104cd30dc4b8889db22c4f2/source/DSCResources/DSC_Cluster/DSC_Cluster.psm1#L379

They throw an error saying that the cluster is not found:

    "msg": "Failed to invoke DSC Test method: Check the spelling of the cluster name. Otherwise, there might be a problem with your network. Make sure the cluster nodes are turned on and connected to the network or contact your network administrator.",

As a workaround, before adding the node you can replace the file "C:\Program Files\WindowsPowerShell\Modules\FailoverClusterDsc\2.1.0\DSCResources\DSC_Cluster\DSC_Cluster.psm1" with a patched copy in which the $Name variable at the places mentioned above is replaced with the exact cluster name, including the domain name. Also, after I replaced $Name in the Test-TargetResource function, it returned a strange value that is neither True nor False, so I removed its whole body and made it just return False when it is called while adding a node.
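
To illustrate the idea of passing a domain-qualified cluster name, here is a minimal sketch run from the node being added. It is not the patched module: 'cluster1' is a placeholder, and building the name from Win32_ComputerSystem is an assumption about how the domain could be obtained.

```powershell
Import-Module FailoverClusters

# Placeholder cluster name (not from a real environment).
$Name = 'cluster1'

# Assumption: the node is domain-joined, so Win32_ComputerSystem reports the AD domain.
$domain      = (Get-CimInstance -ClassName Win32_ComputerSystem).Domain
$clusterFqdn = "$Name.$domain"

# Works per the report above, because the domain is given explicitly:
Get-Cluster -Name $Name -Domain $domain

# The calls that took only $Name, now given the domain-qualified name instead:
Get-ClusterNode -Cluster $clusterFqdn
Add-ClusterNode -Name $env:COMPUTERNAME -Cluster $clusterFqdn
```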

usefree avatar Jul 29 '22 07:07 usefree