terraform-provider-rancher2

[BUG] Scaling up nodes on downstream RKE1 cluster causes cluster (intermittently) to "hang" indefinitely

Open Josh-Diamond opened this issue 2 years ago • 3 comments

Rancher Server Setup

  • Rancher version: v2.6.12-rc1
  • Installation option (Docker install/Helm Chart): HA Helm w/ RKE1 local and RKE v1.3.19
  • Proxy/Cert Details: byo-valid

Information about the Cluster

  • Kubernetes version: v1.24.10-rancher4-1
  • Cluster Type (Local/Downstream): Downstream EC2 RKE1 w/ individual roles (1 etcd, 1 cp, 1 wkr, then scaled to 3 etcd, 2 cp, 3 wkr)

User Information

  • What is the role of the user logged in? Admin

Provider Information

  • What is the version of the Rancher v2 Terraform Provider in use? 2.0.0
  • What is the version of Terraform in use? 0.13.7

Describe the bug

When provisioning a downstream EC2 RKE1 cluster w/ individual roles, the cluster provisions successfully. Attempting to then scale up the nodes sometimes results in the cluster hanging indefinitely. This is not seen via the Rancher UI (I was only able to encounter this when using the rancher2 provider).

To Reproduce

  1. Fresh install of rancher v2.6.12-rc1
  2. Using rancher2 TFP 2.0.0, provision a downstream EC2 RKE1 cluster, v1.24.10-rancher4-1, w/ 1 etcd, 1 cp, and 1 wkr
  3. Once active, scale up nodes (via TF) to 3 etcd, 2 cp, 3 wkr
  4. Reproduced
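
For reference, the steps above could be expressed roughly as the following Terraform configuration. This is a minimal sketch, not the reporter's actual config: the resource names, pool names, hostname prefixes, and all variables (`rancher_api_url`, `rancher_token`, `node_template_id`, and the three count variables) are assumptions made for illustration, and an existing EC2 node template is assumed rather than created here.

```hcl
terraform {
  required_providers {
    rancher2 = {
      source  = "rancher/rancher2"
      version = "2.0.0"
    }
  }
}

# Hypothetical inputs; none of these names come from the original report.
variable "rancher_api_url" {
  type = string
}

variable "rancher_token" {
  type = string
}

variable "node_template_id" {
  type = string # ID of a pre-existing EC2 node template (assumed)
}

variable "etcd_count" {
  type    = number
  default = 1
}

variable "cp_count" {
  type    = number
  default = 1
}

variable "worker_count" {
  type    = number
  default = 1
}

provider "rancher2" {
  api_url   = var.rancher_api_url
  token_key = var.rancher_token
}

# Downstream RKE1 cluster at the Kubernetes version from the report.
resource "rancher2_cluster" "repro" {
  name = "rke1-scale-repro"

  rke_config {
    kubernetes_version = "v1.24.10-rancher4-1"
  }
}

# One node pool per role ("individual roles").
resource "rancher2_node_pool" "etcd" {
  cluster_id       = rancher2_cluster.repro.id
  name             = "etcd"
  hostname_prefix  = "repro-etcd-"
  node_template_id = var.node_template_id
  quantity         = var.etcd_count
  etcd             = true
}

resource "rancher2_node_pool" "cp" {
  cluster_id       = rancher2_cluster.repro.id
  name             = "cp"
  hostname_prefix  = "repro-cp-"
  node_template_id = var.node_template_id
  quantity         = var.cp_count
  control_plane    = true
}

resource "rancher2_node_pool" "worker" {
  cluster_id       = rancher2_cluster.repro.id
  name             = "wkr"
  hostname_prefix  = "repro-wkr-"
  node_template_id = var.node_template_id
  quantity         = var.worker_count
  worker           = true
}
```

Under these assumptions, step 2 corresponds to a plain `terraform apply` with the default counts, and step 3 to a second run such as `terraform apply -var etcd_count=3 -var cp_count=2 -var worker_count=3` once the cluster is active.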

Actual Result

The cluster hangs indefinitely; the scale-up never completes.

Expected Result

The cluster is expected to scale up the nodes successfully.

Screenshots

[Screenshot: Cluster Management, 2023-04-19 11:00:04 AM]

[Screenshot: Provisioning logs, 2023-04-19 10:56:58 AM]

Additional context

It's possible this affects RKE1 across multiple providers, but it was initially seen w/ EC2. I will attempt to reproduce w/ Linode and confirm shortly (in a comment below) whether that is affected as well.

Josh-Diamond, Apr 19 '23 18:04

Issue seen w/ Linode as well (not EC2-specific).

Josh-Diamond, Apr 19 '23 19:04

@Josh-Diamond Do you only see this when provisioning clusters via TF, or also via the UI?

a-blender, Jul 27 '23 20:07

I will work on reproducing this issue.

a-blender, Oct 20 '23 19:10