terraform-provider-rancher2 [BUG] Scaling up nodes on downstream RKE1 cluster causes cluster (intermittently) to "hang" indefinitely

[BUG] Scaling up nodes on downstream RKE1 cluster causes cluster (intermittently) to "hang" indefinitely

Open Josh-Diamond opened this issue 2 years ago • 3 comments

Rancher Server Setup

Rancher version: v2.6.12-rc1
Installation option (Docker install/Helm Chart): HA Helm w/ RKE1 local and RKE v1.3.19
Proxy/Cert Details: byo-valid

Information about the Cluster

Kubernetes version: v1.24.10-rancher4-1
Cluster Type (Local/Downstream): Downstream EC2 RKE1 w/ individual roles -[1 etcd, 1 cp, 1 wkr.. then scale to 3 etcd, 2 cp, 3 wkr

User Information

What is the role of the user logged in? Admin

Provider Information

What is the version of the Rancher v2 Terraform Provider in use? 2.0.0
What is the version of Terraform in use? 0.13.7

Describe the bug

When provisioning a downstream EC2 RKE1 cluster w/ individual roles, the cluster successfully provisions. Attempting to then scale up the nodes, sometimes results in the cluster hanging, indefinitely. This is not seen via Rancher UI. (I was only able to encounter this when using rancher2 provider)

To Reproduce

Fresh install of rancher v2.6.12-rc1
Using rancher2 TFP 2.0.0, provision a downstream EC2 RKE1 cluster, v1.24.10-rancher4-1, w/ 1 etcd, 1 cp, and 1 wkr
Once active, scale up nodes (via TF) to 3 etcd, 2 cp, 3 wkr
Reproduced

Actual Result

cluster hangs indefinitely, scale up never achieved

Expected Result

cluster expected to scale up nodes successfully

Screenshots

Cluster Management Screenshot 2023-04-19 at 11 00 04 AM

Provisioning logs Screenshot 2023-04-19 at 10 56 58 AM

Additional context

Its possible this affects RKE1 across multiple providers, but initially seen w/ EC2. I will attempt to reproduce w/ Linode and confirm shortly (in comment below) if that is affected as well.

Apr 19 '23 18:04 Josh-Diamond

issue seen w/ Linode as well - [not EC2 specific]

Apr 19 '23 19:04 Josh-Diamond

@Josh-Diamond Do you only see this when prov TF clusters or also via the UI?

Jul 27 '23 20:07 a-blender

I will work on reproducing this issue.

Oct 20 '23 19:10 a-blender

terraform-provider-rancher2 terraform-provider-rancher2 copied to clipboard

[BUG] Scaling up nodes on downstream RKE1 cluster causes cluster (intermittently) to "hang" indefinitely

Rancher Server Setup

Information about the Cluster

User Information

Provider Information

Describe the bug

To Reproduce

Actual Result

Expected Result

Screenshots

Additional context

terraform-provider-rancher2
terraform-provider-rancher2 copied to clipboard