[DocDB] During a rolling upgrade with a workload running in parallel, one of the nodes becomes unreachable.

Open shantanugupta-yb opened this issue 9 months ago • 0 comments

Jira Link: DB-11135

Description

During a rolling upgrade with a workload running in parallel, one of the nodes becomes unreachable. This was observed on two upgrade paths: 2.18.7.0-b38 >> 2.21.0.0-b509 and 2.18.7.0-b38 >> 2.21.0.0-b504.

On both clusters, the N1 node is unreachable (blue line in the attached snapshot). The snapshot shows that the rolling upgrade follows the sequence N1 >> N2 >> N3. During a rolling upgrade, the connections of the node being upgraded are redistributed among the remaining nodes. However, when the final node, N3, is upgraded, the connections are not distributed equally between N1 (229) and N2 (75). The 229 connections on N1 are most likely what makes the node unresponsive/unreachable.

It was also observed that CPU utilisation on the unreachable AWS VM was pegged at 99+%.
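
To put the imbalance in numbers, here is a minimal sketch that computes each node's share of connections while N3 is being upgraded. The connection counts are the ones reported above; the helper name `connection_skew` is hypothetical and used only for illustration, not part of any YugabyteDB tooling.

```python
# Connection counts reported in this issue while N3 was being upgraded.
connections = {"N1": 229, "N2": 75}


def connection_skew(counts):
    """Return total connections, the even per-node share, each node's
    fraction of the total, and the busiest node's load relative to an
    even split. Hypothetical helper, for illustration only."""
    total = sum(counts.values())
    even_share = total / len(counts)
    shares = {node: n / total for node, n in counts.items()}
    skew = max(counts.values()) / even_share
    return total, even_share, shares, skew


total, even_share, shares, skew = connection_skew(connections)
print(f"total={total}, even share per node={even_share:.0f}")
for node, share in shares.items():
    print(f"{node}: {connections[node]} connections ({share:.1%})")
print(f"busiest node carries {skew:.2f}x an even share")
```

With the reported counts, an even split would place 152 connections on each remaining node, whereas N1 holds roughly three quarters of the 304 total.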

image image

Issue Type

kind/bug

Warning: Please confirm that this issue does not contain any sensitive information

  • [X] I confirm this issue does not contain any sensitive information.

shantanugupta-yb, May 01 '24 17:05