yugabyte-db
[DocDB] During a rolling upgrade with a workload running in parallel, one of the nodes becomes unreachable
Jira Link: DB-11135
Description
During a rolling upgrade with a workload running in parallel, one of the nodes becomes unreachable. This was observed on two clusters, with the following upgrade paths:
- Upgrade 2.18.7.0-b38 >> 2.21.0.0-b509
- Upgrade 2.18.7.0-b38 >> 2.21.0.0-b504
On both clusters, the N1 node is unreachable (blue line in the attached snapshot). The snapshot shows that the rolling upgrade follows the sequence N1 >> N2 >> N3. During the rolling upgrade, the connections of the node being upgraded are redistributed among the remaining nodes. However, when the final node, N3, is upgraded, its connections are not distributed evenly between N1 (229) and N2 (75). The 229 connections on N1 are most likely what causes the node to become unresponsive/unreachable.
Also observed that CPU utilisation on the unreachable AWS VM was pegged at 99+%.
Issue Type
kind/bug
Warning: Please confirm that this issue does not contain any sensitive information
- [X] I confirm this issue does not contain any sensitive information.