[Upgrade] QLRU 12 DBs Upgrade from 2.18.4 to 2024.1.0.0-b123 causes master leader to go unreachable and throughput never comes back

Open shamanthchandra-yb opened this issue 9 months ago • 0 comments

Jira Link: DB-11181

Description

Please see the Slack thread for the detailed conversation; it is attached to the JIRA ticket.

I upgraded two universes from version 2.18.4 to 2024.1. Both times, the master leader node became unreachable. Even after I brought it back by stopping and starting it from the AWS console, the dropped connections never returned. Interestingly, this didn’t happen in our earlier experiments with 18 DBs, whereas we hit it here with 12 DBs, where the workload is lighter.

The difference is that the 18 DBs test was upgraded to 2024.1.0.0-b105 and the 12 DBs test to 2024.1.0.0-b123. Earlier builds had the better defaults from Mark enabled; as of 2024.1.0.0-b116 they are behind a gflag that is off by default (https://phorge.dev.yugabyte.com/D34565). It looks like the better defaults were masking this issue.
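To make the build comparison concrete, here is a minimal sketch that classifies a target build by whether the better defaults would be on without setting any gflag. It assumes only what is stated above (the b116 cutoff in the 2024.1.0.0 series); the function names are hypothetical and the gflag itself is not named or touched.

```python
import re

# Assumption taken from this issue: in the 2024.1.0.0 series the "better
# defaults" were on by default before build b116 and are gated behind a
# gflag (off by default) from b116 onward.
GATED_FROM_BUILD = 116


def parse_build(version: str) -> tuple[str, int]:
    """Split a string like '2024.1.0.0-b123' into ('2024.1.0.0', 123)."""
    match = re.fullmatch(r"(\d+(?:\.\d+)*)-b(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return match.group(1), int(match.group(2))


def better_defaults_on_by_default(version: str) -> bool:
    """Hypothetical helper: True if the build predates the gflag gating."""
    release, build = parse_build(version)
    if release != "2024.1.0.0":
        # The issue only describes the cutoff for this release series.
        raise ValueError("only the 2024.1.0.0 series is described in this issue")
    return build < GATED_FROM_BUILD


for target in ("2024.1.0.0-b105", "2024.1.0.0-b123"):
    print(target, better_defaults_on_by_default(target))
```

Under that assumption, b105 (the 18 DBs run) falls before the cutoff and b123 (the 12 DBs run) falls after it, which matches the observation that only the b123 upgrade exposed the problem.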

In the upgrade from 2.18.4 to 2.20.3 with 12 DBs, I didn’t observe the above issue; none of the nodes became unavailable. As in our previous experiments, I didn’t observe a throughput drop in this experiment either.

Screenshot 2024-05-04 at 12 42 07 AM Screenshot 2024-05-04 at 12 41 50 AM

Issue Type

kind/bug

Warning: Please confirm that this issue does not contain any sensitive information

  • [X] I confirm this issue does not contain any sensitive information.

shamanthchandra-yb, May 04 '24 16:05