vtorc/vttablet: can't downgrade from v20 to v19
Overview of the Issue
We're seeing vttablet OOM very quickly on v20.0.0, for reasons we have not identified, after it ran fine for a couple of weeks. We attempted to downgrade vttablet to v19.0.4 to see if that changed anything, but the primary was unable to start. vtorc repeatedly attempted to recover by running UndoDemotePrimary, but it never succeeded and failed each time with:
ExecuteFetch(SET GLOBAL rpl_semi_sync_master_enabled = 0, GLOBAL rpl_semi_sync_slave_enabled = 0) failed: Unknown system variable 'rpl_semi_sync_master_enabled'
When I reverted vttablet back to v20.0.0, vtorc was able to run UndoDemotePrimary successfully.
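The variable names in the error point to a possible naming mismatch. MySQL 8.0.26 and later ship newer semi-sync plugins that expose rpl_semi_sync_source_enabled and rpl_semi_sync_replica_enabled in place of the master/slave names. If the v20 tablet loaded the newer plugins (an assumption, not confirmed in this report), a v19 tablet that still issues the old names would fail in this way. Below is a minimal sketch in Go of choosing the names based on what the server actually exposes. The type and function names are hypothetical and are not Vitess APIs; the input is assumed to be the variable names returned by SHOW GLOBAL VARIABLES LIKE 'rpl_semi_sync_%'.

```go
package main

import (
	"fmt"
	"strings"
)

// semiSyncVars holds the pair of system variables that toggle semi-sync.
// Hypothetical type for illustration only; not part of Vitess.
type semiSyncVars struct {
	Primary string
	Replica string
}

var (
	// Names exposed by the older semisync_master / semisync_slave plugins.
	legacyVars = semiSyncVars{"rpl_semi_sync_master_enabled", "rpl_semi_sync_slave_enabled"}
	// Names exposed by the newer semisync_source / semisync_replica plugins.
	modernVars = semiSyncVars{"rpl_semi_sync_source_enabled", "rpl_semi_sync_replica_enabled"}
)

// pickSemiSyncVars reports which naming scheme the server exposes, given the
// variable names it returned. It prefers the newer names when both are present
// and returns false when neither pair is complete (no plugin loaded).
func pickSemiSyncVars(present []string) (semiSyncVars, bool) {
	seen := make(map[string]bool, len(present))
	for _, name := range present {
		seen[strings.ToLower(name)] = true
	}
	for _, candidate := range []semiSyncVars{modernVars, legacyVars} {
		if seen[candidate.Primary] && seen[candidate.Replica] {
			return candidate, true
		}
	}
	return semiSyncVars{}, false
}

// disableSemiSyncQuery builds a SET GLOBAL statement in the same shape as the
// one shown in the error message, using the selected variable names.
func disableSemiSyncQuery(v semiSyncVars) string {
	return fmt.Sprintf("SET GLOBAL %s = 0, GLOBAL %s = 0", v.Primary, v.Replica)
}

func main() {
	// Assumed example input: a server that only has the newer plugins loaded.
	present := []string{
		"rpl_semi_sync_source_enabled",
		"rpl_semi_sync_replica_enabled",
		"rpl_semi_sync_source_timeout",
	}
	if v, ok := pickSemiSyncVars(present); ok {
		fmt.Println(disableSemiSyncQuery(v))
	} else {
		fmt.Println("no semi-sync plugin loaded")
	}
}
```

Running the SHOW GLOBAL VARIABLES query on the affected primary would confirm or rule out this explanation before anything else is changed.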
Related issues:
- https://github.com/vitessio/vitess/issues/12869
Reproduction Steps
This was tested on a single-node keyspace with only a single tablet.
Binary Version
v20.0.0 for most components; vttablet was downgraded to v19.0.4
Operating System and Environment details
GKE v1.29
Log Fragments
I0628 23:33:10.017377 1 vtorc.go:205] Force discovered - &{Hostname:uscentral1-keywordsu-0-replica-c-0.vttablet Port:3306 InstanceAlias:uscentral1-0288422300 ServerID:1511431603 ServerUUID:e99beb6f-5ea1-11ee-9732-76800cdb5542 Version:8.0.36 VersionComment:Percona Server (GPL), Release '28', Revision '47601f19'$ FlavorName: ReadOnly:true BinlogFormat:ROW BinlogRowImage:FULL LogBinEnabled:true LogReplicationUpdatesEnabled:true SelfBinlogCoordinates:vt-0288422300-bin.033393:317 SourceHost: SourcePort:0 SourceUUID: AncestryUUID:e99beb6f-5ea1-11ee-9732-76800cdb5542 ReplicaNetTimeout:0 HeartbeatInterval:0 ReplicationSQLThreadRuning:false ReplicationIOThreadRuning:false ReplicationSQLThreadState:-1 ReplicationIOThreadState:-1 HasReplicationFilters:false GTIDMode:ON SupportsOracleGTID:true UsingOracleGTID:false UsingMariaDBGTID:false UsingPseudoGTID:false ReadBinlogCoordinates::0 ExecBinlogCoordinates::0 IsDetached:false RelaylogCoordinates::0 LastSQLError: LastIOError: SecondsBehindPrimary:{Int64:0 Valid:false} SQLDelay:0 ExecutedGtidSet:0c4b8ef3-253a-11ec-98d9-3235e408d1e5:1-609569,14e91046-1cdc-11ed-a65f-d60bf275c515:1-1589772,28d7debe-6d30-11ec-9696-826bd9c444b9:1-32830237,4dda5b4e-0ea6-11ed-a83b-c2d1e4afc67b:1-111205389,684e23fe-0eb4-11ed-a604-b6763a422c36:1-3484661,a351f01d-fb7d-11ed-a660-bee2b573f3bf:1-39363505,a5ef9c7b-0eac-11ed-8bbe-46781259a843:1-75497684,a8de8790-6c23-11ed-af0a-62dad3b25097:1-71206353,bd907242-5ea4-11ee-b501-aa7ed7190bc9:1-2490747,cd9e16c9-2539-11ec-9499-2a7ac907fc94:1-282508005,e99beb6f-5ea1-11ee-9732-76800cdb5542:1-1350063514 
GtidPurged:0c4b8ef3-253a-11ec-98d9-3235e408d1e5:1-609569,14e91046-1cdc-11ed-a65f-d60bf275c515:1-1589772,28d7debe-6d30-11ec-9696-826bd9c444b9:1-32830237,4dda5b4e-0ea6-11ed-a83b-c2d1e4afc67b:1-111205389,684e23fe-0eb4-11ed-a604-b6763a422c36:1-3484661,a351f01d-fb7d-11ed-a660-bee2b573f3bf:1-39363505,a5ef9c7b-0eac-11ed-8bbe-46781259a843:1-75497684,a8de8790-6c23-11ed-af0a-62dad3b25097:1-71206353,bd907242-5ea4-11ee-b501-aa7ed7190bc9:1-2490747,cd9e16c9-2539-11ec-9499-2a7ac907fc94:1-282508005,e99beb6f-5ea1-11ee-9732-76800cdb5542:1-1345434303 GtidErrant: primaryExecutedGtidSet: ReplicationLagSeconds:{Int64:0 Valid:false} DataCenter:uscentral1 Region: PhysicalEnvironment: ReplicationDepth:0 IsCoPrimary:false HasReplicationCredentials:false SemiSyncEnforced:false SemiSyncPrimaryEnabled:false SemiSyncReplicaEnabled:false SemiSyncPrimaryTimeout:0 SemiSyncPrimaryWaitForReplicaCount:0 SemiSyncPrimaryStatus:false SemiSyncPrimaryClients:0 SemiSyncReplicaStatus:false LastSeenTimestamp: IsLastCheckValid:true IsUpToDate:true IsRecentlyChecked:true SecondsSinceLastSeen:{Int64:0 Valid:false} AllowTLS:false Problems:[] LastDiscoveryLatency:10.705519ms}, err - <nil>
I0628 23:33:10.017476 1 locks.go:455] Unlocking shard keywordsu/0 for action VTOrc Recovery for PrimaryIsReadOnly on uscentral1-0288422300 with error rpc error: code = Unknown desc = TabletManager.UndoDemotePrimary on uscentral1-0288422300: can't set semi-sync mode: ExecuteFetch(SET GLOBAL rpl_semi_sync_master_enabled = 0, GLOBAL rpl_semi_sync_slave_enabled = 0) failed: Unknown system variable 'rpl_semi_sync_master_enabled' (errno 1193) (sqlstate HY000) during query: SET GLOBAL rpl_semi_sync_master_enabled = 0, GLOBAL rpl_semi_sync_slave_enabled = 0; make sure plugins are loaded in my.cnf
E0628 23:33:10.022063 1 topology_recovery.go:654] rpc error: code = Unknown desc = TabletManager.UndoDemotePrimary on uscentral1-0288422300: can't set semi-sync mode: ExecuteFetch(SET GLOBAL rpl_semi_sync_master_enabled = 0, GLOBAL rpl_semi_sync_slave_enabled = 0) failed: Unknown system variable 'rpl_semi_sync_master_enabled' (errno 1193) (sqlstate HY000) during query: SET GLOBAL rpl_semi_sync_master_enabled = 0, GLOBAL rpl_semi_sync_slave_enabled = 0; make sure plugins are loaded in my.cnf
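For triage, the MySQL error details can be pulled out of log lines like the ones above. The sketch below is a hypothetical helper, not part of vtorc or vttablet, and it assumes the "(errno N) (sqlstate S)" suffix format seen in these fragments. Errno 1193 is MySQL's ER_UNKNOWN_SYSTEM_VARIABLE.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// mysqlErrRe matches the "(errno N) (sqlstate S)" suffix seen in the log lines.
var mysqlErrRe = regexp.MustCompile(`\(errno (\d+)\) \(sqlstate ([0-9A-Z]{5})\)`)

// unknownVarRe captures the name from an "Unknown system variable" message.
var unknownVarRe = regexp.MustCompile(`Unknown system variable '([^']+)'`)

// mysqlError is a hypothetical container for the fields extracted from a log line.
type mysqlError struct {
	Errno           int
	SQLState        string
	UnknownVariable string // empty when the line is not an unknown-variable error
}

// parseMySQLError extracts the errno, sqlstate and (if present) the unknown
// variable name from a log line. It returns false when no errno/sqlstate
// suffix is found.
func parseMySQLError(line string) (mysqlError, bool) {
	m := mysqlErrRe.FindStringSubmatch(line)
	if m == nil {
		return mysqlError{}, false
	}
	errno, err := strconv.Atoi(m[1])
	if err != nil {
		return mysqlError{}, false
	}
	parsed := mysqlError{Errno: errno, SQLState: m[2]}
	if v := unknownVarRe.FindStringSubmatch(line); v != nil {
		parsed.UnknownVariable = v[1]
	}
	return parsed, true
}

func main() {
	// Excerpt of the error text from the log fragment above.
	line := "can't set semi-sync mode: ExecuteFetch(SET GLOBAL rpl_semi_sync_master_enabled = 0, " +
		"GLOBAL rpl_semi_sync_slave_enabled = 0) failed: Unknown system variable " +
		"'rpl_semi_sync_master_enabled' (errno 1193) (sqlstate HY000)"
	if e, ok := parseMySQLError(line); ok {
		fmt.Printf("errno=%d sqlstate=%s variable=%s\n", e.Errno, e.SQLState, e.UnknownVariable)
	}
}
```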