
vtorc/vttablet: can't downgrade from v20 to v19

Open derekperkins opened this issue 2 weeks ago • 0 comments

Overview of the Issue

We're seeing vttablet OOM very quickly on v20.0.0 after it ran fine for a couple of weeks, and we don't yet know why. We attempted to downgrade vttablet to v19.0.4 to see if that changed anything, but the primary was unable to start. vtorc kept attempting the UndoDemotePrimary recovery and never succeeded, failing with:

ExecuteFetch(SET GLOBAL rpl_semi_sync_master_enabled = 0, GLOBAL rpl_semi_sync_slave_enabled = 0) failed: Unknown system variable 'rpl_semi_sync_master_enabled'

When I reverted the downgrade and went back to v20.0.0, vtorc was able to run UndoDemotePrimary successfully.

Related issues:

  • https://github.com/vitessio/vitess/issues/12869

Reproduction Steps

This was tested on a single node keyspace with only a single tablet.

Binary Version

v20.0.0 for most components
downgrading vttablet to v19.0.4

Operating System and Environment details

GKE v1.29

Log Fragments

I0628 23:33:10.017377       1 vtorc.go:205] Force discovered - &{Hostname:uscentral1-keywordsu-0-replica-c-0.vttablet Port:3306 InstanceAlias:uscentral1-0288422300 ServerID:1511431603 ServerUUID:e99beb6f-5ea1-11ee-9732-76800cdb5542 Version:8.0.36 VersionComment:Percona Server (GPL), Release '28', Revision '47601f19'$ FlavorName: ReadOnly:true BinlogFormat:ROW BinlogRowImage:FULL LogBinEnabled:true LogReplicationUpdatesEnabled:true SelfBinlogCoordinates:vt-0288422300-bin.033393:317 SourceHost: SourcePort:0 SourceUUID: AncestryUUID:e99beb6f-5ea1-11ee-9732-76800cdb5542 ReplicaNetTimeout:0 HeartbeatInterval:0 ReplicationSQLThreadRuning:false ReplicationIOThreadRuning:false ReplicationSQLThreadState:-1 ReplicationIOThreadState:-1 HasReplicationFilters:false GTIDMode:ON SupportsOracleGTID:true UsingOracleGTID:false UsingMariaDBGTID:false UsingPseudoGTID:false ReadBinlogCoordinates::0 ExecBinlogCoordinates::0 IsDetached:false RelaylogCoordinates::0 LastSQLError: LastIOError: SecondsBehindPrimary:{Int64:0 Valid:false} SQLDelay:0 ExecutedGtidSet:0c4b8ef3-253a-11ec-98d9-3235e408d1e5:1-609569,14e91046-1cdc-11ed-a65f-d60bf275c515:1-1589772,28d7debe-6d30-11ec-9696-826bd9c444b9:1-32830237,4dda5b4e-0ea6-11ed-a83b-c2d1e4afc67b:1-111205389,684e23fe-0eb4-11ed-a604-b6763a422c36:1-3484661,a351f01d-fb7d-11ed-a660-bee2b573f3bf:1-39363505,a5ef9c7b-0eac-11ed-8bbe-46781259a843:1-75497684,a8de8790-6c23-11ed-af0a-62dad3b25097:1-71206353,bd907242-5ea4-11ee-b501-aa7ed7190bc9:1-2490747,cd9e16c9-2539-11ec-9499-2a7ac907fc94:1-282508005,e99beb6f-5ea1-11ee-9732-76800cdb5542:1-1350063514 
GtidPurged:0c4b8ef3-253a-11ec-98d9-3235e408d1e5:1-609569,14e91046-1cdc-11ed-a65f-d60bf275c515:1-1589772,28d7debe-6d30-11ec-9696-826bd9c444b9:1-32830237,4dda5b4e-0ea6-11ed-a83b-c2d1e4afc67b:1-111205389,684e23fe-0eb4-11ed-a604-b6763a422c36:1-3484661,a351f01d-fb7d-11ed-a660-bee2b573f3bf:1-39363505,a5ef9c7b-0eac-11ed-8bbe-46781259a843:1-75497684,a8de8790-6c23-11ed-af0a-62dad3b25097:1-71206353,bd907242-5ea4-11ee-b501-aa7ed7190bc9:1-2490747,cd9e16c9-2539-11ec-9499-2a7ac907fc94:1-282508005,e99beb6f-5ea1-11ee-9732-76800cdb5542:1-1345434303 GtidErrant: primaryExecutedGtidSet: ReplicationLagSeconds:{Int64:0 Valid:false} DataCenter:uscentral1 Region: PhysicalEnvironment: ReplicationDepth:0 IsCoPrimary:false HasReplicationCredentials:false SemiSyncEnforced:false SemiSyncPrimaryEnabled:false SemiSyncReplicaEnabled:false SemiSyncPrimaryTimeout:0 SemiSyncPrimaryWaitForReplicaCount:0 SemiSyncPrimaryStatus:false SemiSyncPrimaryClients:0 SemiSyncReplicaStatus:false LastSeenTimestamp: IsLastCheckValid:true IsUpToDate:true IsRecentlyChecked:true SecondsSinceLastSeen:{Int64:0 Valid:false} AllowTLS:false Problems:[] LastDiscoveryLatency:10.705519ms}, err - <nil>
I0628 23:33:10.017476       1 locks.go:455] Unlocking shard keywordsu/0 for action VTOrc Recovery for PrimaryIsReadOnly on uscentral1-0288422300 with error rpc error: code = Unknown desc = TabletManager.UndoDemotePrimary on uscentral1-0288422300: can't set semi-sync mode: ExecuteFetch(SET GLOBAL rpl_semi_sync_master_enabled = 0, GLOBAL rpl_semi_sync_slave_enabled = 0) failed: Unknown system variable 'rpl_semi_sync_master_enabled' (errno 1193) (sqlstate HY000) during query: SET GLOBAL rpl_semi_sync_master_enabled = 0, GLOBAL rpl_semi_sync_slave_enabled = 0; make sure plugins are loaded in my.cnf
E0628 23:33:10.022063       1 topology_recovery.go:654] rpc error: code = Unknown desc = TabletManager.UndoDemotePrimary on uscentral1-0288422300: can't set semi-sync mode: ExecuteFetch(SET GLOBAL rpl_semi_sync_master_enabled = 0, GLOBAL rpl_semi_sync_slave_enabled = 0) failed: Unknown system variable 'rpl_semi_sync_master_enabled' (errno 1193) (sqlstate HY000) during query: SET GLOBAL rpl_semi_sync_master_enabled = 0, GLOBAL rpl_semi_sync_slave_enabled = 0; make sure plugins are loaded in my.cnf
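
For anyone triaging similar logs, here is a minimal sketch (not part of Vitess) that pulls the unknown variable name and errno out of log lines like the ones above, and classifies which semi-sync variable naming a server exposes. The helper names `unknown_system_variables` and `semi_sync_naming` are hypothetical. The second helper assumes you have separately collected variable names, for example with `SHOW GLOBAL VARIABLES LIKE 'rpl_semi_sync%'`. MySQL 8.0.26 and later ship semi-sync plugins under both the older master/slave names and the newer source/replica names; whether a naming mismatch between the two Vitess versions is the cause here is an assumption, not something this report confirms.

```python
import re

# Matches the MySQL error as it appears in the vtorc log lines above.
UNKNOWN_VAR_RE = re.compile(
    r"Unknown system variable '(?P<name>[^']+)' \(errno (?P<errno>\d+)\)"
)


def unknown_system_variables(log_text):
    """Return sorted, de-duplicated (variable, errno) pairs found in log_text."""
    found = {
        (m.group("name"), int(m.group("errno")))
        for m in UNKNOWN_VAR_RE.finditer(log_text)
    }
    return sorted(found)


def semi_sync_naming(variable_names):
    """Classify which semi-sync variable naming is present on a server.

    variable_names: names the operator collected from the server
    (hypothetical input; this function does not talk to MySQL).
    """
    names = set(variable_names)
    has_old = "rpl_semi_sync_master_enabled" in names
    has_new = "rpl_semi_sync_source_enabled" in names
    if has_old and has_new:
        return "both"
    if has_old:
        return "master/slave"
    if has_new:
        return "source/replica"
    return "none"


if __name__ == "__main__":
    line = (
        "can't set semi-sync mode: ExecuteFetch(SET GLOBAL "
        "rpl_semi_sync_master_enabled = 0, GLOBAL rpl_semi_sync_slave_enabled = 0) "
        "failed: Unknown system variable 'rpl_semi_sync_master_enabled' "
        "(errno 1193) (sqlstate HY000)"
    )
    print(unknown_system_variables(line))
    print(semi_sync_naming(["rpl_semi_sync_source_enabled"]))
```

If the server reports only the source/replica variables while the running vttablet issues statements against the master/slave ones, that would be consistent with the error above, but it should be checked against the actual plugin list on the affected instance.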

derekperkins, Jun 28 '24 23:06