QA: Can we use Seconds_Behind_Master (MariaDB 10.5.x) for ReplicationLagQuery?
Hi, @shlomi-noach.
As described in https://github.com/openark/orchestrator/blob/master/docs/configuration-recovery.md#promotion-actions
> `FailMasterPromotionOnLagMinutes`: defaults `0` (not failing promotion). Can be used to fail a promotion if the candidate replica is too far behind. Example: replicas were broken for 5 hours, and then master failed. One might want to prevent the failover in order to recover the binary logs / relay logs for those lost 5 hours. To use this flag, you must set `ReplicationLagQuery` and use a heartbeat mechanism such as `pt-heartbeat`. The MySQL built-in `Seconds_behind_master` output of `SHOW SLAVE STATUS` (pre 8.0) does not report replication lag when replication is broken.
I wonder, can we use Seconds_Behind_Master for ReplicationLagQuery to determine replication lagging?
I currently use MariaDB 10.5.11 (release date: 23 Jun 2021).
see: https://mariadb.com/kb/en/show-replica-status/#column-descriptions
> I wonder, can we use Seconds_Behind_Master for ReplicationLagQuery to determine replication lagging?

If you don't specify ReplicationLagQuery, then orchestrator uses Seconds_Behind_Master by default.
Thanks for your reply!
One more thing: is Seconds_Behind_Master a reliable way to determine replication lag?
I have seen some blog posts saying it is not reliable and that pt-heartbeat is the better choice.
Sorry, I'm a newbie.
It is not reliable in my experience. See http://code.openark.org/blog/mysql/seconds_behind_master-vs-absolute-slave-lag
Thanks! https://code.openark.org/blog/mysql/seconds_behind_master-vs-absolute-slave-lag
> > I wonder, can we use Seconds_Behind_Master for ReplicationLagQuery to determine replication lagging?
>
> If you don't specify `ReplicationLagQuery`, then `orchestrator` uses `Seconds_Behind_Master` by default.

Aug 17 11:57:22 a.test.com orchestrator[3040]: 2021-08-17 11:57:22 FATAL nonzero FailMasterPromotionOnLagMinutes requires ReplicationLagQuery to be set
https://github.com/openark/orchestrator/blob/master/go/config/config.go#L544 https://github.com/openark/orchestrator/blob/master/docs/using-the-web-api.md
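That is not what I see: with orchestrator 3.2.6 and MariaDB 10.5.12, orchestrator refuses to start with the FATAL error above when FailMasterPromotionOnLagMinutes is nonzero and ReplicationLagQuery is not set.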
It also seems that I cannot select Seconds_Behind_Master on its own from the output of SHOW SLAVE STATUS (as of MariaDB 10.5.12).
~~SELECT t.Seconds_Behind_Master FROM (SHOW SLAVE STATUS) AS t;~~ won't work.
[MDEV-11123] Seconds_Behind_Master is not accessible through information_schema - Jira
According to https://github.com/openark/orchestrator/blob/master/go/inst/instance_dao.go#L640-L654
    if config.Config.ReplicationLagQuery != "" && !isMaxScale {
        waitGroup.Add(1)
        go func() {
            defer waitGroup.Done()
            if err := db.QueryRow(config.Config.ReplicationLagQuery).Scan(&instance.ReplicationLagSeconds); err == nil {
                if instance.ReplicationLagSeconds.Valid && instance.ReplicationLagSeconds.Int64 < 0 {
                    log.Warningf("Host: %+v, instance.SlaveLagSeconds < 0 [%+v], correcting to 0", instanceKey, instance.ReplicationLagSeconds.Int64)
                    instance.ReplicationLagSeconds.Int64 = 0
                }
            } else {
                instance.ReplicationLagSeconds = instance.SecondsBehindMaster
                logReadTopologyInstanceError(instanceKey, "ReplicationLagQuery", err)
            }
        }()
    }
If we deliberately make db.QueryRow(config.Config.ReplicationLagQuery).Scan(&instance.ReplicationLagSeconds) fail, ReplicationLagSeconds falls back to SecondsBehindMaster, which is the behavior I want.
However, this workaround has one side effect: it writes a harmless error to the log. For example, with this setting:
"ReplicationLagQuery": "SELECT 'see: https://github.com/openark/orchestrator/issues/1388#issuecomment-900014232'",
2021-08-18 11:58:31 ERROR ReadTopologyInstance(mariadb-13307:3306) ReplicationLagQuery: sql: Scan error on column index 0, name "see: https://github.com/openark/orchestrator/issues/1388#issuecomment-900014232": converting driver.Value type []uint8 ("see: https://github.com/openark/orchestrator/issues/1388#issuecomment-900014232") to a int64: invalid syntax
Also, this workaround depends on the current code, which may change.
@shlomi-noach, could we add a supported way to tell ReplicationLagQuery to use Seconds_Behind_Master directly, instead of relying on this hack?
For example: "ReplicationLagQuery": "-- Seconds_Behind_Master" would mean "use Seconds_Behind_Master from SHOW SLAVE STATUS directly".
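A rough sketch of how such a marker could be recognized, purely as an illustration of the proposal. Nothing here exists in orchestrator: the constant `secondsBehindMasterSentinel` and the function `wantsSecondsBehindMaster` are hypothetical names, and matching case-insensitively after trimming whitespace is my own assumption.

```go
package main

import "strings"

// secondsBehindMasterSentinel is a hypothetical marker value taken from the
// proposal in this comment; orchestrator does not define it.
const secondsBehindMasterSentinel = "-- Seconds_Behind_Master"

// wantsSecondsBehindMaster reports whether the configured ReplicationLagQuery
// is the proposed marker, ignoring surrounding whitespace and letter case.
func wantsSecondsBehindMaster(replicationLagQuery string) bool {
	return strings.EqualFold(strings.TrimSpace(replicationLagQuery), secondsBehindMasterSentinel)
}
```

With a check like this, the topology reader could skip running the query and copy SecondsBehindMaster into ReplicationLagSeconds without logging an error, while any other non-empty value would still be executed as SQL.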