
`clock offset` is not clear enough

Open rleungx opened this issue 1 year ago • 2 comments

Enhancement Task

```go
func (t *timestampOracle) UpdateTimestamp() error {
	prevPhysical, prevLogical := t.getTSO()

	now := time.Now()
	t.metrics.saveEvent.Inc()

	jetLag := typeutil.SubRealTimeByWallClock(now, prevPhysical)
	if jetLag > 3*t.updatePhysicalInterval && jetLag > jetLagWarningThreshold {
		log.Warn("clock offset",
			logutil.CondUint32("keyspace-group-id", t.keyspaceGroupID, t.keyspaceGroupID > 0),
			zap.Duration("jet-lag", jetLag),
			zap.Time("prev-physical", prevPhysical),
			zap.Time("now", now),
			zap.Duration("update-physical-interval", t.updatePhysicalInterval))
		t.metrics.slowSaveEvent.Inc()
	}
	...
```

From the above code, the warning is logged whenever the current system time is much later than the previous physical time, which can also happen because of a runtime issue. Clock offset is only one of the possible reasons. So here, we'd better use a clearer log message that is less confusing.

rleungx • Nov 13 '24 02:11

> clock offset could be one of the reasons

What other reasons need to be explained? PD restart? Leader transfer?

Or is it sufficient to just mention that there hasn't been a physical time update for a while, which may have caused a clock offset?

@JmPotato @rleungx

okJiang • Nov 20 '24 06:11

> > clock offset could be one of the reasons
>
> What other reasons need to be explained? PD restart? Leader transfer?
>
> Or is it sufficient to just mention that there hasn't been a physical time update for a while, which may have caused a clock drift?
>
> @JmPotato @rleungx

If etcd suffers from slow I/O performance, the TSO update may fail to advance the physical part in time, which will also trigger the "clock offset" warning.

JmPotato • Nov 20 '24 06:11