tiflow
Syncer(DM): alerting enhancement for `DM sync process exists with error`
Is your feature request related to a problem?
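DM already retries on downstream bad connections. Sometimes we receive an alert even though the task has already been auto-resumed. In the log below, the sync unit pauses on `driver: bad connection` at 08:07:18.088 and is auto-resumed at 08:07:18.586, about half a second later: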
[2022/08/10 08:07:18.088 +08:00] [INFO] [subtask.go:292] ["unit process returned"] [subtask=ames-syncer-207] [unit=Sync] [stage=Paused] [status="{\"totalEvents\":71594423,\"totalTps\":486,\"recentTps\":241,\"syncerBinlog\":\"(log-bin|000001.019314, 409540520)\",\"binlogType\":\"local\"}"]
[2022/08/10 08:07:18.088 +08:00] [ERROR] [subtask.go:311] ["unit process error"] [subtask=ames-syncer-207] [unit=Sync] ["error information"="{\"ErrCode\":44008,\"ErrClass\":\"schema-tracker\",\"ErrScope\":\"downstream\",\"ErrLevel\":\"high\",\"Message\":\"startLocation: [position: (, 0), gtid-set: ], endLocation: [position: (log-bin|000001.019314, 409542651), gtid-set: ]: cannot parse downstream table schema of `xxx`.`xxxx` to initialize upstream schema `xxxx`.`xxxx` in schema tracker\",\"RawCause\":\"driver: bad connection\"}"]
[2022/08/10 08:07:18.279 +08:00] [INFO] [relay.go:683] ["flush meta finished"] [component="relay log"] [meta="master-uuid = 0-163.000001, relay-binlog = (log-bin.019314, 409682882), relay-binlog-gtid = "]
[2022/08/10 08:07:18.586 +08:00] [INFO] [worker.go:476] ["auto_resume sub task"] [component="worker controller"] [task=ames-syncer-207]
[2022/08/10 08:07:18.586 +08:00] [INFO] [subtask.go:525] ["resume with unit"] [subtask=ames-syncer-207] [unit=Sync]
[2022/08/10 08:07:18.586 +08:00] [INFO] [task_checker.go:401] ["dispatch auto resume task"] [component="task checker"] [task=ames-syncer-207]
Describe the feature you'd like
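Because the task is retried and auto-resumed, there seems to be no need to alert this often. The current rule fires whenever `dm_syncer_exit_with_error_count` changes within a 1-minute window, even if the task recovers immediately afterwards.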
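A minimal sketch of a less noisy rule, assuming the goal is to alert only when the sync unit fails repeatedly rather than once. The rule name `DM_sync_process_exits_with_error_repeatedly`, the 10-minute window and the threshold of 3 are hypothetical values chosen for illustration, not DM defaults; the metric `dm_syncer_exit_with_error_count` and the label/annotation conventions are copied from the existing rule.

```yaml
# Hypothetical rule (name, window and threshold are illustrative assumptions):
# fire only if the syncer exited with an error more than 3 times within 10 minutes.
# Note: changes() counts how often the sampled value changed, so several errors
# that happen between two scrapes are counted as a single change.
- alert: DM_sync_process_exits_with_error_repeatedly
  expr: changes(dm_syncer_exit_with_error_count[10m]) > 3
  labels:
    env: ENV_LABELS_ENV
    level: critical
    expr: changes(dm_syncer_exit_with_error_count[10m]) > 3
  annotations:
    description: 'cluster: ENV_LABELS_ENV, instance: {{ $labels.instance }}, task: {{ $labels.task }}, values: {{ $value }}'
    value: '{{ $value }}'
    summary: DM sync process exits with error repeatedly
```

With a rule like this, a single error that is auto-resumed within a second, as in the log above, would not trigger an alert on its own.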
Describe alternatives you've considered
No response
Teachability, Documentation, Adoption, Migration Strategy
No response
- alert: DM_sync_process_exists_with_error
  expr: changes(dm_syncer_exit_with_error_count[1m]) > 0
  labels:
    env: ENV_LABELS_ENV
    level: critical
    expr: changes(dm_syncer_exit_with_error_count[1m]) > 0
  annotations:
    description: 'cluster: ENV_LABELS_ENV, instance: {{ $labels.instance }}, task: {{ $labels.task }}, values: {{ $value }}'
    value: '{{ $value }}'
    summary: DM sync process exists with error
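Another option, sketched under the assumption that a short-lived error is acceptable as long as the task recovers: keep the existing expression but add a Prometheus `for` clause, so the alert only fires if the expression stays true on every evaluation for the whole duration. The rule name and the 5-minute duration are hypothetical.

```yaml
# Hypothetical variant (name and duration are illustrative assumptions):
# same expression as the existing rule, but the alert stays "pending" until the
# expression has been true continuously for 5 minutes, i.e. the syncer kept
# exiting with an error at least once per minute for that whole period.
- alert: DM_sync_process_exits_with_error_persistently
  expr: changes(dm_syncer_exit_with_error_count[1m]) > 0
  for: 5m
  labels:
    env: ENV_LABELS_ENV
    level: critical
    expr: changes(dm_syncer_exit_with_error_count[1m]) > 0
  annotations:
    description: 'cluster: ENV_LABELS_ENV, instance: {{ $labels.instance }}, task: {{ $labels.task }}, values: {{ $value }}'
    value: '{{ $value }}'
    summary: DM sync process keeps exiting with error
```

A one-off error keeps the expression true for about a minute and then resolves, so it never reaches the 5-minute threshold. The trade-off is that a task that fails once and then stays paused without further retries would not trigger this rule, so that case would need to be caught by a different rule.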