
Syncer(DM): alerting enhancement for `DM sync process exists with error`

Open King-Dylan opened this issue 2 years ago • 1 comments

Is your feature request related to a problem?

DM retries when the downstream connection goes bad. Sometimes we still get an alert even though the task has already resumed.

[2022/08/10 08:07:18.088 +08:00] [INFO] [subtask.go:292] ["unit process returned"] [subtask=ames-syncer-207] [unit=Sync] [stage=Paused] [status="{\"totalEvents\":71594423,\"totalTps\":486,\"recentTps\":241,\"syncerBinlog\":\"(log-bin|000001.019314, 409540520)\",\"binlogType\":\"local\"}"]
[2022/08/10 08:07:18.088 +08:00] [ERROR] [subtask.go:311] ["unit process error"] [subtask=ames-syncer-207] [unit=Sync] ["error information"="{\"ErrCode\":44008,\"ErrClass\":\"schema-tracker\",\"ErrScope\":\"downstream\",\"ErrLevel\":\"high\",\"Message\":\"startLocation: [position: (, 0), gtid-set: ], endLocation: [position: (log-bin|000001.019314, 409542651), gtid-set: ]: cannot parse downstream table schema of `xxx`.`xxxx` to initialize upstream schema `xxxx`.`xxxx` in schema tracker\",\"RawCause\":\"driver: bad connection\"}"]
[2022/08/10 08:07:18.279 +08:00] [INFO] [relay.go:683] ["flush meta finished"] [component="relay log"] [meta="master-uuid = 0-163.000001, relay-binlog = (log-bin.019314, 409682882), relay-binlog-gtid = "]
[2022/08/10 08:07:18.586 +08:00] [INFO] [worker.go:476] ["auto_resume sub task"] [component="worker controller"] [task=ames-syncer-207]
[2022/08/10 08:07:18.586 +08:00] [INFO] [subtask.go:525] ["resume with unit"] [subtask=ames-syncer-207] [unit=Sync]
[2022/08/10 08:07:18.586 +08:00] [INFO] [task_checker.go:401] ["dispatch auto resume task"] [component="task checker"] [task=ames-syncer-207]

Describe the feature you'd like

Because the task is retried and resumed automatically, such frequent alerts seem unnecessary.

Describe alternatives you've considered

No response

Teachability, Documentation, Adoption, Migration Strategy

No response

King-Dylan avatar Aug 12 '22 05:08 King-Dylan

  - alert: DM_sync_process_exists_with_error
    expr: changes(dm_syncer_exit_with_error_count[1m]) > 0
    labels:
      env: ENV_LABELS_ENV
      level: critical
      expr: changes(dm_syncer_exit_with_error_count[1m]) > 0
    annotations:
      description: 'cluster: ENV_LABELS_ENV, instance: {{ $labels.instance }}, task: {{ $labels.task }}, values: {{ $value }}'
      value: '{{ $value }}'
      summary: DM sync process exists with error

King-Dylan avatar Aug 12 '22 05:08 King-Dylan
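
One possible way to make the rule above less noisy is sketched below. It keeps the same metric but widens the window and raises the threshold, so a single error that is followed by an auto resume no longer fires. The `5m` window and the `> 2` threshold are illustrative assumptions, not values taken from DM, and would need tuning against the actual retry interval.

```yaml
  - alert: DM_sync_process_exists_with_error
    # Assumption: fire only when the error counter changed more than twice
    # within 5 minutes, i.e. the auto resume keeps failing.
    expr: changes(dm_syncer_exit_with_error_count[5m]) > 2
    labels:
      env: ENV_LABELS_ENV
      level: critical
      expr: changes(dm_syncer_exit_with_error_count[5m]) > 2
    annotations:
      description: 'cluster: ENV_LABELS_ENV, instance: {{ $labels.instance }}, task: {{ $labels.task }}, values: {{ $value }}'
      value: '{{ $value }}'
      summary: DM sync process exists with error
```

A trade-off of this approach: a task that fails once and then stays paused would not trigger this rule, so it probably only makes sense alongside a separate alert on the task state.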