Support syncing when disk is very slow

Open lance6716 opened this issue 4 years ago • 2 comments

Feature Request

Is your feature request related to a problem? Please describe:

A user deployed DM in an environment with a very slow disk, which helped reveal several DM bugs, such as:

  • [ ] https://github.com/pingcap/dm/issues/1377
  • [ ] a worker received a bound watch event but failed to read the bound information from etcd, and neither retried nor killed itself
  • [ ] query-status shows nothing, yet the task can't be added because it already exists (not enough information in the log)
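
For the second item, a minimal sketch of the missing behavior: retry the read with a growing delay, then return an error so the caller can decide to exit. This is not DM's code. `readFunc`, `retryRead`, `flakyReader`, and the value `source-1` are hypothetical names made up for illustration, and the real etcd read is replaced by a stub that fails a fixed number of times.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// readFunc stands in for a hypothetical "read bound information from etcd" call.
type readFunc func(ctx context.Context) (string, error)

// retryRead calls read until it succeeds, the attempts run out, or ctx is done.
// The delay doubles after every failed attempt.
func retryRead(ctx context.Context, read readFunc, attempts int, delay time.Duration) (string, error) {
	if attempts < 1 {
		attempts = 1
	}
	var lastErr error
	for i := 0; i < attempts; i++ {
		v, err := read(ctx)
		if err == nil {
			return v, nil
		}
		lastErr = err
		if i == attempts-1 {
			break // no point sleeping after the final attempt
		}
		select {
		case <-ctx.Done():
			return "", ctx.Err()
		case <-time.After(delay):
			delay *= 2
		}
	}
	return "", fmt.Errorf("giving up after %d attempts: %w", attempts, lastErr)
}

// flakyReader returns a readFunc that fails the first `failures` calls
// and then returns a fixed value, imitating a slow or overloaded etcd.
func flakyReader(failures int) readFunc {
	calls := 0
	return func(ctx context.Context) (string, error) {
		calls++
		if calls <= failures {
			return "", errors.New("simulated etcd timeout")
		}
		return "source-1", nil
	}
}

// demo runs retryRead against a reader with the given number of failures.
func demo(failures, attempts int) (string, error) {
	return retryRead(context.Background(), flakyReader(failures), attempts, time.Millisecond)
}

func main() {
	v, err := demo(2, 5) // succeeds on the third attempt
	fmt.Println(v, err)

	_, err = demo(10, 3) // never succeeds; the caller would log and exit here
	fmt.Println(err)
}
```

Returning the wrapped error instead of swallowing it leaves the "kill itself" decision to the caller, which is also where the extra log information would go.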

Describe the feature you'd like:

  • [ ] expose more etcd errors and metrics (already tracked in https://github.com/pingcap/dm/issues/1219, https://github.com/pingcap/dm/issues/1218), and warn when the disk is too slow
  • [ ] test DM on a slow disk
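
One possible shape for the "warn when the disk is slow" check is a small fsync probe. The sketch below is an assumption, not an existing DM feature: `fsyncLatency` and `isSlow` are hypothetical helpers, and the 10ms threshold is an illustrative value that would need tuning. etcd itself exports `etcd_disk_wal_fsync_duration_seconds`, which is likely the better signal in production.

```go
package main

import (
	"fmt"
	"os"
	"time"
)

// fsyncLatency writes a small payload to a temporary file in dir and
// returns how long the following fsync took. An empty dir means the
// default temporary directory.
func fsyncLatency(dir string) (time.Duration, error) {
	f, err := os.CreateTemp(dir, "fsync-probe-*")
	if err != nil {
		return 0, err
	}
	// Deferred calls run in reverse order: close first, then remove.
	defer os.Remove(f.Name())
	defer f.Close()

	if _, err := f.Write(make([]byte, 4096)); err != nil {
		return 0, err
	}
	start := time.Now()
	if err := f.Sync(); err != nil {
		return 0, err
	}
	return time.Since(start), nil
}

// isSlow reports whether a measured latency exceeds the threshold.
func isSlow(latency, threshold time.Duration) bool {
	return latency > threshold
}

func main() {
	const threshold = 10 * time.Millisecond // assumed value, tune per deployment

	latency, err := fsyncLatency("")
	if err != nil {
		fmt.Println("probe failed:", err)
		return
	}
	if isSlow(latency, threshold) {
		fmt.Printf("WARN: fsync took %v (threshold %v), disk may be too slow\n", latency, threshold)
	} else {
		fmt.Printf("fsync took %v\n", latency)
	}
}
```

A single sample is noisy, so a real check would probably probe the etcd data directory several times and look at a high percentile.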

Describe alternatives you've considered:

Teachability, Documentation, Adoption, Migration Strategy:

lance6716 • Jan 18 '21 10:01

I used Chaos Mesh to try to imitate a bad-disk environment, but it was not effective at revealing bugs. We might try using failpoints that trigger with a given percentage probability, injected into etcd API calls, and check whether that causes inconsistency in DM.
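
A rough sketch of the idea, using a hand-rolled wrapper rather than the failpoint library so it runs with the standard library alone. Everything here is hypothetical: `kv` is a tiny stand-in for the part of the etcd client being called, `memKV` replaces a real etcd, and `faultyKV` fails each call with a fixed percentage probability.

```go
package main

import (
	"errors"
	"fmt"
	"math/rand"
)

var errInjected = errors.New("injected etcd failure")

// kv is a hypothetical, minimal stand-in for the etcd calls under test.
type kv interface {
	Get(key string) (string, error)
}

// memKV is an in-memory kv used instead of a real etcd cluster.
type memKV map[string]string

func (m memKV) Get(key string) (string, error) {
	v, ok := m[key]
	if !ok {
		return "", fmt.Errorf("key %q not found", key)
	}
	return v, nil
}

// faultyKV wraps a kv and fails each call with probability pct/100.
type faultyKV struct {
	inner kv
	pct   int
	rng   *rand.Rand
}

// newFaultyKV uses a fixed seed so that a failing run can be replayed.
func newFaultyKV(inner kv, pct int, seed int64) *faultyKV {
	return &faultyKV{inner: inner, pct: pct, rng: rand.New(rand.NewSource(seed))}
}

func (f *faultyKV) Get(key string) (string, error) {
	if f.rng.Intn(100) < f.pct {
		return "", errInjected
	}
	return f.inner.Get(key)
}

// countFailures calls Get `calls` times and counts the injected errors.
func countFailures(store kv, key string, calls int) int {
	failed := 0
	for i := 0; i < calls; i++ {
		if _, err := store.Get(key); errors.Is(err, errInjected) {
			failed++
		}
	}
	return failed
}

func main() {
	store := newFaultyKV(memKV{"bound": "source-1"}, 30, 1)
	fmt.Printf("%d of 1000 calls failed\n", countFailures(store, "bound", 1000))
}
```

In a real test the wrapper would sit between DM and its etcd client, and after each run the test would compare what DM believes (for example the query-status output) with what is actually stored in etcd.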

@zeminzhou

lance6716 • Feb 04 '21 11:02

(removed the BUG label because we need to investigate further whether it has been fixed)

lance6716 • Apr 09 '21 01:04