diag
Improve the error message when PD is not accessible
Bug Report
Please answer these questions before submitting your issue. Thanks!
- What did you do?
"PD context deadline exceeded" when running diag:
2022-11-22T03:23:12.790Z DEBUG Dumping metric workqueue_work_duration_seconds_count-2022-11-21T23:19:53Z-2022-11-22T02:19:53Z...
2022-11-22T03:23:12.815Z DEBUG Dumped metric workqueue_work_duration_seconds_bucket from 2022-11-21T23:19:53Z to 2022-11-22T00:19:53Z (2276399 bytes)
2022-11-22T03:23:12.896Z DEBUG Dumped metric workqueue_work_duration_seconds_count from 2022-11-22T00:19:53Z to 2022-11-22T02:19:53Z (364470 bytes)
2022-11-22T03:23:12.909Z DEBUG Dumped metric workqueue_work_duration_seconds_count from 2022-11-21T23:19:53Z to 2022-11-22T00:19:53Z (213210 bytes)
2022-11-22T03:23:14.089Z DEBUG Dumped metric spring_batch_step_seconds_sum from 2022-11-21T23:27:53Z to 2022-11-21T23:45:05Z (41167601 bytes)
2022-11-22T03:23:15.502Z DEBUG Dumped metric spring_batch_step_seconds_sum from 2022-11-21T23:19:53Z to 2022-11-21T23:27:53Z (31513193 bytes)
2022-11-22T03:23:15.502Z ERROR metadata of the cluster: Get "http://172.20.53.108:2379/pd/api/v1/cluster": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
2022-11-22T03:23:15.502Z INFO Collected data are stored in /diag/diag-ttt-g13tgnMFGyC
This happens at the end of a "diag util metricdump" run. The directory named in the last log line contains incomplete data, and when the customer tries to upload it, the upload fails with: "${DIRECTORY} is not a diag collected data directory".
- If this is a fatal error, then diag should not log the last line, as users may see this last line and think that the execution was successful.
- We should retry this failed operation, or at least have an option to retry it.
- What did you expect to see?
An error message like: "Failed to get cluster ID, which will affect uploading. Please check your PD connection."
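The behavior requested above (retry the PD request, report a clear error, and skip the final "Collected data are stored in" line when the failure is fatal) could look roughly like the sketch below. It is not diag's code: it is written in Python purely for illustration, and every name in it (fetch_with_retry, finish_collection, ClusterMetadataError, the attempts and delay_seconds parameters, the WARN line) is hypothetical. The fetch argument stands in for whatever call requests /pd/api/v1/cluster.

```python
import time
from typing import Callable


class ClusterMetadataError(Exception):
    """Raised when cluster metadata cannot be fetched after all attempts."""


def fetch_with_retry(fetch: Callable[[], dict], attempts: int = 3,
                     delay_seconds: float = 1.0) -> dict:
    """Call fetch() up to `attempts` times, sleeping between failed attempts."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except OSError as exc:  # OSError covers timeouts and connection errors
            last_error = exc
            if attempt < attempts:
                time.sleep(delay_seconds)
    # Hypothetical wording, modeled on the message proposed in this issue.
    raise ClusterMetadataError(
        f"Failed to get cluster ID after {attempts} attempts ({last_error}); "
        "uploading will be affected. Please check your PD connection."
    )


def finish_collection(fetch: Callable[[], dict], output_dir: str,
                      attempts: int = 3, delay_seconds: float = 1.0) -> list:
    """Return the final log lines; report success only if metadata was fetched."""
    try:
        fetch_with_retry(fetch, attempts, delay_seconds)
    except ClusterMetadataError as exc:
        return [
            f"ERROR {exc}",
            f"WARN Data in {output_dir} is incomplete and cannot be uploaded",
        ]
    return [f"INFO Collected data are stored in {output_dir}"]
```

With this shape, a PD outage produces an ERROR line and an explicit warning about the incomplete directory, and the success line is printed only when the metadata request eventually succeeds.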
- What did you see instead?
- What version of Diag are you using (tiup diag --version)?