fleet
fleet copied to clipboard
CLI Troubleshooting and Diagnostics
As an On-call Engineer or Developer, I want to run a comprehensive diagnostic CLI command that helps troubleshoot common issues and generate information for support tickets.
First and foremost, the CLI should gather relevant diagnostic information, like resource statuses, Fleet's k8s events and logs.
Additional, detailed diagnostics could check agent registration, resource dependencies, and communication problems, so that root causes can be identified.
Acceptance Criteria:
-
Ability to generate a diagnostic bundle (logs, resource YAMLs) for offline analysis and sharing with support.
- https://fleet.rancher.io/troubleshooting#where-to-look-for-root-causes-of-issues
- the bundle should snapshot fleet resources over a period of 30s (?)
- k8s events
- also dump the metrics
-
A
fleet troubleshootorfleet doctorcommand acts as the main entry point to validate the current cluster.- Fetch metrics and analyze queue values (are queues healthy)?
- provide a summary of checks performed (pass/fail), highlights potential issues, and suggests corrective actions or further commands for deeper investigation (e.g., "try
kubectl logs -n cattle-fleet-system fleet-agent-<pod-id>on cluster X").