[Feature] Documentation on troubleshooting Fleet Manager
Is your feature request related to a problem? Please describe.
If you deploy something to Fleet Manager that fails to replicate for some reason (e.g., it's valid by the CRD schema but an admission controller in a member cluster rejects it), it's hard to see when things got replicated, what the status is, etc. There's no documentation explaining how to troubleshoot issues like this, or how to view the Fleet Manager logs to determine whether it tried to replicate something and failed.
Describe the solution you'd like
Documentation that explains how to do things like:
- Get the logs for Fleet Manager replication (ClusterResourcePlacement executing updates in members, for example).
- Troubleshoot potentially common replication issues (deploy to hub cluster and member cluster rejects it due to an admission controller)
- Somehow check replication status across member clusters to see if everything is in sync.
Describe alternatives you've considered
There really aren't alternatives. The docs on internals are pretty thin and the logs that get sent to Log Analytics don't seem to have this information, at least not that I can find. It's pretty hard to reverse engineer what's going on here.
- (In progress) Get the logs for Fleet Manager replication (ClusterResourcePlacement executing updates in members, for example).
- (DONE) Troubleshoot potentially common replication issues (deploy to hub cluster and member cluster rejects it due to an admission controller)
  - Troubleshooting guide (Azure): https://learn.microsoft.com/en-us/troubleshoot/azure/kubernetes-fleet/troubleshoot-clusterresourceplacement-api-issues
  - Troubleshooting guide (Upstream): https://github.com/Azure/fleet/tree/main/docs/troubleshooting
- (DONE) Somehow check replication status across member clusters to see if everything is in sync.
  - You can see the replication status of each target member cluster in the status of ClusterResourcePlacement.
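
As a rough illustration of reading that status, here is a minimal Python sketch that summarizes per-cluster results from the JSON form of a ClusterResourcePlacement (for example, the output of `kubectl get clusterresourceplacement <name> -o json`). The field names (`placementStatuses`, `clusterName`, `conditions`, `failedPlacements`) are assumptions based on the upstream placement API and may differ between API versions. The sample object, including cluster names, reasons, and the webhook message, is made up for illustration, and `summarize_placement` is a hypothetical helper, not part of any Fleet tooling.

```python
# Hypothetical sample shaped like:
#   kubectl get clusterresourceplacement <name> -o json
# Field names are assumed from the upstream Fleet placement API;
# cluster names, reasons, and messages are invented for illustration.
SAMPLE_CRP = {
    "status": {
        "placementStatuses": [
            {
                "clusterName": "member-1",
                "conditions": [
                    {"type": "Applied", "status": "True", "reason": "ExampleApplied"},
                ],
            },
            {
                "clusterName": "member-2",
                "conditions": [
                    {"type": "Applied", "status": "False", "reason": "ExampleNotApplied"},
                ],
                "failedPlacements": [
                    {
                        "kind": "Deployment",
                        "name": "web",
                        "namespace": "demo",
                        "condition": {
                            "type": "Applied",
                            "status": "False",
                            "message": "admission webhook denied the request",
                        },
                    },
                ],
            },
        ]
    }
}


def summarize_placement(crp):
    """Return {cluster_name: {"in_sync": bool, "problems": [str, ...]}}."""
    summary = {}
    for placement in crp.get("status", {}).get("placementStatuses", []):
        cluster = placement.get("clusterName", "<unscheduled>")
        problems = []
        # Any per-cluster condition that is not "True" counts as a problem.
        for cond in placement.get("conditions", []):
            if cond.get("status") != "True":
                problems.append(f'{cond.get("type")}: {cond.get("reason", "")}')
        # Resources that failed on this cluster carry their own message.
        for failed in placement.get("failedPlacements", []):
            message = failed.get("condition", {}).get("message", "")
            problems.append(f'{failed.get("kind")}/{failed.get("name")}: {message}')
        summary[cluster] = {"in_sync": not problems, "problems": problems}
    return summary


if __name__ == "__main__":
    for cluster, result in summarize_placement(SAMPLE_CRP).items():
        state = "in sync" if result["in_sync"] else "NOT in sync"
        print(f"{cluster}: {state}")
        for problem in result["problems"]:
            print(f"  - {problem}")
```

Treating every non-"True" condition as a problem is a deliberately blunt choice for the sketch: a placement that is still rolling out would also be reported as not in sync, so a real check would probably want to distinguish "False" from "Unknown".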
The "Troubleshooting" link at the bottom of the Fleet Manager table of contents currently points to the GitHub repo and not to the rendered docs. Might be good to get that updated.
Thanks @tillig for the final pointer. This will be resolved in an upcoming Fleet documentation update. I'm going to shift this item to done.
Hi @tillig - the documentation update is published, and the troubleshooting link now points to the Learn docs instead of the GitHub repo. I'm going to go ahead and close this. We will continue to focus on improving our documentation, so please don't hesitate to flag any issues or provide feedback on it!