[SURE-9488] Fleet UI errors will simplify troubleshooting and reduce escalations
SURE-9488
Request Description: Descriptive Error Messages in the Fleet UI
Expected behaviour: The UI should provide more descriptive error messages that pinpoint where the issue is, since Hosted Rancher logs are not accessible. Alternatively, allow viewing the Rancher logs in the UI.
Actual behaviour: The message looks like 'failed: 3/1time="2024-11-28T09:04:55Z" level=fatal msg="context canceled"', which gives no useful context about where to look for the issue.
Actual issue: fleet.yaml contains a duplicate key entry and validation fails, or sometimes the Helm chart doesn't exist.
TODO
Let's check all the error conditions and add a "context", like "error in gitjob: ".
- [ ] Maybe strip or format the timestamps, too, so the messages look better.
- [ ] Replace all "context canceled" error messages, because users don't understand "context canceled". This probably translates into "timeout waiting for gitjob to complete".
- [ ] Revisit Failure and Readiness Conditions (see SURE-9488)
I closed the Jira issue today since we've made good progress in 0.12.0.
/backport v2.11.2
/backport v2.10.6
System Information
Before Upgrade:
| Rancher Version | Fleet Version |
|---|---|
| v2.11.0 | 106.0.0+up0.12.0 |
Steps followed
- Created a GitRepo using this fleet.yaml
- Waited for the Job to be created.
- The GitRepo is in Error state (see the screenshot below for more details):
Job Failed. failed: 1/1time="2025-05-19T18:27:31Z" level=fatal msg="failed to process bundle: context canceled"
Screenshot showing Error message.
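The raw message above fuses the job counter with a logrus-style log line. One way the timestamp and level could be stripped before display is sketched below. `cleanJobMessage` is a hypothetical helper, and the regular expression assumes the `time="..." level=... msg="..."` layout seen in this message.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// logrusLine matches a logrus-style text line and captures the quoted msg value.
var logrusLine = regexp.MustCompile(`time="[^"]*" level=\w+ msg=(".*")`)

// cleanJobMessage (hypothetical) drops the timestamp and log level from a job
// failure message and keeps the text that precedes the log line.
func cleanJobMessage(raw string) string {
	loc := logrusLine.FindStringSubmatchIndex(raw)
	if loc == nil {
		return raw // not in the expected layout; leave it untouched
	}
	quoted := raw[loc[2]:loc[3]]
	// The msg value is assumed to be a Go-quoted string, so unquoting also
	// resolves escaped quotes inside it.
	msg, err := strconv.Unquote(quoted)
	if err != nil {
		msg = strings.Trim(quoted, `"`)
	}
	prefix := strings.TrimSpace(raw[:loc[0]])
	if prefix == "" {
		return msg
	}
	return fmt.Sprintf("%s: %s", prefix, msg)
}

func main() {
	raw := `failed: 1/1time="2025-05-19T18:27:31Z" level=fatal msg="failed to process bundle: context canceled"`
	fmt.Println(cleanJobMessage(raw))
	// prints: failed: 1/1: failed to process bundle: context canceled
}
```

Messages that do not match the assumed layout are returned unchanged, so the helper cannot hide information it does not understand.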
After Upgrade
| Rancher Version | Fleet Version |
|---|---|
| v2.12-49289cc9c6590b361d64950977dd20b1214908d7-head | 107.0.0+up0.13.0-alpha.3 |
- After the upgrade, the error message clearly states the exact cause of the failure.
- See screenshot for exact error message.
Failed to process bundle: failed reading resources for "rke-monitoring/app": loading directory .chart/2ace3fcaa23682ab77cf7bdcd5a6df94dbc1d2e2a3a25bd63a1d4b82a0fde0d1, rke-monitoring/app: helm chart download: failed to do request: Head "https://registry01.suse/v2/helm/rancher-monitoring/manifests/106.0.1_up66.7.1-rancher.10": dial tcp: lookup registry01.suse on 10.43.0.10:53: no such host