longhorn-manager
improve logging and status for volume scheduling
Which issue(s) this PR fixes:
https://github.com/longhorn/longhorn/issues/10999
What this PR does / why we need it:
Improve the clarity of log messages and volume status conditions related to replica scheduling.
Volume Controller:
- Refined `reconcileVolumeCondition` to produce more precise `Scheduled` condition messages, avoiding redundancy when replicas are unscheduled.
- Detailed scheduler errors are now primarily used for PV annotations and robust `Faulted` state determination.
- Enhanced logs for scheduling attempts (distinguishing retries from persistent failures) and for dropping volumes from the queue.
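
To illustrate the first point, here is a minimal sketch of how a concise `Scheduled` condition could be derived from replica counts, leaving detailed scheduler errors to be reported elsewhere. The `Condition` struct, the `scheduledCondition` helper, the reason string, and the message wording are all assumptions made for illustration; they are not the code in this PR.

```go
package main

import "fmt"

// Condition is a hypothetical, simplified status condition used only for
// this sketch. It is not the Longhorn API type.
type Condition struct {
	Type    string
	Status  string
	Reason  string
	Message string
}

// scheduledCondition (hypothetical helper) derives a Scheduled condition
// from the total number of replicas and how many of them are unscheduled.
// The message states the count once instead of repeating the full
// scheduler error for every unscheduled replica.
func scheduledCondition(total, unscheduled int) Condition {
	cond := Condition{Type: "Scheduled", Status: "True"}
	if unscheduled == 0 {
		// All replicas are placed: no reason or message is needed.
		return cond
	}
	cond.Status = "False"
	cond.Reason = "ReplicaSchedulingFailure" // assumed reason string
	cond.Message = fmt.Sprintf("unscheduled replicas: %d of %d", unscheduled, total)
	return cond
}
```

Under these assumptions, `scheduledCondition(3, 2)` yields a `False` condition with the message `unscheduled replicas: 2 of 3`, while `scheduledCondition(3, 0)` yields `True` with an empty message.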
Replica Scheduler:
- Modified scheduling functions to return `util.MultiError`, aggregating all reasons for scheduling failures for better upstream reporting.
- Log messages now include more specific context (volume, replica, node, disk) and detailed reasons for placement decisions.
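
The aggregation pattern described above can be sketched roughly as follows. The `MultiError` type here is a stand-in written for this example and may differ from the real `util.MultiError`; the `candidate` struct, the `pickDisk` function, and the reason strings are hypothetical.

```go
package main

import (
	"sort"
	"strings"
)

// MultiError is a stand-in for the util.MultiError type named in this PR:
// a set of distinct failure reasons. It is an assumption for illustration,
// not Longhorn's actual implementation.
type MultiError map[string]struct{}

// NewMultiError builds a set from zero or more reason strings.
func NewMultiError(reasons ...string) MultiError {
	me := MultiError{}
	for _, r := range reasons {
		me[r] = struct{}{}
	}
	return me
}

// Append merges the reasons from other into me, dropping duplicates.
func (me MultiError) Append(other MultiError) {
	for r := range other {
		me[r] = struct{}{}
	}
}

// Join returns the reasons sorted and separated by "; " so that the
// output is stable across runs.
func (me MultiError) Join() string {
	reasons := make([]string, 0, len(me))
	for r := range me {
		reasons = append(reasons, r)
	}
	sort.Strings(reasons)
	return strings.Join(reasons, "; ")
}

// candidate is a hypothetical node/disk pair considered for a replica.
type candidate struct {
	Node        string
	Disk        string
	FreeBytes   int64
	Schedulable bool
}

// pickDisk (hypothetical) returns the first usable candidate. If none fits,
// it returns every distinct reason collected along the way, so the caller
// can report all of them instead of only the last one.
func pickDisk(cands []candidate, sizeBytes int64) (*candidate, MultiError) {
	errs := NewMultiError()
	if len(cands) == 0 {
		errs.Append(NewMultiError("no candidate disks"))
	}
	for i := range cands {
		c := &cands[i]
		switch {
		case !c.Schedulable:
			errs.Append(NewMultiError("node is unschedulable"))
		case c.FreeBytes < sizeBytes:
			errs.Append(NewMultiError("insufficient storage"))
		default:
			return c, nil
		}
	}
	return nil, errs
}
```

Storing reasons in a set means that ten disks failing for the same reason produce one entry, which keeps an upstream status message or annotation short while still listing every distinct cause.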
Testing:
- Updated test expectations for `VolumeStatus` to match new controller behavior.
- Added new scheduler tests to verify detailed error reporting.
Special notes for your reviewer:
Have a nice day!
Additional documentation or context
not at the moment