improve logging and status for volume scheduling

Open philippedev101 opened this issue 5 months ago • 4 comments

Which issue(s) this PR fixes:

https://github.com/longhorn/longhorn/issues/10999

What this PR does / why we need it:

Improve the clarity of log messages and volume status conditions related to replica scheduling.

Volume Controller:

  • Refined reconcileVolumeCondition to produce more precise Scheduled condition messages, avoiding redundancy when replicas are unscheduled.
  • Detailed scheduler errors are now primarily used for PV annotations and robust Faulted state determination.
  • Enhanced logs for scheduling attempts (distinguishing retries from persistent failures) and for dropping volumes from the queue.
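
As a rough illustration of the two controller changes above, the sketch below builds a Scheduled condition message that names each unscheduled replica only once, and picks a different log line for a retry than for a drop from the queue. The `condition` struct, both function names, the `maxRetries` parameter, and the message wording are hypothetical and are not taken from the actual longhorn-manager code.

```go
package main

import (
	"fmt"
	"strings"
)

// condition is a simplified, hypothetical stand-in for a volume status condition.
type condition struct {
	Status  string
	Reason  string
	Message string
}

// scheduledCondition builds the Scheduled condition from the names of the
// replicas that are still waiting for placement. Duplicate names are
// collapsed so the message mentions each replica once.
func scheduledCondition(unscheduled []string) condition {
	if len(unscheduled) == 0 {
		return condition{Status: "True"}
	}
	seen := map[string]bool{}
	var names []string
	for _, r := range unscheduled {
		if !seen[r] {
			seen[r] = true
			names = append(names, r)
		}
	}
	return condition{
		Status:  "False",
		Reason:  "ReplicaSchedulingFailure", // illustrative reason string
		Message: fmt.Sprintf("%d replica(s) unscheduled: %s", len(names), strings.Join(names, ", ")),
	}
}

// requeueLog returns a different message for a retry than for giving up,
// so the two cases can be told apart in the controller log.
func requeueLog(volume string, attempt, maxRetries int, reason string) string {
	if attempt < maxRetries {
		return fmt.Sprintf("volume %s: scheduling failed (attempt %d/%d), will retry: %s",
			volume, attempt, maxRetries, reason)
	}
	return fmt.Sprintf("volume %s: dropping from queue after %d attempts: %s",
		volume, attempt, reason)
}

func main() {
	fmt.Println(scheduledCondition([]string{"r1", "r2", "r1"}).Message)
	fmt.Println(requeueLog("vol-1", 1, 3, "no schedulable disk"))
	fmt.Println(requeueLog("vol-1", 3, 3, "no schedulable disk"))
}
```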

Replica Scheduler:

  • Modified scheduling functions to return util.MultiError, aggregating all reasons for scheduling failures for better upstream reporting.
  • Log messages now include more specific context (volume, replica, node, disk) and detailed reasons for placement decisions.
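
The aggregation idea can be sketched as follows: instead of returning on the first failure, the scheduler collects one reason per rejected disk and hands the whole set back to the caller. The `multiError` type here is a local stand-in written for this example and is not the real `util.MultiError`; the `disk` struct, `scheduleReplica`, and the reason strings are likewise assumptions.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// multiError is a hypothetical stand-in for an aggregated error set.
// Reasons are stored as map keys, so repeated reasons are deduplicated.
type multiError map[string]struct{}

func (m multiError) add(reason string) { m[reason] = struct{}{} }

// join returns all reasons in sorted order so the output is deterministic.
func (m multiError) join() string {
	reasons := make([]string, 0, len(m))
	for r := range m {
		reasons = append(reasons, r)
	}
	sort.Strings(reasons)
	return strings.Join(reasons, "; ")
}

// disk is a simplified view of a candidate disk on a node.
type disk struct {
	node        string
	name        string
	freeBytes   int64
	schedulable bool
}

// scheduleReplica returns the first disk that can hold size bytes.
// If no disk fits, it returns nil plus every reason a disk was rejected.
func scheduleReplica(size int64, disks []disk) (*disk, multiError) {
	errs := multiError{}
	for i := range disks {
		d := &disks[i]
		switch {
		case !d.schedulable:
			errs.add(fmt.Sprintf("disk %s on node %s is not schedulable", d.name, d.node))
		case d.freeBytes < size:
			errs.add(fmt.Sprintf("disk %s on node %s has insufficient storage", d.name, d.node))
		default:
			return d, nil
		}
	}
	return nil, errs
}

func main() {
	disks := []disk{
		{node: "n1", name: "d1", freeBytes: 5, schedulable: true},
		{node: "n2", name: "d2", freeBytes: 100, schedulable: false},
	}
	if _, errs := scheduleReplica(10, disks); len(errs) > 0 {
		fmt.Println("replica r1 unscheduled:", errs.join())
	}
}
```

Returning the full set lets the caller decide how much detail to surface, for example a short condition message on the volume and the complete list in a log line or annotation.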

Testing:

  • Updated test expectations for VolumeStatus to match new controller behavior.
  • Added new scheduler tests to verify detailed error reporting.

Special notes for your reviewer:

Have a nice day!

Additional documentation or context

None at the moment.

philippedev101, May 30 '25 18:05