[Feature]: Better logging when a run terminates due to max_duration
Problem
The server console logs do not make it clear that a run was terminated due to max_duration. The attached image shows when a run was started and the message logged 6 hours later (the default max_duration).
Solution
No response
Workaround
No response
Would you like to help us implement this feature by sending a PR?
No
@james-boydell, I agree we should improve the run failure reason in that case. Still, it's recommended to check the run diagnostic logs on failures. They are available to users who don't have access to server logs and may contain more information than the server logs.
Run dstack logs --diagnose run_name and you'd see:
...
time=2024-09-19T04:32:24.589936-04:00 level=error msg=Max duration exceeded max_duration=180
time=2024-09-19T04:32:24.590001-04:00 level=info msg=Job state changed new=terminated
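Until the failure reason is surfaced more clearly, the check above could be scripted. The following is a minimal sketch, assuming the diagnostic output keeps the key=value line format shown in the sample above. The function name `find_max_duration_termination` is hypothetical and not part of dstack, and the message text may differ between versions.

```python
import re
from typing import Optional

# Assumption: the line format matches the sample diagnostic output above,
# i.e. "msg=Max duration exceeded max_duration=<number>".
MAX_DURATION_RE = re.compile(r"msg=Max duration exceeded max_duration=(\d+)")


def find_max_duration_termination(log_text: str) -> Optional[int]:
    """Return the logged max_duration value if the limit was hit, else None."""
    for line in log_text.splitlines():
        match = MAX_DURATION_RE.search(line)
        if match:
            return int(match.group(1))
    return None


# The two lines quoted in the comment above, used as sample input.
sample = """\
time=2024-09-19T04:32:24.589936-04:00 level=error msg=Max duration exceeded max_duration=180
time=2024-09-19T04:32:24.590001-04:00 level=info msg=Job state changed new=terminated
"""

print(find_max_duration_termination(sample))  # prints 180
```

To use it on a real run, the output of dstack logs --diagnose run_name could be saved to a file or piped in and passed to the function as a string.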
This issue is stale because it has been open for 30 days with no activity.
This issue was closed because it has been inactive for 14 days since being marked as stale. Please reopen the issue if it is still relevant.