Solving #970 and updating the docs to fix the incorrect mention of exitCode
Fix: Improve Log File Handling During Retries
Problem
When a DAG step was retried, its log file was not handled properly, leading to "file already closed" errors. This occurred because:
- The same log file was being reused across retries
- File handles were being closed after the first attempt
- Subsequent retries couldn't write to the closed file (see the sketch after this list)
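Below is a minimal, standalone Go sketch of the failure mode described above (not the actual dagu code): once the step's `*os.File` handle is closed after the first attempt, any write from a later retry fails with a "file already closed" error.

```go
package main

import (
	"fmt"
	"os"
)

func main() {
	// Create a temp file standing in for the step's log file.
	f, err := os.CreateTemp("", "step-*.log")
	if err != nil {
		panic(err)
	}
	fmt.Fprintln(f, "attempt 1: output")

	// The handle is closed after the first attempt...
	f.Close()

	// ...so a retry that reuses the same handle fails.
	if _, err := fmt.Fprintln(f, "attempt 2: output"); err != nil {
		fmt.Println("retry write failed:", err) // prints "file already closed"
	}
}
```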
Solution
Modified the log file handling in `internal/digraph/scheduler/node.go` to:
- Create a unique log file for each retry attempt
- Append the retry count to log filenames (e.g., `.retry1`, `.retry2`)
- Ensure proper file handle management during retries
Changes
- Updated the `Setup` method to include the retry count in log filenames
- Added the retry suffix format `.retryN`, where N is the retry count (see the sketch after this list)
- Maintained backward compatibility for non-retry cases
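As a rough illustration of the naming scheme above, here is a hypothetical helper; `buildLogFileName` is an invented name for this sketch, while the actual change lives in the `Setup` method of `internal/digraph/scheduler/node.go`.

```go
package main

import "fmt"

// buildLogFileName is a hypothetical helper illustrating the ".retryN"
// naming scheme described in this PR.
func buildLogFileName(base string, retryCount int) string {
	if retryCount == 0 {
		// The first attempt keeps the original name for backward compatibility.
		return base
	}
	return fmt.Sprintf("%s.retry%d", base, retryCount)
}

func main() {
	fmt.Println(buildLogFileName("step.log", 0)) // step.log
	fmt.Println(buildLogFileName("step.log", 2)) // step.log.retry2
}
```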
Testing
- Verified that retries now create separate log files
- Confirmed no "file already closed" errors during retries
- Tested with multiple retry attempts
- Ensured backward compatibility for non-retry cases
Impact
- Improved reliability of retry mechanism
- Better debugging capabilities with separate logs for each retry
- Clearer history of retry attempts
- No changes to existing functionality for non-retry cases
Related Issues
Closes #970
A retry should happen only after the previous attempt is complete; I am not sure if it is a race condition. If each retry closes the log file, we should reopen it in append mode if it already exists (on subsequent retries) or create it (on the first retry). @yottahmd What was the original behavior before this bug got in?
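A minimal sketch of the reopen-on-retry idea above; the path handling and error handling here are illustrative only, not the actual dagu code.

```go
package main

import (
	"os"
	"path/filepath"
)

// openStepLog opens the step's log file in append mode: O_CREATE covers the
// first attempt (file not present yet), while O_APPEND lets later retries
// continue writing after the previous attempt's output instead of hitting a
// closed handle.
func openStepLog(path string) (*os.File, error) {
	return os.OpenFile(path, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o644)
}

func main() {
	path := filepath.Join(os.TempDir(), "step.log")
	f, err := openStepLog(path)
	if err != nil {
		panic(err)
	}
	defer f.Close()
	f.WriteString("retry attempt output\n")
}
```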
You can also attempt this solution (pseudocode):
`if n.GetRetryCount() > 0 && exitCode is in exitCodesList { /* don't close the log file at the following line: https://github.com/dagu-org/dagu/blob/9c2c1633a935133dd172fcacc8696fcca862b53f/internal/digraph/scheduler/node.go#L441 */ }`
We need to pass the exitCode returned from the last retry to the tearDown function. This bug also exists in 1.16.x.
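Here is a hedged sketch of how that condition and the exitCode hand-off to tearDown could look; every name below (`node`, `retryExitCodes`, the `tearDown` signature) is an assumption based on this thread, not the actual dagu API.

```go
package main

import "fmt"

// node is a stand-in type for this sketch only.
type node struct {
	retryCount     int
	retryExitCodes []int
}

func (n *node) GetRetryCount() int { return n.retryCount }

// tearDown receives the exit code of the last attempt so it can decide
// whether the log file must stay open for the next retry.
func (n *node) tearDown(exitCode int) {
	if n.GetRetryCount() > 0 && contains(n.retryExitCodes, exitCode) {
		// The step will be retried: skip closing the log file.
		fmt.Println("keeping log file open for retry")
		return
	}
	// Otherwise close the log file here (omitted in this sketch).
	fmt.Println("closing log file")
}

func contains(codes []int, code int) bool {
	for _, c := range codes {
		if c == code {
			return true
		}
	}
	return false
}

func main() {
	n := &node{retryCount: 1, retryExitCodes: []int{1}}
	n.tearDown(1) // keeps the log open in this sketch
}
```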
Hi @ghansham, @kriyanshii — thanks for investigating this! 🙏
I think @ghansham is right. It seems like a bug that the log file gets closed on each retry. We should ensure that it stays open across retries (that was the original behavior).
I think we can just remove the code below. @kriyanshii Can you please update the PR? https://github.com/dagu-org/dagu/blob/4dfbec53d0093333bce71cb3033a3541ef15dda5/internal/digraph/scheduler/scheduler.go#L252-L255
Very cool
@yottahmd I have updated the PR. Please review.
Codecov Report
Attention: Patch coverage is 78.57143% with 3 lines in your changes missing coverage. Please review.
Project coverage is 61.58%. Comparing base (73f500c) to head (f422f69).
| Files with missing lines | Patch % | Lines |
|---|---|---|
| internal/digraph/scheduler/node.go | 78.57% | 2 Missing and 1 partial :warning: |
Additional details and impacted files
| Coverage Diff | main (base) | #971 (head) | +/- |
|---|---|---|---|
| Coverage | 61.59% | 61.58% | -0.01% |
| Files | 85 | 85 | |
| Lines | 10854 | 10861 | +7 |
| Hits | 6685 | 6689 | +4 |
| Misses | 3485 | 3487 | +2 |
| Partials | 684 | 685 | +1 |
| Files with missing lines | Coverage Δ |
|---|---|
| internal/digraph/scheduler/scheduler.go | 76.64% <ø> (-0.14%) :arrow_down: |
| internal/digraph/scheduler/node.go | 64.64% <78.57%> (+0.09%) :arrow_up: |
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 73f500c...f422f69.
It's better to keep a single log file, but at the start of every retry we should print a message mentioning retry no. X (Start Time: ......).
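A small sketch of that suggestion, appending a retry header to a single shared log file; the file name and header format are purely illustrative, not what dagu actually writes.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"time"
)

// writeRetryHeader appends a marker line for the given retry attempt.
func writeRetryHeader(f *os.File, retryNo int) error {
	_, err := fmt.Fprintf(f, "---- retry no. %d (Start Time: %s) ----\n",
		retryNo, time.Now().Format(time.RFC3339))
	return err
}

func main() {
	path := filepath.Join(os.TempDir(), "step.log")
	f, err := os.OpenFile(path, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o644)
	if err != nil {
		panic(err)
	}
	defer f.Close()
	_ = writeRetryHeader(f, 1)
}
```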
Apologies for leaving the PR open. Thank you for all your hard work on this issue; I really appreciate it, even though it hasn't been merged. If any other issues catch your eye, please feel free to pick them up!