fix: prevent response file overwrite when -sd flag is used
## Summary
This PR addresses issue #2152: response files were overwritten even when the `-sd` (SkipDedupe) flag was used.
## What was the issue?
When the `-sd` flag is enabled, input targets are processed without deduplication. However, responses for the same URL are still written to the same file path, derived from the SHA1 hash of the URL, so later responses overwrite earlier ones.
## What’s changed?
- When SkipDedupe is enabled and a response file with the same name already exists, the response is written to a new file with a suffix (`_1`, `_2`, ...).
- Repeated input targets are now counted using `HybridMap`, and the number of processing iterations is determined by this count.
- Modified `countTargetFromRawTarget` to return a known `duplicateTargetErr` when deduplication is disabled.
- Refactored response writing logic in `analyze` and `process` to ensure unique file writes under concurrency.
## Why is this useful?
This change ensures:
- Accurate tracking and storage of multiple response outputs for identical input targets.
- No unintentional data loss from file overwrites.
- Behavior that honors the intent behind the `-sd` flag.
## Result
```
# test.txt
localhost:8000
localhost:9000
localhost:8000
localhost:9000
localhost:8000
localhost:9000
localhost:8000
localhost:9000
```
```
$ ./httpx/httpx -l test.txt -stream -skip-dedupe -sr
$ tree output/response/
output/response/
├── index.txt
├── localhost_8000
│   ├── 59bd7616010ed02cd66f44e94e9368776966fe3b.txt
│   ├── 59bd7616010ed02cd66f44e94e9368776966fe3b_1.txt
│   ├── 59bd7616010ed02cd66f44e94e9368776966fe3b_2.txt
│   └── 59bd7616010ed02cd66f44e94e9368776966fe3b_3.txt
└── localhost_9000
    ├── 981d6875d791d0a1a28393b5ec62d61dff1e977f.txt
    ├── 981d6875d791d0a1a28393b5ec62d61dff1e977f_1.txt
    ├── 981d6875d791d0a1a28393b5ec62d61dff1e977f_2.txt
    └── 981d6875d791d0a1a28393b5ec62d61dff1e977f_3.txt

2 directories, 9 files
```
```
$ ./httpx/httpx -l test.txt -skip-dedupe -sr
$ tree output/response/
output/response/
├── index.txt
├── localhost_8000
│   ├── 59bd7616010ed02cd66f44e94e9368776966fe3b.txt
│   ├── 59bd7616010ed02cd66f44e94e9368776966fe3b_1.txt
│   ├── 59bd7616010ed02cd66f44e94e9368776966fe3b_2.txt
│   └── 59bd7616010ed02cd66f44e94e9368776966fe3b_3.txt
└── localhost_9000
    ├── 981d6875d791d0a1a28393b5ec62d61dff1e977f.txt
    ├── 981d6875d791d0a1a28393b5ec62d61dff1e977f_1.txt
    ├── 981d6875d791d0a1a28393b5ec62d61dff1e977f_2.txt
    └── 981d6875d791d0a1a28393b5ec62d61dff1e977f_3.txt

2 directories, 9 files
```
```
$ ./httpx/httpx -l test.txt -sr
$ tree output/
output/
└── response
    ├── index.txt
    ├── localhost_8000
    │   └── 59bd7616010ed02cd66f44e94e9368776966fe3b.txt
    └── localhost_9000
        └── 981d6875d791d0a1a28393b5ec62d61dff1e977f.txt

3 directories, 3 files
```
```
$ ./httpx/httpx -l test.txt -stream -sr
$ tree output/
output/
└── response
    ├── index.txt
    ├── localhost_8000
    │   └── 59bd7616010ed02cd66f44e94e9368776966fe3b.txt
    └── localhost_9000
        └── 981d6875d791d0a1a28393b5ec62d61dff1e977f.txt

3 directories, 3 files
```
## Related issue
Closes #2152
## Summary by CodeRabbit
- **New Features**
  - Duplicate targets are now explicitly detected and, when configured, processed multiple times.
  - Response saving uses exclusive creation and auto-increments filenames on collisions to ensure uniqueness; screenshot saving is unchanged.
- **Bug Fixes**
  - Improved duplicate-target counting and propagation to ensure accurate processing.
  - Removed on-the-fly response saving from the primary output path to reduce conflicts.
- **Refactor**
  - Per-target processing reworked to honor per-target counts and stream mode.
- **Tests**
  - Updated tests to verify duplicate detection and handling.
## Walkthrough
Adds explicit duplicate detection and counting for input targets, propagates duplicates when deduplication is disabled, refactors per-target processing into a repeatable runner that executes per-target count times, changes response-saving to exclusive-create with incremental suffixes to avoid overwrites, and updates tests to assert the new duplicate error behavior.
## Changes
| Cohort / File(s) | Change Summary |
|---|---|
| Duplicate detection & counting (`runner/runner.go`) | Introduced `duplicateTargetErr`; `countTargetFromRawTarget` returns `(0, duplicateTargetErr)` for existing targets; `prepareInput` and `loadAndCloseFile` now increment stored counts and total targets when `Options.SkipDedupe` is set. |
| Per-target processing flow (`runner/runner.go`) | Refactored `RunEnumeration` to use a `runProcess(times)` helper; non-stream targets are executed according to stored per-target counts; stream mode runs once. |
| Response file persistence & naming (`runner/runner.go`) | Moved response-saving out of the main overwrite path; response files are created with exclusive-create (`O_EXCL`) and, on name collisions, retried with incrementing suffixes (`_1`, `_2`, ...) before `.txt` to guarantee unique filenames. |
| Tests & error handling (`runner/runner_test.go`) | Added `github.com/pkg/errors` import; test initializes `Options.SkipDedupe` appropriately; duplicate assertions updated to `errors.Is(err, duplicateTargetErr)`. |
## Sequence Diagram(s)
```mermaid
sequenceDiagram
    participant User
    participant Runner
    participant FS as FileSystem
    Note over Runner: Input parsing & counting
    User->>Runner: supply targets (may include duplicates) + options
    Runner->>Runner: countTargetFromRawTarget(raw)
    alt target exists
        Runner-->>Runner: returns duplicateTargetErr
        alt SkipDedupe enabled
            Runner->>Runner: increment stored count for target
        else
            Runner->>Runner: ignore duplicate (no count increase)
        end
    else new target
        Runner->>Runner: store target with count=1
    end
    Note over Runner,FS: Processing phase (per-target repeats)
    loop i = 1..count
        Runner->>Runner: runProcess(1) - perform enumeration for target
        Runner->>FS: attempt create response file (O_EXCL)
        alt creation collision
            FS->>FS: compute filename with suffix (_1, _2, ...)
            FS->>Runner: return newly created file
        else success
            FS->>Runner: file created
        end
    end
```
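The processing phase in the diagram can be sketched as a small loop. This is an assumed shape for illustration only: `processAll` and its plain in-memory `counts` map are stand-ins for the runner, which iterates a `HybridMap` via `runProcess(times)`.

```go
package main

import "fmt"

// runProcess stands in for one enumeration pass over a single target.
func runProcess(target string) {
	fmt.Println("processing", target)
}

// processAll mirrors the diagram: in stream mode each target is handled once;
// otherwise it repeats according to its stored per-target count.
func processAll(counts map[string]int, streamMode bool) int {
	runs := 0
	for target, count := range counts {
		if streamMode {
			count = 1
		}
		for i := 0; i < count; i++ {
			runProcess(target)
			runs++
		}
	}
	return runs
}

func main() {
	counts := map[string]int{"localhost:8000": 4, "localhost:9000": 4}
	fmt.Println("non-stream runs:", processAll(counts, false)) // 8
	fmt.Println("stream runs:", processAll(counts, true))      // 2
}
```

This matches the `Result` section above: eight input lines yield eight response files without `-stream` dedupe changes, while stream mode processes each stored target once per occurrence seen on the wire.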
Estimated code review effort
🎯 3 (Moderate) | ⏱️ ~25 minutes
Poem
I nibble duplicates, one by one,
Count each hop till counting's done.
Files get tails when names collide,
No more overwrites — bounce with pride.
A rabbit cheers for safer strides 🐇✨
Pre-merge checks and finishing touches
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title Check | ✅ Passed | The PR title "fix: prevent response file overwrite when -sd flag is used" directly and concisely summarizes the main objective of the pull request. The title clearly identifies the problem being addressed (preventing response file overwrites) and the specific context (when the -sd flag, which enables SkipDedupe, is used). This matches the core fix described in the PR objectives and directly relates to the changes in runner.go that implement duplicate target counting and unique filename generation to prevent overwrites. The title is specific, clear, and avoids vague terminology. |
| Linked Issues Check | ✅ Passed | The code changes directly implement the requirements from linked issue #2152, which asks that repeated input targets with the -sd (SkipDedupe) flag should produce multiple response files instead of overwriting a single file. The implementation includes a sentinel duplicateTargetErr to detect duplicate targets when SkipDedupe is enabled, counting of repeated targets via a map structure, refactored RunEnumeration logic to process targets multiple times based on their count, and unique filename generation with incrementing suffixes (_1, _2, etc.) to prevent file collisions. These changes collectively ensure that when -sd is used, each processed instance of a repeated input target produces its own response file with a unique name, matching the expected behavior shown in issue #2152. |
| Out of Scope Changes Check | ✅ Passed | The code changes appear focused on the scope of issue #2152. While the PR includes refactoring of response-writing logic and removal of on-the-fly HTTP response saving from the primary output path, these changes are explicitly mentioned in the PR objectives as necessary structural changes to ensure thread-safe unique file writing under concurrency. This refactoring appears intentional and directly supports the fix by moving response-writing to a centralized location where duplicate handling and unique naming logic can be properly applied. The PR objectives confirm this removal is part of the broader refactoring to prevent overwrites, and note that screenshot saving behavior is preserved, indicating the changes are deliberate and scoped to solving the core issue. |
| Docstring Coverage | ✅ Passed | No functions found in the changes. Docstring coverage check skipped. |
✨ Finishing touches
- [ ] 📝 Generate docstrings
🧪 Generate unit tests (beta)
- [ ] Create PR with unit tests
- [ ] Post copyable unit tests in a comment
📜 Recent review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📥 Commits
Reviewing files that changed from the base of the PR and between bd4dc827b252e4819a6c7ac456fee90be68ad1ab and a3212c6c0c1fb7967704445eb794804a6f8dda13.
📒 Files selected for processing (1)
`runner/runner.go` (5 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Functional Test (macOS-latest)
- GitHub Check: Functional Test (windows-latest)
- GitHub Check: Functional Test (ubuntu-latest)
- GitHub Check: Analyze (go)
- GitHub Check: Lint Test
🔇 Additional comments (4)
runner/runner.go (4)
`480-480`: **LGTM: Sentinel error for duplicate detection.** The sentinel error is well-named and follows standard Go error handling patterns.
`674-677`: **LGTM: Explicit duplicate detection.** Returning `duplicateTargetErr` when a target already exists in the map enables callers to distinguish duplicates from other errors, which is essential for the SkipDedupe feature.
`2235-2258`: **LGTM: Exclusive file creation with unique suffixes.** The exclusive creation logic using `os.O_EXCL` correctly prevents race conditions and overwrites. The incrementing suffix approach ensures each duplicate target produces a unique response file, which is the core requirement of this PR.

The infinite loop is acceptable here since:
- Non-exist errors break immediately (line 2255)
- The number of duplicates is finite and controlled by user input
- The atomic `O_EXCL` operation makes this concurrency-safe
`1316-1338`: **No issues found. The original review comment is incorrect.** The code is safe from the concern raised. In non-stream mode (line 1349), `r.hm.Scan()` iterates only over keys that exist in the map, so the `r.hm.Get(k)` lookup at line 1332 is guaranteed to succeed. In stream mode, the map is not consulted; `runProcess(1)` runs directly. The suggested defensive fallback is unnecessary and would introduce unreachable code.

Likely an incorrect or invalid review comment.
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Comment @coderabbitai help to get the list of available commands and usage tips.
Should the entries in the index file match the files under the output/response/ directory?
Currently, it looks like a new index file is created for every request. As a result, when requests are repeated, the index file and the files inside the response/ folder become inconsistent.
Round 1
```
output/response/
├── index.txt
└── localhost_8000
    ├── 59bd7616010ed02cd66f44e94e9368776966fe3b.txt
    └── 59bd7616010ed02cd66f44e94e9368776966fe3b_1.txt
```

```
# index.txt
/discovery/httpx/output/response/localhost_8000/59bd7616010ed02cd66f44e94e9368776966fe3b.txt http://localhost:8000 (200 OK)
/discovery/httpx/output/response/localhost_8000/59bd7616010ed02cd66f44e94e9368776966fe3b_1.txt http://localhost:8000 (200 OK)
```
Round 2
```
output/response/
├── index.txt
└── localhost_8000
    ├── 59bd7616010ed02cd66f44e94e9368776966fe3b.txt
    ├── 59bd7616010ed02cd66f44e94e9368776966fe3b_1.txt
    ├── 59bd7616010ed02cd66f44e94e9368776966fe3b_2.txt
    └── 59bd7616010ed02cd66f44e94e9368776966fe3b_3.txt
```

```
# index.txt
/discovery/httpx/output/response/localhost_8000/59bd7616010ed02cd66f44e94e9368776966fe3b_2.txt http://localhost:8000 (200 OK)
/discovery/httpx/output/response/localhost_8000/59bd7616010ed02cd66f44e94e9368776966fe3b_3.txt http://localhost:8000 (200 OK)
```