disable dedupe in response file write when `-sd` is used

Open ehsandeep opened this issue 8 months ago • 1 comments

httpx can avoid overwriting response file content for the same input when -skip-dedupe is used, as the user has explicitly disabled deduplication and wants to see each copy of the response.

Input:

$ cat test.txt
example.com
example.com

httpx run

$ httpx -l test.txt -stream -skip-dedupe -sr -silent
https://example.com
https://example.com

Current behavior:

$ tree output/response
output/response
├── example.com
│   └── cea8d4cbc5e3b39fcbcf053e0e0244fe14c835ae.txt
└── index.txt

1 directory, 2 files

Expected behavior:

$ tree output/response
output/response
 ├── example.com
 │ └── cea8d4cbc5e3b39fcbcf053e0e0244fe14c835ae.txt
+├── example.com
+│ └── cea8d4cbc5e3b39fcbcf053e0e0244fe14c835ae.txt
 └── index.txt

1 directory, 3 files

ehsandeep • Apr 19 '25 10:04

Question: how can you have two example.com directories (same name)? Don't we need something to differentiate?

loresuso • May 12 '25 08:05

Hi team 👋

I'm looking into contributing a fix for a case where response files get overwritten when scanning the same domain multiple times.

The response path is determined using:

domainFile := resp.Method + ":" + URL.EscapedString()
hash := hashes.Sha1([]byte(domainFile))
domainResponseFile := fmt.Sprintf("%s.txt", hash)
responseBaseDir := filepath.Join(..., hostFilename)
responsePath := filepath.Join(responseBaseDir, domainResponseFile)

Because the hash depends only on the method and URL, the same response file (the same hash-named .txt file under the host directory) is used whenever the same domain is scanned multiple times, even with -skip-dedupe enabled. In my case, repeated requests to http://localhost:8000 result in the same file (59bd76....fe3b.txt) being overwritten, despite multiple entries in test.txt.

Would it make sense to append an incrementing suffix like:

localhost_8000/59bd76...fe3b_1.txt  
localhost_8000/59bd76...fe3b_2.txt

to avoid overwriting and allow storing multiple responses for the same domain and path?

Or is there a preferred way to handle this use case?

Thanks!

jjhwan-h • Jul 29 '25 16:07

> incrementing suffix like

This sounds like a good idea, @jjhwan-h.

ehsandeep • Jul 29 '25 18:07

Hi, I have a quick question while reviewing this issue.

I noticed that in both runner.go:1097 and runner.go:2154, the same responsePath appears to be written with the same data.

(Specifically, RunEnumeration calls process, which in turn calls analyze, where the write operations occur.)

Is there a particular reason for writing the same response to disk twice, or might this be an unintentional redundancy? @ehsandeep

jjhwan-h • Jul 30 '25 17:07

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions!

github-actions[bot] • Nov 02 '25 00:11

This issue has been automatically closed due to inactivity. If you think this is a mistake or would like to continue the discussion, please comment or feel free to reopen it.

github-actions[bot] • Nov 09 '25 00:11