
Richer JSON output

Open ocervell opened this issue 2 years ago • 5 comments

It would be nice to have more response data than just the URL in the JSON output, such as:

{
  "url": "https://test.domain.synology.me/.htaccess-local",
  "status_code": 200,
  "words": 1066,
  "lines": 100,
  "content_length": 4516,
  "content_type": "text/html; charset=utf-8",
  "duration": 57779116,
  "host": "test.domain.synology.me"
}

That would avoid scraping the endpoint again to find those details.
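To illustrate how a downstream tool could consume such a record without requesting the URL again, here is a minimal Go sketch. The `Record` type, its Go field names, and the `parseRecord` helper are hypothetical and not part of gau; only the JSON keys and the sample values come from the example above. The unit of `duration` is not stated in the issue, so it is kept as a raw integer.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Record mirrors the JSON keys of the example in this issue.
// The type and field names are hypothetical, not gau's actual types.
type Record struct {
	URL           string `json:"url"`
	StatusCode    int    `json:"status_code"`
	Words         int    `json:"words"`
	Lines         int    `json:"lines"`
	ContentLength int64  `json:"content_length"`
	ContentType   string `json:"content_type"`
	Duration      int64  `json:"duration"` // unit not stated in the issue; kept as a raw integer
	Host          string `json:"host"`
}

// parseRecord decodes a single JSON object into a Record.
func parseRecord(line []byte) (Record, error) {
	var r Record
	err := json.Unmarshal(line, &r)
	return r, err
}

// sample is the example object from the issue, on one line.
const sample = `{"url": "https://test.domain.synology.me/.htaccess-local", "status_code": 200, "words": 1066, "lines": 100, "content_length": 4516, "content_type": "text/html; charset=utf-8", "duration": 57779116, "host": "test.domain.synology.me"}`

func main() {
	r, err := parseRecord([]byte(sample))
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	// The response details are available directly from the record.
	fmt.Printf("%d %s (%d bytes, %s)\n", r.StatusCode, r.URL, r.ContentLength, r.ContentType)
}
```

With one such object per output line, a consumer could decode each line the same way and filter on any field.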

Maybe even consider using httpx as the client instead of fasthttp, as it seems to give more information about the response?

ocervell, Jan 28 '23 20:01

gau is completely passive at the moment. It issues no HTTP requests to URLs that are archived from Wayback, OTX, etc. It can be piped into a tool such as httpx for additional info. Would you prefer that gau had an option for this instead?

lc, Feb 11 '23 05:02

Ah, I thought that since there is a "--mc strings # list of status codes to match" option, some crawling was still happening. What is the purpose of the --mc flag, then? Otherwise, an option to run an httpx query could be added, although we would not really control httpx input options such as tech detection.

ocervell, Apr 03 '23 23:04

I think it would be useful to add provider, timestamp, status_code, mimetype and content_length to the JSON output. That would make it possible to filter by these values at later stages. I checked all providers, and all of them return most of these fields. I am ready to implement this change if you agree.

zerodivisi0n, Nov 01 '23 20:11

Hey @zerodivisi0n, I definitely agree

lc, Nov 01 '23 20:11

Great! Then I'll do it soon

zerodivisi0n, Nov 01 '23 20:11