gospider
Avoid including false results caused by HTML redirects
Try running gospider against a site like https://vivy.com. Because of how the application implements its "redirect", every URL is added to the output file simply because the server returns a 200 status code.
Issue: gospider reports false-positive URLs for applications that implement redirects without using HTTP status codes.
To replicate:
Go to https://vivy.com/made_up_url
and notice that the application returns a 200 code even though the content is an HTML page that redirects to the home page:
<html>
<head>
<meta http-equiv="refresh" content="0; url=https://www.vivy.com/" />
</head>
<body></body>
</html>
Possible way to solve this: add an option to specify a regex that, if matched in the response, automatically discards the URL or marks it in the output file.
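A rough sketch of what such a check could look like in Go is below. The names `metaRefreshPattern` and `shouldDiscard` are hypothetical and not part of gospider. The default pattern is only an assumption about what a user might pass to catch a meta-refresh redirect like the one above. In a real implementation the pattern would come from a new command-line option.

```go
package main

import "regexp"

// metaRefreshPattern is an illustrative pattern (an assumption, not a gospider
// default) that matches a <meta http-equiv="refresh" ...> tag, case-insensitively.
var metaRefreshPattern = regexp.MustCompile(
	`(?i)<meta[^>]+http-equiv\s*=\s*["']?refresh["']?[^>]*>`,
)

// shouldDiscard is a hypothetical helper: it reports whether a response body
// matches the user-supplied pattern, meaning the URL should be dropped from
// (or flagged in) the output. A nil pattern disables the check.
func shouldDiscard(body string, pattern *regexp.Regexp) bool {
	if pattern == nil {
		return false
	}
	return pattern.MatchString(body)
}
```

The crawler could call `shouldDiscard` on each response body that came back with a 200 status before writing the URL to the output file. Whether a match discards the URL or only marks it could be decided by a second option.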