Option to save redirection value instead of request URL
I used python3 photon.py --url http://x.x.x.x --level 1 --only-url and I got a list of 103 internal URLs.
All the URLs follow the pattern http://x.x.x.x/?r=[redirection_token].
Having this list alone is pretty useless; what is interesting is to get the redirection value (for example, the one contained in the Location header after an HTTP 302 or 303 code).
There should be an option to store the redirection value instead of the raw URL when a redirection HTTP status code is hit.
This could be implemented with something like the following pseudo-code:

def check_http_status(code, request, answer):
    if code == 200:
        store(request)
    elif code in (301, 302, 303):
        store(answer.location)
    elif code == 404:
        pass  # do nothing
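A self-contained version of that idea might look like the following; this is only a sketch, and the function and parameter names are hypothetical, not Photon's actual API:

```python
from typing import Optional

def redirect_target(status_code: int, headers: dict) -> Optional[str]:
    """Return the Location header for redirect responses, else None."""
    # For 3xx responses the interesting value is the Location header,
    # not the request URL itself.
    if 300 <= status_code < 400:
        return headers.get('Location')
    return None
```

The caller would then store the returned value alongside (or instead of) the raw URL.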
Hi @noraj ,
Thanks for reporting the issue, can you please check if this PR fixes it?
Photon should now store the redirecting URLs in redirects.txt in the following format:
https://example.com/redirect_from==>https://example.com/redirect_to
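For anyone consuming redirects.txt afterwards, splitting each line on the ==> separator recovers the pair; a minimal sketch, assuming one src==>dst entry per line:

```python
def parse_redirects(text):
    """Parse lines of the form 'src==>dst' into (src, dst) tuples."""
    pairs = []
    for line in text.splitlines():
        if '==>' in line:
            # split only on the first separator in case the target
            # URL itself contains the sequence
            src, dst = line.split('==>', 1)
            pairs.append((src.strip(), dst.strip()))
    return pairs
```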
@noraj ???
@s0md3v Yeah, I'm answering; I'm just writing a long post and I need to check what I say before affirming it.
I git cloned a fresh copy, then ran git checkout redirect, then ran python photon.py --url http://x.x.x.x/ --level 1 --only-url, but I get the exact same result as before, with no https://example.com/redirect_from==>https://example.com/redirect_to entries.
I think this is because when http://x.x.x.x/ is hit the code is 200, and with --level 1 the other links are scraped but not requested, so we never enter the if code[0] == '3': statement.
https://github.com/s0md3v/Photon/blob/0a5de25964538e16486064bdb2049c39e2de4343/photon.py#L219-L222
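For reference, the linked check treats the status code as a string and inspects only its first character, roughly:

```python
def is_redirect_status(code) -> bool:
    # Mirrors the `code[0] == '3'` check in photon.py: any status
    # whose string form starts with '3' counts as a redirect.
    return str(code)[0] == '3'
```

So the branch is only reachable for URLs that actually get requested, which at --level 1 the scraped links never are.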
So we are forced to use python photon.py --url http://x.x.x.x/ --level 2 --only-url, but then instead of the 103 internal URLs from the root page I get more than 700 URLs from all the sub-pages, and the scan takes way more time (103 remote pages instead of just one).
That is why I talked about a redirect switch option that would allow the collected internal URLs to be requested, to see whether they answer with a page or a redirection.
So what I mean is: keep the actual behavior + add a new option --whatevername that treats scraped internal URLs as potential redirections, requesting them in order to store the potential redirection value in addition to the raw internal URL.
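In other words, for each scraped internal URL the option would keep the raw URL and additionally store the redirection value when one exists. A sketch of that behavior (the fetch interface and all names here are hypothetical, not Photon's actual code):

```python
def collect_with_redirects(urls, fetch):
    """fetch(url) -> (status_code, headers); a hypothetical callable."""
    entries = []
    for url in urls:
        entries.append(url)  # keep the current behavior: store the raw URL
        status, headers = fetch(url)
        if status in (301, 302, 303) and 'Location' in headers:
            # additionally store the redirection value in the
            # src==>dst format used by redirects.txt
            entries.append(f"{url}==>{headers['Location']}")
    return entries
```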
Also, I got about 30 URLs (using level 2) in failed.txt, but all of them are valid. Example:
$ curl -vvv http://x.x.x.x/\?s\=_____ba8da76e357a______
* Trying x.x.x.x...
* TCP_NODELAY set
* Connected to x.x.x.x (x.x.x.x) port 80 (#0)
> GET /?s=_____ba8da76e357a______ HTTP/1.1
> Host: x.x.x.x
> User-Agent: curl/7.61.1
> Accept: */*
>
< HTTP/1.1 303 See Other
< Date: Tue, 23 Oct 2018 18:47:37 GMT
< Server: localhost
< Content-Type: text/html
< Location: https://googleprojectzero.blogspot.com/xxxxxxxxxxx.html
< Content-Length: 0
<
* Connection #0 to host x.x.x.x left intact
So I don't know why they are marked as failed.
But even with level 2, no redirection values are stored; I even checked with grep -ri '==>' ./.
PS: maybe check that the Python requests lib handles 303 redirects.
Hi @noraj ,
This is to let you know that the issue has been acknowledged and I am working on it.
I will add a new switch, --verify, which will solve the redirection and 404 issues by verifying all the URLs added at each level before crawling further.
Thanks for the verbose explanation of the issue, it really helped.
PS: Would it be possible for you to provide the website you are testing against? You can DM me on Twitter.
I guess adding the parameter allow_redirects=False at L239 and doing a relevant check will fix this.
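For what it's worth, with allow_redirects=False requests does hand back the 3xx response itself instead of transparently following Location, so the redirection value is recoverable; a minimal sketch, not the actual Photon code:

```python
import requests

def fetch_no_follow(url, timeout=10):
    # allow_redirects=False makes requests return the 3xx response
    # itself; is_redirect covers 301/302/303/307 with a Location header.
    resp = requests.get(url, allow_redirects=False, timeout=timeout)
    if resp.is_redirect or resp.is_permanent_redirect:
        return resp.status_code, resp.headers.get('Location')
    return resp.status_code, None
```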
@0xInfection We want to follow redirects.
Don't worry guys, I will fix it once I have free time.