pshtt
pshtt copied to clipboard
Logic issue with is_redirect method
The way is_redirect is currently written, a domain will get flagged as a redirect if all endpoints are down or otherwise code 400-ing. For example, imagine domain.tld, which only has its https endpoint up and happens to be returning for a 404 for its index. That will return true
for is_redirect, which is confusing (and happening a bunch where I am).
I'm curious if that whole logic block can be simplified to check and see if any of the end points are returning a code 300. That entire block starting on line 527 might reasonably be reduced to something like:
redirection_codes = range(300, 309)
return (https.status in redirection_codes) or (http.status in redirection_codes) or (httpswww.status in redirection_codes) or (httpwww.status in redirection_codes)
I'll submit a pull request and we can hash it out there.
From #67:
I think we should add a conditional to the calculation that ensures that at least one of the endpoints is an external redirect. Something like:
# ...
and (
httpwww.redirect_eventually_to_external or
http.redirect_eventually_to_external or
httpswww.redirect_eventually_to_external or
https.redirect_eventually_to_external
)
I closed #67, and am moving the proposed resolution here for tracking.
Just noting for myself and others that this is still an issue, and affects at least one second-level domain in exim.gov:
https://s3-us-gov-west-1.amazonaws.com/cg-4adefb86-dadb-4ecf-be3e-f1c7b4f6d084/live/cache/pshtt/exim.gov.json
EDIT: URL is now https://s3-us-gov-west-1.amazonaws.com/cg-4adefb86-dadb-4ecf-be3e-f1c7b4f6d084/live/cache/pshtt/exim.gov.json