pshtt icon indicating copy to clipboard operation
pshtt copied to clipboard

Logic issue with is_redirect method

Open egyptiankarim opened this issue 7 years ago • 2 comments

The way is_redirect is currently written, a domain will get flagged as a redirect if all endpoints are down or otherwise code 400-ing. For example, imagine domain.tld, which only has its https endpoint up and happens to be returning for a 404 for its index. That will return true for is_redirect, which is confusing (and happening a bunch where I am).

I'm curious if that whole logic block can be simplified to check and see if any of the end points are returning a code 300. That entire block starting on line 527 might reasonably be reduced to something like:

redirection_codes = range(300, 309)

return (https.status in redirection_codes) or (http.status in redirection_codes) or (httpswww.status in redirection_codes) or (httpwww.status in redirection_codes)

I'll submit a pull request and we can hash it out there.

egyptiankarim avatar Mar 30 '17 23:03 egyptiankarim

From #67:

I think we should add a conditional to the calculation that ensures that at least one of the endpoints is an external redirect. Something like:

# ...
  and (
    httpwww.redirect_eventually_to_external or 
    http.redirect_eventually_to_external or 
    httpswww.redirect_eventually_to_external or 
    https.redirect_eventually_to_external
  )

I closed #67, and am moving the proposed resolution here for tracking.

konklone avatar May 15 '17 14:05 konklone

Just noting for myself and others that this is still an issue, and affects at least one second-level domain in exim.gov:

https://s3-us-gov-west-1.amazonaws.com/cg-4adefb86-dadb-4ecf-be3e-f1c7b4f6d084/live/cache/pshtt/exim.gov.json

EDIT: URL is now https://s3-us-gov-west-1.amazonaws.com/cg-4adefb86-dadb-4ecf-be3e-f1c7b4f6d084/live/cache/pshtt/exim.gov.json

konklone avatar Jun 05 '17 04:06 konklone