
Consider factoring in meta refresh tags when calculating redirects

Open · konklone opened this issue 7 years ago · 2 comments

This isn't necessarily about relaxing the compliance standards around server-side 80->443 redirects. It's about detecting a broader swathe of agency behavior.

For example, segurosocial.gov seems to redirect to socialsecurity.gov, but it actually does so with a <meta> refresh tag. Further, it redirects to an insecure (http://) URL:

```
curl https://segurosocial.gov
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD><TITLE>SEGUROSOCIAL</TITLE>
<META content="text/html; charset=windows-1252" http-equiv=Content-Type>
<META content="MSHTML 5.00.2314.1000" name=GENERATOR>
<META HTTP-EQUIV="refresh" CONTENT="0; URL=http://www.socialsecurity.gov/espanol">
</HEAD>
<BODY aLink=#ff0000 bgColor=#ffffff link=#000ff text=#000000 vLink=#0000ff>
</BODY></HTML>
```

However, this doesn't show up in pshtt at all, so there's no way to detect this kind of thing.

Looking at (and parsing) HTML content, rather than just HTTP headers and status codes, would be new for pshtt. But if it's simple enough, it may be worth it. The result could be offered as a new field or set of fields (separate from the existing server-redirect fields) for downstream tools that care about them.
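If the parse turns out to be simple enough, a minimal sketch in Python (the language pshtt is written in) might look like the following. It uses only the standard-library `html.parser` and handles only the common `delay; URL=target` form of the `content` attribute. The function names and the `meta_refresh*` field names are hypothetical and are not part of pshtt.

```python
import re
from html.parser import HTMLParser


def parse_refresh(value):
    """Split a refresh value such as '0; URL=http://example.gov/' into (delay, url).

    Only the ';' separator is handled; url is None when no target is given.
    Returns None if the value does not start with a delay in seconds.
    """
    delay_part, _, rest = value.partition(";")
    match = re.match(r"\s*(\d+)", delay_part)
    if match is None:
        return None
    rest = rest.strip()
    # Drop an optional, case-insensitive "URL=" prefix.
    prefix = re.match(r"url\s*=\s*", rest, re.IGNORECASE)
    if prefix:
        rest = rest[prefix.end():]
    url = rest.strip("'\"") or None
    return int(match.group(1)), url


class MetaRefreshParser(HTMLParser):
    """Collects (delay, url) pairs from <meta http-equiv="refresh"> tags."""

    def __init__(self):
        super().__init__()
        self.refreshes = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <META HTTP-EQUIV=...> matches.
        if tag != "meta":
            return
        attrs = {name: value or "" for name, value in attrs}
        if attrs.get("http-equiv", "").strip().lower() != "refresh":
            return
        parsed = parse_refresh(attrs.get("content", ""))
        if parsed is not None:
            self.refreshes.append(parsed)


def meta_refresh_fields(html):
    """Hypothetical result fields, kept separate from the server-redirect fields."""
    parser = MetaRefreshParser()
    parser.feed(html)
    parser.close()
    if not parser.refreshes:
        return {"meta_refresh": False, "meta_refresh_delay": None, "meta_refresh_url": None}
    delay, url = parser.refreshes[0]  # this sketch reports only the first refresh tag
    return {"meta_refresh": True, "meta_refresh_delay": delay, "meta_refresh_url": url}
```

On the segurosocial.gov page above, `meta_refresh_fields` would report a 0-second refresh to `http://www.socialsecurity.gov/espanol`.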

konklone, Feb 07 '17 01:02

> However, this doesn't show up in pshtt at all, so there's no way to detect this kind of thing.

True! There are also redirect techniques beyond meta redirects that pshtt currently can't recognize. For example, https://abcnews.go.com uses JavaScript to downgrade HTTPS to HTTP:

```html
<script>
        if (window.location.protocol == "https:" && window.parent.location.hostname.indexOf("outbrain") == -1) {
                var _sslurl = window.location.href.replace("https://", "http://");
                window.location.replace(_sslurl);
                window.location.href = _sslurl;
        }
</script>
```

I think the most comprehensive approach would be to use browser automation: "it's the only way to be sure." On the other hand, while that would make it easy to determine whether a site downgrades HTTPS, it wouldn't automatically help with the harder problem of determining why or how a site downgrades.

If you want to keep this issue specifically about meta redirects, let me know, and I'll move this comment to a dedicated issue about detecting JS redirects.

garrettr, Feb 07 '17 08:02

The main reason I considered meta redirects feasible is that, in theory, we already have the HTML content from our requests to the site, so no more network activity is necessary. We'd only need to run an HTML parse on the retrieved content.
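Interpreting a parsed refresh target likewise needs no network activity, only the URL that was requested. A minimal sketch, assuming the target has already been pulled out of the HTML as a string; `classify_refresh_target` and its result keys are hypothetical:

```python
from urllib.parse import urljoin, urlsplit


def classify_refresh_target(page_url, target):
    """Describe a meta refresh target relative to the requested URL (no requests made)."""
    if not target:
        # A refresh with no URL reloads the current page.
        resolved = page_url
    else:
        # Relative targets are resolved against the URL that was requested.
        resolved = urljoin(page_url, target)
    src, dst = urlsplit(page_url), urlsplit(resolved)
    return {
        "resolved_url": resolved,
        # urlsplit lowercases hostnames; this is an exact host comparison.
        "changes_host": (src.hostname or "") != (dst.hostname or ""),
        "downgrades_to_http": src.scheme == "https" and dst.scheme == "http",
    }
```

Host comparison here is an exact match, so a move from `segurosocial.gov` to `www.socialsecurity.gov` and one from `example.gov` to `www.example.gov` would both count as a host change.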

JS redirect detection would require (as you say) a headless browser, and potentially more network requests if the relevant JS is loaded from an external file rather than an inline script. HTML parsing isn't trivial, but operating a headless browser and making arbitrary additional network requests is less appealing to me.
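The inline-versus-external distinction is visible from the HTML alone: a parser can collect inline script bodies from the page already retrieved, while each `<script src>` URL it finds is a file that would need another request before anything could be said about it. A minimal sketch with Python's standard-library `html.parser`; `ScriptInventory` is a hypothetical name, and it only inventories scripts. It does not execute or analyze them, so it is not a JS redirect detector.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ScriptInventory(HTMLParser):
    """Separates inline <script> bodies from external script URLs."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.inline = []    # source text of inline <script> blocks
        self.external = []  # absolute URLs of <script src=...> files (would need more requests)
        self._in_inline = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag != "script":
            return
        src = dict(attrs).get("src")
        if src:
            self.external.append(urljoin(self.page_url, src))
        else:
            self._in_inline = True
            self._buf = []

    def handle_data(self, data):
        # Script contents may arrive in several chunks; buffer them.
        if self._in_inline:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_inline:
            self.inline.append("".join(self._buf))
            self._in_inline = False
```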

No worries on discussing it all in this issue, IMO.

konklone, Feb 08 '17 00:02