Host header output incorrect when explicitly specifying the default port

Open thetuxkeeper opened this issue 4 years ago • 3 comments

Hi,

If you run something like http --print hH http://localhost:80/, the printed output includes :80 in the Host header:

GET / HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate
Connection: keep-alive
Host: localhost:80
User-Agent: HTTPie/2.4.0

But if you look at the actual traffic with strace (strace -f -v -s 256 -e sendto http --print hH http://localhost:80/), you get:

sendto(3, "GET / HTTP/1.1\r\nHost: localhost\r\nUser-Agent: HTTPie/2.4.0\r\nAccept-Encoding: gzip, deflate\r\nAccept: */*\r\nConnection: keep-alive\r\n\r\n", 130, 0, NULL, 0) = 130

So the port was removed from the actual request, and what HTTPie shows is wrong in this case. Non-default ports work fine, so the issue only affects http:// with port 80 and https:// with port 443.

It confused me while I was debugging some custom Host header matching logic in which I had forgotten to match or ignore the port.

The printed and the sent Host header should match, so there's no confusion about what is really sent.

Workaround: if you want the port in the Host header, you have to set it explicitly with a header request item, e.g. http --print hH http://localhost:80/ Host:localhost:80.
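The mismatch can be reproduced without a server or strace. The sketch below gives Python's standard http.client a fake socket and captures the request bytes it would write. It shows the default port being dropped from Host, a non-default port being kept, and an explicit Host header (the workaround above) being sent unchanged. HTTPie sends requests through requests and urllib3, whose connection class subclasses http.client.HTTPConnection; that HTTPie's Host value comes from exactly this code path is an assumption, and CaptureSocket and sent_request_head are hypothetical helpers.

```python
import http.client
from typing import Dict, Optional


class CaptureSocket:
    """Stand-in for a connected socket: records bytes instead of sending them."""

    def __init__(self) -> None:
        self.data = b""

    def sendall(self, data: bytes) -> None:
        self.data += data


def sent_request_head(host: str, port: int,
                      headers: Optional[Dict[str, str]] = None) -> str:
    """Return the request head http.client would send for GET / to host:port."""
    conn = http.client.HTTPConnection(host, port)
    capture = CaptureSocket()
    # Pre-setting .sock means http.client never calls connect(),
    # so no network access happens.
    conn.sock = capture
    conn.request("GET", "/", headers=headers or {})
    return capture.data.decode("ascii")


if __name__ == "__main__":
    print(sent_request_head("localhost", 80))    # head contains "Host: localhost"
    print(sent_request_head("localhost", 8080))  # head contains "Host: localhost:8080"
    # Explicit Host header: http.client skips its own and sends this one as-is.
    print(sent_request_head("localhost", 80, {"Host": "localhost:80"}))
```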

thetuxkeeper, Feb 16 '21 16:02

Thanks for the report, @thetuxkeeper! The reported vs. actual Host value inconsistency is a bug.

Normalizing away the default protocol port is quite common, but we should document it, together with the option of explicitly overriding Host. (We just need to make sure the override works with https:// URLs as well.)

Relevant spec:

A "host" without any trailing port information implies the default port for the service requested (e.g., "80" for an HTTP URL). — https://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.23
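Read the other way around, the quoted rule means a Host-matching check like the one the reporter was debugging should treat "localhost" and "localhost:80" as the same authority for an http URL. A minimal sketch, assuming only http and https need default ports; split_host_header and same_host are hypothetical names, not part of HTTPie.

```python
from typing import Tuple

DEFAULT_PORTS = {"http": 80, "https": 443}


def split_host_header(value: str, scheme: str) -> Tuple[str, int]:
    """Split a Host header value into (host, port), filling in the default port."""
    value = value.strip()
    if value.startswith("["):
        # IPv6 literal such as "[::1]:8080"; raises ValueError if "]" is missing.
        end = value.index("]")
        host, rest = value[: end + 1], value[end + 1:]
        if rest and not rest.startswith(":"):
            raise ValueError(f"malformed Host value: {value!r}")
        port_text = rest[1:]
    else:
        host, _, port_text = value.partition(":")
    # No port (or an empty one) implies the scheme's default port.
    # DEFAULT_PORTS[scheme] raises KeyError for schemes other than http/https.
    port = int(port_text) if port_text else DEFAULT_PORTS[scheme]
    return host.lower(), port


def same_host(a: str, b: str, scheme: str) -> bool:
    """True if two Host values name the same host and port for this scheme."""
    return split_host_header(a, scheme) == split_host_header(b, scheme)


if __name__ == "__main__":
    print(same_host("localhost", "localhost:80", "http"))    # True
    print(same_host("localhost", "localhost:80", "https"))   # False
    print(split_host_header("[::1]:8080", "http"))           # ('[::1]', 8080)
```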

jkbrzt, Feb 16 '21 17:02

@jakubroztocil: Thanks for the fast response! Yes, removing the default port is usually expected; it was just the inconsistency that was irritating. I was debugging "strange" requests that carried the default port in the Host header, which seems to be fairly common when proxies or similar intermediaries are involved (at least it isn't unusual). I couldn't find the bug until a colleague tested the request and reproduced it with curl.

thetuxkeeper, Feb 17 '21 14:02

I think a simple fix can be applied to httpie/cli/argparser.py:

UPDATE: It breaks on IPv6. I haven't looked into that yet.

❯ git diff --cached argparser.py
diff --git a/httpie/cli/argparser.py b/httpie/cli/argparser.py
index 720e70b..a8963f9 100644
--- a/httpie/cli/argparser.py
+++ b/httpie/cli/argparser.py
@@ -6,6 +6,7 @@ import sys
 from argparse import RawDescriptionHelpFormatter
 from textwrap import dedent
 from urllib.parse import urlsplit
+from urllib.parse import urlparse

 from requests.utils import get_netrc_auth

@@ -133,6 +134,14 @@ class HTTPieArgumentParser(argparse.ArgumentParser):
             else:
                 self.args.url = scheme + self.args.url

+        urlscheme = urlparse(self.args.url).scheme
+        urlport = urlparse(self.args.url).port
+
+        if urlscheme == 'https' and urlport == 443 \
+                or urlscheme == 'http' and urlport == 80:
+            self.args.url = self.args.url.replace( ":" + str(urlport), '')
+
+
     # noinspection PyShadowingBuiltins
     def _print_message(self, message, file=None):
         # Sneak in our stderr/stdout.

msoe, Apr 08 '21 08:04
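One way to avoid the blanket str.replace() in the patch above, which would also remove ":80" from a path or query such as /a:80, is to rewrite only the netloc of the split URL. This is a sketch under that assumption; strip_default_port and DEFAULT_PORTS are hypothetical, not HTTPie code. Splitting the netloc at its last colon keeps a bracketed IPv6 literal such as [::1] intact, though whether this addresses the IPv6 breakage reported in the UPDATE is untested.

```python
from urllib.parse import urlsplit, urlunsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def strip_default_port(url: str) -> str:
    """Drop an explicit :80 (http) or :443 (https) from the URL's authority only."""
    parts = urlsplit(url)  # note: urlsplit lowercases the scheme
    try:
        port = parts.port  # None if absent; ValueError if invalid or out of range
    except ValueError:
        return url
    if port is None or port != DEFAULT_PORTS.get(parts.scheme):
        return url
    # The port follows the last ':' of the netloc, so a bracketed IPv6
    # literal like "[::1]:80" keeps its inner colons, and userinfo survives.
    netloc, _, _ = parts.netloc.rpartition(":")
    return urlunsplit(parts._replace(netloc=netloc))


if __name__ == "__main__":
    print(strip_default_port("http://localhost:80/a:80?x=:80"))  # http://localhost/a:80?x=:80
    print(strip_default_port("http://[::1]:80/"))                # http://[::1]/
    print(strip_default_port("https://example.org:80/"))         # unchanged
```

In the patch, the urlscheme/urlport lines and the replace() call could then be collapsed into self.args.url = strip_default_port(self.args.url).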