[HTTP1.1] `Host` header sent twice if also specified in request headers
While attempting to use async-http (0.71.0) with against a reverse proxy where we need to specify a Host header, the library sends the header twice: one with the URL's hostname value, and one with the value specified in the request. For most web servers, this results in a 400 Bad Request.
url = URI.parse("https://my.reverse.proxy.com/path/to/resource")
headers = {
"Host" => "api.example.com"
}
body = JSON.dump({ foo: "bar" })
Sync do
Async::HTTP::Internet.post(url, headers, body) do |resp|
puts resp.inspect
end
end
It looks like this is coming from protocol-http1 where the host header is always written, irrespective if it also exists in headers. https://github.com/socketry/protocol-http1/blob/main/lib/protocol/http1/connection.rb#L133-L138
The reverse proxy use case is one, but it could also be for any situation where the TCP/TLS socket connection should be made to a different authority than what the HTTP Host header dictates. For example, this should also be completely valid:
url = URI.parse("http://localhost:9292/path/to/resource")
headers = {
"Host" => "example.multi.tenant.app.com" # used to lookup SaaS tenant, etc
}
body = JSON.dump({ foo: "bar" })
Sync do
Async::HTTP::Internet.post(url, headers, body) do |resp|
puts resp.inspect
end
end
What you are trying to do is possible, but it's probably not achievable with Async::HTTP::Internet which is a canonical/opinionated interface for accessing internet resources following standard conventions.
More specifically, the host header is not part of HTTP/2+, and therefore the library has a specific field for dealing with host identification, referred to as authority. In HTTP/1, authority maps directly to the host header, and in HTTP/2+ it maps to the :authority pseudo header.
Note that this can be different from the address you connect to (in your case the reverse proxy), and even different from the hostname used for TLS negotiation (can be used to select certificates for validating the connection is secure).
According to your initial problem, something like this will work for you:
server_endpoint = ::IO::Endpoint.tcp("my.reverse.proxy.com", 443)
host_endpoint = ::Async::HTTP::Endpoint.parse("https://api.example.com/path/to/resource", server_endpoint)
client = ::Async::HTTP::Client.new(host_endpoint)
client.post(...)
This will configure the authority and TLS as if it was talking to api.example.com.
Correct me if I am wrong, but creating a client in this way will prevent connection reuse/persistent connections since we will need a Client instance per combination of TCP endpoint and host/authority. In reality, we want to reuse the connection to the TCP endpoint, but just change the Host header per request.
It will depend on your TLS setup, if you are expecting SNI (which is at the connection level, usually for virtual hosts) to work, you will need the setup I suggested, and yes, you will want to cache the client instances and yes, you are correct, it will be one connection at least per SNI hostname. If you don't need SNI, and only need to change the authority, you can do this instead:
def get(client, url, headers = Protocol::HTTP::Headers.new)
client.call(
Protocol::HTTP::Request.new(nil, url.hostname, "GET", url.path, headers, body)
)
end
The 2nd argument to the Request.new is the authority (host) to use.
If this was more commonly requested, we could consider adding a keyword argument to the method helpers, e.g. internet.get(..., authority: ...) - however I don't know if your issue encompasses SNI or not - can you give me more details about that part of your setup?
We are essentially trying to do the equivalent of the following Go code with async-http, where we want to keep a pool of persistent connections to the reverse proxy.
The reverse proxy only has a TLS certificate for my.tld.com but has millions of virtual hosts.
We use Cloudflare SaaS where they connect to the reverse proxy/loadbalancer with TLS/SNI set to my.tld.com and the Host/:authority header set to the tenant's domain.
https://developers.cloudflare.com/cloudflare-for-platforms/cloudflare-for-saas/reference/connection-details/
Show Code
package main
import (
"crypto/tls"
"fmt"
"io"
"net"
"net/http"
"net/http/httptrace"
"time"
)
func main() {
// Create a custom dialer
dialer := &net.Dialer{
Timeout: 30 * time.Second,
KeepAlive: 30 * time.Second,
}
// Create a custom TLS config
tlsConfig := &tls.Config{
ServerName: "nuc.adi.run",
}
// Create a custom transport
transport := &http.Transport{
DialContext: dialer.DialContext,
TLSClientConfig: tlsConfig,
ForceAttemptHTTP2: true,
MaxIdleConns: 100,
IdleConnTimeout: 90 * time.Second,
TLSHandshakeTimeout: 10 * time.Second,
}
// for trace events
clientTrace := &httptrace.ClientTrace{
GetConn: func(hostPort string) { fmt.Println("starting to create conn ", hostPort) },
GotConn: func(info httptrace.GotConnInfo) { fmt.Printf("connection established %+v\n", info) },
TLSHandshakeDone: func(info tls.ConnectionState, err error) {
fmt.Printf("TLS handshake done %+v\n", info)
},
WroteHeaderField: func(key string, value []string) {
if key == ":authority" {
fmt.Println("Wrote header field ", key, value)
}
},
ConnectStart: func(network, addr string) { fmt.Println("starting tcp connection", network, addr) },
ConnectDone: func(network, addr string, err error) { fmt.Println("tcp connection created", network, addr, err) },
}
// Create a client with the custom transport
client := &http.Client{
Transport: transport,
}
// Create a new request
req, err := http.NewRequest("GET", "https://nuc.adi.run", nil)
if err != nil {
fmt.Println("Error creating request:", err)
return
}
doRequest(client, clientTrace, req, "code.aditya.me")
fmt.Println()
req, err = http.NewRequest("GET", "https://nuc.adi.run", nil)
if err != nil {
fmt.Println("Error creating request:", err)
return
}
doRequest(client, clientTrace, req, "jellyfin.aditya.me")
}
func doRequest(client *http.Client, clientTrace *httptrace.ClientTrace, req *http.Request, overrideHostName string) {
clientTraceCtx := httptrace.WithClientTrace(req.Context(), clientTrace)
req = req.WithContext(clientTraceCtx)
// override Host header
req.Host = overrideHostName
resp, err := client.Do(req)
if err != nil {
fmt.Println("Error performing request:", err)
return
}
defer resp.Body.Close()
io.Copy(io.Discard, resp.Body)
fmt.Println("Response status:", resp.Status)
}
go run main.go
starting to create conn nuc.adi.run:443
starting tcp connection tcp 10.0.0.2:443
tcp connection created tcp 10.0.0.2:443 <nil>
TLS handshake done {Version:772 HandshakeComplete:true DidResume:false CipherSuite:4865 NegotiatedProtocol:h2 NegotiatedProtocolIsMutual:true ServerName:nuc.adi.run PeerCertificates:[0x140001d8008 0x140001d8588] VerifiedChains:[[0x140001d8b08 0x140001d9088 0x140001d9608]] SignedCertificateTimestamps:[] OCSPResponse:[] TLSUnique:[] ECHAccepted:false ekm:0x104d76c80 testingOnlyDidHRR:false testingOnlyCurveID:29}
connection established {Conn:0x14000192008 Reused:false WasIdle:false IdleTime:0s}
Wrote header field :authority [code.aditya.me]
starting to create conn nuc.adi.run:443
connection established {Conn:0x14000192008 Reused:true WasIdle:true IdleTime:53.75µs}
Wrote header field :authority [code.aditya.me]
Response status: 200 OK
starting to create conn nuc.adi.run:443
connection established {Conn:0x14000192008 Reused:true WasIdle:true IdleTime:82.5µs}
Wrote header field :authority [jellyfin.aditya.me]
starting to create conn nuc.adi.run:443
connection established {Conn:0x14000192008 Reused:true WasIdle:true IdleTime:30.375µs}
Wrote header field :authority [jellyfin.aditya.me]
Response status: 200 OK
Okay, I understand, SNI is not an issue for you.
Similar request has come up in the past, so I've introduced support for keyword arguments to Async::HTTP::Internet, you can specify authority: in your case to set the host header.
See https://socketry.github.io/async-http/releases/index.html#async::http::internet-accepts-keyword-arguments for more details.
Thanks @ioquatix, this unblocks us for now. As an aside, do you think it would also be beneficial to add a sanity check or other defensive measure to ensure the host header is not sent twice? It was not immediately intuitive when we received 400s on the reason why, and we were only able to figure it out because we had access to nginx logs.
Unfortunately, it appears that this change does not play well with WebMock. They use the request.authority when building the stub signature, which results in looking for the wrong host name. https://github.com/bblimke/webmock/blob/master/lib/webmock/http_lib_adapters/async_http_client_adapter.rb#L97-L106
Thanks, I'll take a look.