
Could not determine a protocol - when running on EC2

Open: samfrench opened this issue • 5 comments

Hi, I'm just exploring this project to see if it's something we can use in a running application, and it seems ideal for performance. One thing I haven't been able to work out is AWS status checks when deploying this to an EC2 instance, as our logs fill with messages about not recognising the protocol.

{"report_cb":"&:gen_server.format_log/2","error_logger":{"tag":"error","report_cb":"&:gen_server.format_log/1"},"crash_reason":["Could not determine a protocol",[]],"metadata":{},"message":"GenServer #PID<0.1566.0> terminating\n** (stop) \"Could not determine a protocol\"\nLast message: {:continue, :handle_connection}\nState: {%ThousandIsland.Socket{acceptor_id: \"DA364D26BD40\", connection_id: \"302D04F1381E\", read_timeout: :infinity, socket: #Port<0.533>, transport_module: ThousandIsland.Transports.TCP}, %{handler_module: Bandit.InitialHandler, plug: {Web.Router, []}, read_timeout: 60000}}","level":"error","datetime":"2022-08-05T08:51:19.327125Z"}

I was seeing this error continuously in our application logs. I tried tracing the handler functions and socket creation, but the only thing I could see was that a request I made reached "handle_data" in Bandit while the EC2 status checks didn't. I also tried "handle_connection" but couldn't find anything common between them.

Any suggestions, or pointers on getting further with the tracing, are appreciated.

This may be an issue for Thousand Island rather than Bandit, so I'm happy to raise it there if that's the case. The message shows the Bandit handler but a Thousand Island socket, and it looks to come from Bandit:

https://github.com/mtrudel/bandit/blob/main/lib/bandit/initial_handler.ex#L28

samfrench avatar Sep 13 '22 08:09 samfrench

This will be a Bandit error, so this is the right place to file it. If it's possible to do so, could you turn logging up to trace and paste it here? You'll want to get the following to run at some point before the relevant request comes in:

ThousandIsland.Logger.attach_logger(:trace)
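For example, to make sure the trace logger is active before any connection arrives, it could be attached during application startup. This is a sketch only — the application module name and the Bandit child spec are illustrative (the spec shape varies between Bandit versions), not taken from the issue:

```elixir
defmodule MyApp.Application do
  use Application

  @impl true
  def start(_type, _args) do
    # Attach Thousand Island's trace-level telemetry logger before the
    # server begins accepting connections.
    ThousandIsland.Logger.attach_logger(:trace)

    children = [
      # Hypothetical Bandit child spec; adjust to your own configuration.
      {Bandit, plug: Web.Router, port: 8000}
    ]

    Supervisor.start_link(children, strategy: :one_for_one, name: MyApp.Supervisor)
  end
end
```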

mtrudel avatar Sep 13 '22 14:09 mtrudel

This produced more output, as expected, but it doesn't seem to suggest what I should look into further. Hopefully this is the information you need. Each status check had the same output, with different "remote_address" and "acceptor_id" values.

{"metadata":{},"message":"[:acceptor, :accept] metadata: %{acceptor_id: \"4384F841363C\", server_config: %ThousandIsland.ServerConfig{genserver_opts: [], handler_module: Bandit.DelegatingHandler, handler_opts: %{handler_module: Bandit.InitialHandler, plug: {Web.Router, []}, read_timeout: 60000}, num_acceptors: 100, port: 8000, read_timeout: :infinity, transport_module: ThousandIsland.Transports.TCP, transport_opts: []}}, measurements: %{startup_time: 42428, wait_time: 20910317309}","level":"debug","datetime":"2022-09-16T07:11:46.525496Z"}
{"metadata":{},"message":"[:handler, :start] metadata: %{acceptor_id: \"4384F841363C\", connection_id: \"C415595B35EB\", remote_address: {10, 11, 12, 13}, remote_port: 19200}, measurements: %{}","level":"debug","datetime":"2022-09-16T07:11:46.525613Z"}
{"metadata":{},"message":"[:socket, :handshake] metadata: %{connection_id: \"C415595B35EB\"}, measurements: %{}","level":"debug","datetime":"2022-09-16T07:11:46.525672Z"}
{"metadata":{},"message":"[:socket, :recv] metadata: %{connection_id: \"C415595B35EB\"}, measurements: %{result: {:error, :closed}}","level":"debug","datetime":"2022-09-16T07:11:46.525725Z"}
{"metadata":{},"message":"[:socket, :close] metadata: %{connection_id: \"C415595B35EB\"}, measurements: %{octets_recv: 0, octets_sent: 0, packets_recv: 0, packets_sent: 0}","level":"debug","datetime":"2022-09-16T07:11:46.525771Z"}
{"metadata":{},"message":"Handler C415595B35EB error \"Could not determine a protocol\"","level":"error","datetime":"2022-09-16T07:11:46.525820Z"}
{"report_cb":"&:gen_server.format_log/2","error_logger":{"tag":"error","report_cb":"&:gen_server.format_log/1"},"metadata":{},"message":"GenServer #PID<0.8108.0> terminating\n** (stop) \"Could not determine a protocol\"\nLast message: {:continue, :handle_connection}\nState: {%ThousandIsland.Socket{acceptor_id: \"4384F841363C\", connection_id: \"C415595B35EB\", read_timeout: :infinity, socket: #Port<0.5026>, transport_module: ThousandIsland.Transports.TCP}, %{handler_module: Bandit.InitialHandler, plug: {Web.Router, []}, read_timeout: 60000}}","level":"error","datetime":"2022-09-16T07:11:46.525906Z"}

samfrench avatar Sep 16 '22 07:09 samfrench

Here's what's happening:

1. (inferred from line 1): At some time prior, your server starts on clear TCP port 8000, using Web.Router as its Plug.
2. (lines 1-3): The client connects, and Thousand Island acceptor 4384F841363C starts handler process C415595B35EB. Note that line 3 (handshaking) is a no-op in clear TCP.
3. (line 4): The server tries to receive the first 24 bytes of data on the socket in order to determine the protocol in use, but finds the connection to have been closed by the client. An {:error, :closed} tuple is returned from the sniff_wire/1 function.
4. Bandit.InitialHandler's handle_connection/2 function returns {:error, "Could not determine a protocol", state} up to Thousand Island.
5. Thousand Island takes that return value and stops the connection process with a reason of "Could not determine a protocol".
6. (lines 5-6): Thousand Island closes the connection. This emits an error telemetry event.
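The sniffing step in that sequence can be sketched roughly as follows. This is a simplified illustration, not Bandit's actual implementation (the real logic lives in Bandit.InitialHandler); it just shows how a socket closed before any bytes arrive yields the error seen in the logs. The function takes the result of the first read off the socket, in the shape returned by :gen_tcp.recv/2:

```elixir
defmodule SniffSketch do
  # The connection preface that clients send first on clear-text HTTP/2.
  @http2_preface "PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n"

  # Decide a protocol from the result of the first read off the socket.
  def sniff({:ok, <<@http2_preface, _rest::binary>>}), do: {:ok, :http2}
  def sniff({:ok, data}) when byte_size(data) > 0, do: {:ok, :http1}
  # The client closed the socket before sending anything (the health check case):
  def sniff({:error, :closed}), do: {:error, "Could not determine a protocol"}
end
```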

Note that all of the timestamps are within a few ms of one another, so this isn't a matter of the client timing out waiting for us to negotiate; in fact, the client isn't giving the server any time at all to respond before it closes the connection. In short, I don't think the client is sending HTTP at all here (and, given the fast close, I don't think it's HTTP over TLS either).

I can't see where this is a Bandit issue if the client is misbehaving so badly. Do you have any documentation about what the client is doing here? What service are you using that's sending these health checks?

mtrudel avatar Sep 16 '22 14:09 mtrudel

Inspired by the inconsistency of the logging on line 6 of your trace above, I've updated the Thousand Island telemetry log helper format to be consistent across all log levels (https://github.com/mtrudel/thousand_island/commit/29da75a211d617af00bf5f411536b01b35ae875c). It'll go out in the next release, sometime in the next few days.

mtrudel avatar Sep 20 '22 13:09 mtrudel

Above change released in Thousand Island 0.5.10

mtrudel avatar Sep 22 '22 02:09 mtrudel

For the load balancer health checks, the protocol being used is TCP, which is the default. It seems they can be created with HTTP instead, but I assumed the two would behave the same.

I was looking at it from the point of view that Cowboy seems to cope with these checks at the moment, whereas Bandit produces the protocol messages seen in the logs.

According to the AWS documentation, these health checks use TCP pings, which contain no data packets.

https://docs.aws.amazon.com/elasticloadbalancing/latest/network/target-group-health-checks.html
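That behaviour is easy to reproduce without AWS at all. The following sketch simulates both sides of such a check with :gen_tcp: the "health checker" connects and closes without sending a byte, and the server side then sees {:error, :closed} on its first read, exactly as in the [:socket, :recv] trace line above:

```elixir
# Simulate an AWS-style TCP health check against a local listener.
{:ok, listener} = :gen_tcp.listen(0, [:binary, active: false])
{:ok, port} = :inet.port(listener)

# "Health checker" side: connect, send nothing, close immediately.
{:ok, client} = :gen_tcp.connect('localhost', port, [:binary, active: false])
:ok = :gen_tcp.close(client)

# "Server" side: accept the connection and try to read the first bytes,
# as Bandit does when sniffing the protocol.
{:ok, conn} = :gen_tcp.accept(listener)
{:error, :closed} = :gen_tcp.recv(conn, 0)
```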

samfrench avatar Sep 23 '22 15:09 samfrench

There's your answer - Bandit is unable to determine a protocol because the client isn't sending HTTP. I can't speculate on why Cowboy doesn't log this, but it does the same thing (namely, not calling a Plug or considering an HTTP request to have been made). Both servers are doing the right thing here from a protocol perspective; it's just that Bandit is more verbose.

I'd suggest in this case that you configure AWS to send HTTP health checks (if any of those raise errors in Bandit then that's certainly a bug).
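If it helps, a dedicated endpoint for an HTTP health check might look something like this. It's a hypothetical sketch (the module name and "/health" path are illustrative, and it assumes the Plug library is available); the load balancer's HTTP health check would then be pointed at /health:

```elixir
defmodule MyApp.HealthPlug do
  @moduledoc "Hypothetical Plug returning 200 OK for load balancer health checks."
  import Plug.Conn

  def init(opts), do: opts

  # Answer health checks directly, without touching the rest of the app.
  def call(%Plug.Conn{request_path: "/health"} = conn, _opts) do
    conn
    |> send_resp(200, "ok")
    |> halt()
  end

  # Pass every other request through untouched.
  def call(conn, _opts), do: conn
end
```

A plug like this could sit in front of the application router in a Plug.Builder pipeline, so health checks never reach application routes.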

mtrudel avatar Sep 23 '22 16:09 mtrudel