nighthawk
add dedicated counters for a few common 4xx and 5xx codes
Can we afford to add 5 or 15 counters to help troubleshoot these specific HTTP outcomes? These would be in addition to today's catch-all `http_4xx` and `http_5xx` counters.
Usually, when I see 4xx or 5xx errors in Nighthawk counters, I just run `curl` against the server directly to see what's happening, but that's impossible when using a custom transport socket.
If the resource cost is significant, we should prioritize the most common counters.
If we can afford 15:
- 400 Bad Request
- 401 Unauthorized
- 403 Forbidden
- 404 Not Found
- 405 Method Not Allowed
- 406 Not Acceptable
- 407 Proxy Authentication Required
- 408 Request Timeout
- 429 Too Many Requests
- 500 Internal Server Error
- 501 Not Implemented
- 502 Bad Gateway
- 503 Service Unavailable
- 504 Gateway Timeout
- 505 HTTP Version Not Supported
If we can only afford 5:
- 404 Not Found
- 500 Internal Server Error
- 502 Bad Gateway
- 503 Service Unavailable
- 504 Gateway Timeout
We can certainly add more counters. We can limit the impact of this addition by hiding these changes behind a feature flag, so that we don't change the default behavior.
We could run some larger load tests to determine the impact and feasibility of this, which could help us decide whether to add 5 or 15.
@eric846 is this something you are planning to work on?
All sounds good. (I'm not planning to work on it myself.)
I just realized a way to reduce the effort.
We can just let the user specify a list of HTTP codes they want to break out as separate counters. Then we aren't even bound by the set of 15. The default would be an empty list, and I would probably start off with `404,500,502,503,504` myself. For debugging where performance doesn't matter, someone could try `$(seq -s , 200 599)`.
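To make the proposal concrete, here is a minimal sketch of the user-specified list idea. It is written in Python for brevity and is not Nighthawk code. The names `parse_code_list` and `ResponseCounters`, and the per-code counter naming (`http_404` and so on), are assumptions made for illustration; only the catch-all `http_4xx` and `http_5xx` names come from the discussion above.

```python
from collections import Counter


def parse_code_list(spec: str) -> frozenset:
    """Parse a comma-separated list such as "404,500,502" into a set of codes.

    Hypothetical helper: an empty string yields an empty set, which matches
    the proposed default of breaking out no codes at all.
    """
    codes = set()
    for token in spec.split(","):
        token = token.strip()
        if not token:
            continue  # tolerate empty input and stray commas
        code = int(token)
        if not 100 <= code <= 599:
            raise ValueError(f"not an HTTP status code: {code}")
        codes.add(code)
    return frozenset(codes)


class ResponseCounters:
    """Hypothetical counter store: catch-all class counters plus opt-in per-code counters."""

    def __init__(self, breakout_codes=frozenset()):
        self._breakout = breakout_codes
        self.counts = Counter()

    def record(self, status: int) -> None:
        # The catch-all counter (e.g. http_4xx) is always incremented,
        # so default behavior is unchanged when the list is empty.
        self.counts[f"http_{status // 100}xx"] += 1
        # A dedicated counter is touched only for codes the user asked for.
        if status in self._breakout:
            self.counts[f"http_{status}"] += 1


counters = ResponseCounters(parse_code_list("404,500,502,503,504"))
for status in (200, 404, 404, 418, 503):
    counters.record(status)
```

With this shape, the cost scales with the length of the list the user passes rather than with a fixed set of 5 or 15, and a status that is not listed (418 in the example) still shows up in the catch-all counter.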