add dedicated counters for a few common 4xx and 5xx codes

Open eric846 opened this issue 2 years ago • 3 comments

Can we afford to add 5 or 15 counters to help troubleshoot these specific HTTP outcomes? These would be in addition to today's catch-all http_4xx and http_5xx counters.

Usually if I saw 4xx or 5xx errors in Nighthawk counters, I would just use curl against the server directly to see what's happening, but when using a custom transport socket, that's impossible.

If the resource cost is significant, we should prioritize the most common counters.

If we can afford 15:

  • 400 Bad Request
  • 401 Unauthorized
  • 403 Forbidden
  • 404 Not Found
  • 405 Method Not Allowed
  • 406 Not Acceptable
  • 407 Proxy Authentication Required
  • 408 Request Timeout
  • 429 Too Many Requests
  • 500 Internal Server Error
  • 501 Not Implemented
  • 502 Bad Gateway
  • 503 Service Unavailable
  • 504 Gateway Timeout
  • 505 HTTP Version Not Supported

If we can only afford 5:

  • 404 Not Found
  • 500 Internal Server Error
  • 502 Bad Gateway
  • 503 Service Unavailable
  • 504 Gateway Timeout

eric846 · Apr 26 '22 20:04

We can certainly add more counters. We can limit the impact of this addition by hiding these changes behind a feature flag, so that we don't change the default behavior.

We could run some larger load tests to measure the impact and feasibility of this, which could help us decide whether we add 5 or 15.

@eric846 is this something you are planning to work on?

mum4k · Apr 27 '22 14:04

All sounds good. (I'm not planning to work on it myself.)

eric846 · Apr 27 '22 23:04

I just realized a way to reduce the effort.

We can just let the user specify a list of HTTP codes they want to break out as separate counters. Then we aren't even bound by the set of 15. The default would be an empty list, and I would probably start off with 404,500,502,503,504 myself. For debugging where performance doesn't matter, someone could try $(seq -s , 200 599).

eric846 · May 02 '22 00:05
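
For illustration only, here is a minimal Python sketch of the user-specified list idea from the last comment. It is not Nighthawk code (Nighthawk is written in C++), and the function names `parse_status_codes` and `count_responses` as well as the per-code counter names such as `http_404` are hypothetical. Only the catch-all names `http_4xx` and `http_5xx` come from the thread. The sketch assumes that the catch-all counters keep incrementing and that a per-code counter is bumped in addition whenever the status is in the requested list.

```python
from collections import Counter


def parse_status_codes(spec):
    """Parse a comma-separated list such as "404,500,502" into a frozenset of ints.

    An empty string yields an empty set, matching the proposed default of
    "no extra counters". Empty tokens (e.g. a trailing comma) are ignored.
    """
    codes = set()
    for token in spec.split(","):
        token = token.strip()
        if not token:
            continue
        code = int(token)  # raises ValueError on non-numeric input
        if not 100 <= code <= 599:
            raise ValueError(f"not an HTTP status code: {code}")
        codes.add(code)
    return frozenset(codes)


def count_responses(statuses, breakout_codes):
    """Tally responses into catch-all counters plus the requested per-code counters.

    The per-code counter names (http_404, ...) are hypothetical.
    """
    counters = Counter()
    for status in statuses:
        # Catch-all class counter, e.g. http_4xx or http_5xx.
        counters[f"http_{status // 100}xx"] += 1
        # Dedicated counter only for codes the user asked to break out.
        if status in breakout_codes:
            counters[f"http_{status}"] += 1
    return counters


if __name__ == "__main__":
    breakout = parse_status_codes("404,500,502,503,504")
    print(dict(count_responses([200, 404, 404, 418, 503], breakout)))
```

With this shape, the cost of the feature scales with the length of the list the user supplies, so the empty default adds no counters, while a list covering every code from 200 to 599 trades performance for full detail.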