ndt-server

ndt7: tracking sender errors

Open stephen-soltesz opened this issue 5 years ago • 2 comments

During our initial client migrations to ndt7, we found that under some (unknown) conditions clients may run for up to 11s (possibly longer, but the client stopped at 11s). The current ndt-server ndt7 Prometheus metrics do not distinguish between causes of error, so it is ambiguous whether a write error indicates a real problem or a clean shutdown by the remote client.

For example, a write error on a websocket conn may result from the remote client closing the connection cleanly. In that case, conn.WriteJSON may return "websocket: close sent", defined at https://github.com/gorilla/websocket/blob/master/conn.go#L86

Other cases may be similarly benign. At best, we can probably classify each server-visible error as one of two outcomes for the client:

  • probable success
  • probable error
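The split into probable success and probable error could be sketched roughly as below. This is only an illustration, not the ndt-server implementation: `classifyWriteError`, the outcome strings, and `errReset` are hypothetical names, and `errCloseSent` is a local stand-in for `websocket.ErrCloseSent` from gorilla/websocket so that the example builds with the standard library alone. Treating "close sent" as benign is the assumption under discussion here, not an established rule.

```go
package main

import (
	"errors"
	"fmt"
)

// errCloseSent is a local stand-in for websocket.ErrCloseSent from
// gorilla/websocket, so this sketch builds with the standard library only.
var errCloseSent = errors.New("websocket: close sent")

// errReset is a made-up sample of an error we would not consider benign.
var errReset = errors.New("connection reset by peer")

// Hypothetical outcome names; the issue only proposes the two categories.
const (
	probableSuccess = "probable-success"
	probableError   = "probable-error"
)

// classifyWriteError maps the error returned by a sender write to an outcome.
// A nil error or a (possibly wrapped) "close sent" error counts as probable
// success; everything else counts as a probable error.
func classifyWriteError(err error) string {
	if err == nil || errors.Is(err, errCloseSent) {
		return probableSuccess
	}
	return probableError
}

func main() {
	wrapped := fmt.Errorf("sender: %w", errCloseSent)
	for _, err := range []error{nil, wrapped, errReset} {
		fmt.Printf("%v -> %s\n", err, classifyWriteError(err))
	}
}
```

In real server code the comparison would be against `websocket.ErrCloseSent` itself, and other benign cases (if any are identified) would be added to the same check.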

stephen-soltesz avatar Jul 23 '20 01:07 stephen-soltesz

@stephen-soltesz

laiyi-ohlsen avatar Aug 10 '20 16:08 laiyi-ohlsen

Related to this:

  • Consider Prometheus labels for the TestRate metric:
    • include an error status label and a run time label (<9s, 9-13s, >13s)
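One possible way to derive the run time label is a small bucketing function, sketched below. The function name `runtimeBucket`, the `labels` struct, and the status strings are hypothetical; the exact boundary handling (9s and 13s both falling in the middle bucket) is an assumption, since the thread only gives the ranges. A plain map stands in for the metric so the example needs no Prometheus client.

```go
package main

import (
	"fmt"
	"time"
)

// runtimeBucket returns a coarse label value for how long a test ran, using
// the three ranges suggested in this thread: <9s, 9-13s, and >13s.
// Assumption: exactly 9s and exactly 13s belong to the middle bucket.
func runtimeBucket(elapsed time.Duration) string {
	switch {
	case elapsed < 9*time.Second:
		return "<9s"
	case elapsed <= 13*time.Second:
		return "9-13s"
	default:
		return ">13s"
	}
}

// labels is a hypothetical label set for a TestRate-style metric.
type labels struct {
	Status  string
	Runtime string
}

func main() {
	// A map keyed by label set stands in for a labeled counter.
	counts := map[labels]int{}
	counts[labels{"probable-success", runtimeBucket(10 * time.Second)}]++
	counts[labels{"probable-error", runtimeBucket(11 * time.Second)}]++
	counts[labels{"probable-error", runtimeBucket(15 * time.Second)}]++

	for l, n := range counts {
		fmt.Printf("status=%s runtime=%s count=%d\n", l.Status, l.Runtime, n)
	}
}
```

Keeping the run time as three fixed buckets rather than a raw duration keeps label cardinality small, which matters for Prometheus labels.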

stephen-soltesz avatar Aug 11 '20 16:08 stephen-soltesz