ndt-server
ndt7: tracking sender errors
During our initial client migrations to ndt7, we found that under some (as yet unknown) conditions, clients may run for up to 11s (possibly longer, but the client stopped at 11s). The current ndt-server ndt7 Prometheus metrics do not distinguish between causes of error, so it is ambiguous whether a write error reflects a real problem or a clean shutdown by the remote client.
For example, a write error on a websocket conn may simply mean that the remote client closed the connection cleanly. In that case conn.WriteJSON may return "websocket: close sent", defined at https://github.com/gorilla/websocket/blob/master/conn.go#L86
Other cases may be similarly benign. At best, we can probably classify each server-visible error, from the client's point of view, as one of:
- probable success
- probable error
@stephen-soltesz
Related to this:
- Consider Prometheus labels for the TestRate metric:
  - include an error status label and a run time label with buckets (<9, 9-13, >13)s