Pub/Sub socket stats: dropped messages due to full buffers
We need some stats for tracking the number of lost messages due to full buffers inside of PubSocket and SubSocket.
@mempirate down to take this on!
This is an interesting one. Although there is a stats impl for tracking:
https://github.com/chainbound/msg-rs/blob/26492f307e4720a08dbdef5fe51785bbd14c1e3d/msg-socket/src/pub/socket.rs#L162-L165
Using metrics we could do (pseudocode):
// Broadcast the message directly to all active sessions.
if self.to_sessions_bcast.as_ref().ok_or(PubError::SocketClosed)?.send(msg).is_err() {
    counter!("msg_socket_pub_socket_lost_messages_total").increment(1);
    debug!("No active subscriber sessions");
}
What do you think of the metrics or prometheus crate?
CC: @mempirate
I don't think I'd want internal Prometheus metrics. The idea of the stats is that any consumer of this library can periodically call it to update their own metrics with whatever stack they're using. I'd like this to be as lightweight as possible.
Thanks for clarifying, so expanding on SocketStats should be enough. What do you think of this:
pub(crate) fn increment_lost_messages(&self) {
    self.lost_messages.fetch_add(1, Ordering::Relaxed);
}
Usage:
// Broadcast the message directly to all active sessions.
// TODO: pattern match on the result to use `SendError`.
if self.to_sessions_bcast.as_ref().ok_or(PubError::SocketClosed)?.send(msg).is_err() {
    self.state.stats.increment_lost_messages();
    debug!("No active subscriber sessions");
}
I.e. 'lost_messages' in the sense that there were no receivers on the other end at the time the publisher broadcast, or that the receivers had been dropped.
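For reference, a minimal self-contained sketch of what such a counter could look like, using only the standard library. The struct layout and the `lost_messages()` getter are assumptions for illustration; the actual `SocketStats` in msg-rs may be shaped differently.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical stats container (assumption: not the real msg-rs `SocketStats`).
#[derive(Debug, Default)]
pub struct SocketStats {
    /// Number of messages that could not be delivered.
    lost_messages: AtomicUsize,
}

impl SocketStats {
    /// Records one lost message. `Relaxed` is enough here because the
    /// counter is not used to synchronize any other memory access.
    pub fn increment_lost_messages(&self) {
        self.lost_messages.fetch_add(1, Ordering::Relaxed);
    }

    /// Returns the number of lost messages recorded so far.
    /// (Hypothetical getter that a consumer could poll periodically.)
    pub fn lost_messages(&self) -> usize {
        self.lost_messages.load(Ordering::Relaxed)
    }
}
```

A consumer of the library could then read `stats.lost_messages()` on a timer and push the value into whatever metrics stack they use, which keeps the library itself free of a metrics dependency.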
What I had in mind was actually some metrics here (i.e. on the receiving side of the socket):
https://github.com/chainbound/msg-rs/blob/87d105ba824ae8b456dd7c665388f7fe685e492d/msg-socket/src/sub/driver.rs#L356-L364
I don't think we need stats on dropped sends if there are no active sessions.
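To illustrate the receiving-side idea, here is a rough sketch of counting only the messages dropped because a bounded buffer is full. It uses `std::sync::mpsc::sync_channel` as a stand-in for whatever channel the sub driver actually uses, and `SubStats` and `forward` are hypothetical names, not part of msg-rs.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::mpsc::{SyncSender, TrySendError};

/// Hypothetical receive-side stats (assumption: not the real msg-rs type).
#[derive(Debug, Default)]
pub struct SubStats {
    dropped_messages: AtomicUsize,
}

impl SubStats {
    /// Returns the number of messages dropped due to a full buffer.
    pub fn dropped_messages(&self) -> usize {
        self.dropped_messages.load(Ordering::Relaxed)
    }
}

/// Tries to hand `msg` to the application-facing buffer without blocking.
/// Returns `true` if the message was queued. A full buffer counts as a drop;
/// a disconnected receiver does not, since that is not a buffer problem.
pub fn forward<T>(tx: &SyncSender<T>, msg: T, stats: &SubStats) -> bool {
    match tx.try_send(msg) {
        Ok(()) => true,
        Err(TrySendError::Full(_)) => {
            stats.dropped_messages.fetch_add(1, Ordering::Relaxed);
            false
        }
        Err(TrySendError::Disconnected(_)) => false,
    }
}
```

Separating the `Full` case from the `Disconnected` case matches the point above: only drops caused by full buffers are counted, while sends with nobody listening are ignored.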