swindon
Allow monitoring `connection refused` on individual destination
Currently we're getting this behavior:
```
2819 connect(13, {sa_family=AF_INET, sin_port=htons(12345), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EINPROGRESS (Operation now in progress)
2819 epoll_ctl(3, EPOLL_CTL_ADD, 13, {EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, {u32=2102, u64=2102}}) = 0
2819 epoll_wait(3, {}, 1024, 0) = 0
2819 epoll_wait(3, {{EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|EPOLLRDHUP, {u32=2102, u64=2102}}}, 1024, 176) = 1
2819 write(5, "\1", 1) = 1
2819 epoll_wait(3, {{EPOLLIN, {u32=4294967295, u64=18446744073709551615}}}, 1024, 0) = 1
2819 read(4, "\1", 128) = 1
2819 read(4, 0x7ffc81ffd8d0, 128) = -1 EAGAIN (Resource temporarily unavailable)
2819 getsockopt(13, SOL_SOCKET, SO_ERROR, [111], [4]) = 0
2819 close(13)
```
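That behavior is fine in itself, except that we can't see what specifically went wrong without running strace: the only evidence of the failure is `getsockopt(..., SO_ERROR, [111], ...)` (111 is `ECONNREFUSED` on Linux) followed by `close`, and nothing outside strace tells us which destination refused the connection.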
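To make the trace easier to read, here is a minimal Python sketch of the same syscall sequence: a non-blocking `connect` that returns `EINPROGRESS`, a wait for writability (the `EPOLLOUT|EPOLLERR` event above), then `getsockopt(SOL_SOCKET, SO_ERROR)` to fetch the real error. The function name `probe_connect` and its timeout are illustrative assumptions, not swindon code; swindon itself is written in Rust and does this inside its own event loop.

```python
import errno
import selectors
import socket


def probe_connect(host: str, port: int, timeout: float = 1.0) -> int:
    """Return SO_ERROR of a non-blocking connect (0 on success), mirroring the traced syscalls."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setblocking(False)
    try:
        # connect(2) on a non-blocking socket: usually EINPROGRESS, as in the trace
        rc = sock.connect_ex((host, port))
        if rc == 0:
            return 0
        if rc != errno.EINPROGRESS:
            return rc  # failed synchronously, e.g. an immediate ECONNREFUSED
        with selectors.DefaultSelector() as sel:
            # wait for writability; a refused connect also becomes "ready" (EPOLLERR)
            sel.register(sock, selectors.EVENT_WRITE)
            if not sel.select(timeout):
                return errno.ETIMEDOUT
        # same call as getsockopt(13, SOL_SOCKET, SO_ERROR, [111], [4]) in the trace
        return sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
    finally:
        sock.close()


if __name__ == "__main__":
    err = probe_connect("127.0.0.1", 12345)
    print(errno.errorcode.get(err, "OK"))
```

If nothing listens on port 12345, the script should print `ECONNREFUSED`; that errno name, paired with the destination address, is exactly the information currently only visible through strace.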
I'm not sure what the best fix is: exposing the error in `!Status`, reporting it in metrics or logs, or a combination of all of these.
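Whichever outlet is chosen, they can all be fed from the same data: a per-destination count of connect errors keyed by errno name. The sketch below assumes a hypothetical `ConnectErrorStats` helper and logger name; it is not an existing swindon API, only an illustration of what a log line and a status/metrics snapshot could contain.

```python
import errno
import logging
from collections import Counter

log = logging.getLogger("backend")  # hypothetical logger name


class ConnectErrorStats:
    """Hypothetical per-destination tally of failed connects, for logs, metrics or a status page."""

    def __init__(self) -> None:
        self._counts = Counter()  # (destination, errno name) -> count

    def record(self, destination: str, err: int) -> None:
        name = errno.errorcode.get(err, str(err))  # 111 -> "ECONNREFUSED" on Linux
        self._counts[(destination, name)] += 1
        # the log line names the destination, which the bare syscalls don't surface
        log.warning("connect to %s failed: %s", destination, name)

    def snapshot(self) -> dict:
        """Nested {destination: {errno name: count}} suitable for a status endpoint."""
        out = {}
        for (dest, name), n in self._counts.items():
            out.setdefault(dest, {})[name] = n
        return out
```

Keying by destination is the point of the issue: a single global "connection errors" counter would still leave us guessing which backend is down.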
Update: we also have `queue-size-for-503` at its default of 100k, which is just too big, both for our use case and as a default value.
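Assuming `queue-size-for-503` means "once this many requests are waiting for a backend, answer new ones with 503 instead of queueing them", a quick estimate shows why 100k is too large: if a backend drains, say, 1,000 requests per second, the last request in a full queue waits roughly 100,000 / 1,000 = 100 seconds, far past any sensible client timeout. A minimal model of the admission check, with a hypothetical `BoundedBackendQueue` class:

```python
from collections import deque


class BoundedBackendQueue:
    """Hypothetical model of a backend queue with a queue-size-for-503 style limit."""

    def __init__(self, queue_size_for_503: int) -> None:
        self.limit = queue_size_for_503
        self._pending = deque()

    def __len__(self) -> int:
        return len(self._pending)

    def try_enqueue(self, request) -> bool:
        """Queue the request, or return False so the caller answers 503 Service Unavailable."""
        if len(self._pending) >= self.limit:
            return False  # fail fast instead of letting the queue grow further
        self._pending.append(request)
        return True
```

A smaller limit trades some 503s under load for bounded latency; the right number depends on backend throughput and how long clients are willing to wait.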
Okay, the first two things to do are:
- Expose a metric for the size of the backend queue
- Cancel dropped (timed-out) connections to keep the queue minimal
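Both items interact: if timed-out or abandoned requests stay in the queue, the size metric overstates the real load and those entries still occupy slots counted against the 503 limit. The sketch below is a hypothetical model (`BackendQueue`, `Pending` and the injectable clock are illustrative names, not swindon internals) that drops cancelled and expired entries before reporting the queue size or handing out the next request.

```python
import time
from collections import deque
from dataclasses import dataclass


@dataclass
class Pending:
    request: object
    deadline: float          # monotonic time after which the request has timed out
    cancelled: bool = False  # set when the client disconnects before a backend is free


class BackendQueue:
    """Hypothetical backend queue that drops dead entries and exposes its size as a metric."""

    def __init__(self, clock=time.monotonic) -> None:
        self._clock = clock
        self._items = deque()

    def push(self, request, timeout: float) -> Pending:
        item = Pending(request, self._clock() + timeout)
        self._items.append(item)
        return item

    def prune(self) -> int:
        """Remove cancelled and timed-out entries; return how many were dropped."""
        now = self._clock()
        before = len(self._items)
        self._items = deque(p for p in self._items if not p.cancelled and p.deadline > now)
        return before - len(self._items)

    def pop(self):
        """Next live request for a free backend connection, or None."""
        self.prune()
        return self._items.popleft() if self._items else None

    def size_metric(self) -> int:
        """Gauge value for the backend queue size, counting only live entries."""
        self.prune()
        return len(self._items)
```

Pruning here is a linear scan for clarity; a real server would more likely remove an entry eagerly when its timeout fires or its client disconnects.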