crow icon indicating copy to clipboard operation
crow copied to clipboard

Calling stop() does not properly close all active connections, namely websocket connections.

Open pfeatherstone opened this issue 5 years ago • 8 comments

Calling stop() does not properly close all active connections, namely websocket connections.

pfeatherstone avatar Jun 19 '20 15:06 pfeatherstone

So i now manually keep track of websocket connections, and before i call stop(), i iterate through all the connections, and call close(). Now it turns out the handler that is dispatched at line 6898 in crow_all.h never gets called. If you do a bit more tracking it goes into boost.asio, at which point I loose track of the stack trace. Can anybody help? I know this repo is dead, but just checking in case some users have encountered similar problems.

pfeatherstone avatar Jun 19 '20 16:06 pfeatherstone

I'll try to look into it if i have time. if you can provide some minimal example where the issue appears it would be great.

The-EDev avatar Oct 05 '20 05:10 The-EDev

Don't worry. I use a combination of https://github.com/yhirose/cpp-httplib and https://github.com/zaphoyd/websocketpp.git. I also sometimes use https://github.com/civetweb/civetweb though i get hanging issues with the last one.

pfeatherstone avatar Oct 05 '20 06:10 pfeatherstone

So I looked through the code (I know you said not to worry), It seems that stop() and close() do completely different things.

close() sends a close signal and once a confirmation is received, it closes the socket adaptor.

stop() on the other hand shuts down the IO services that all sockets use (HTTP request sockets or WebSockets). And while that works does work, the client just never receives any notice.

The-EDev avatar Oct 22 '20 07:10 The-EDev

This is indeed a problem if you stop() your server and restart it afterwards without much waiting. It will then fail to start with an exception:

terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::system::system_error> >'
  what():  bind: Address already in use

Calling lsof gives you the reason:

$ lsof -i :18080
COMMAND     PID       USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
my-server 23405 jf99   44u  IPv4 149122      0t0  TCP localhost:18080->localhost:40806 (CLOSE_WAIT)

This is with reuse_addr = true in the acceptor's constructor (line 9269 in crow_all.h).

jf99 avatar Oct 30 '20 13:10 jf99

From what I've seen in the code (and I might be totally wrong), the primary issue is that Crow doesn't keep track of connections, (with the exception of HTTP connections in debug mode).

And the only solution I can think of is what @pfeatherstone was doing but in crow itself, which is keep track of all active connections and send a close signal right before stopping the server as part of stop().

The-EDev avatar Oct 30 '20 13:10 The-EDev

This is indeed a problem if you stop() your server and restart it afterwards without much waiting. It will then fail to start with an exception:

terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::system::system_error> >'
  what():  bind: Address already in use

Calling lsof gives you the reason:

$ lsof -i :18080
COMMAND     PID       USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
my-server 23405 jf99   44u  IPv4 149122      0t0  TCP localhost:18080->localhost:40806 (CLOSE_WAIT)

This is with reuse_addr = true in the acceptor's constructor (line 9269 in crow_all.h).

As far as I remember there're OS settings which affect the CLOSE_WAIT timeouts e.g. net.ipv4.tcp_fin_timeout and tcp_keepalive_time etc. so it might not always be the case if you tweak your OS for high concurrency and load - to keep these sockfds closed in a timeley manner. TYVM

sbenner avatar Oct 30 '20 13:10 sbenner

Like everything else in boost, it seems boost.asio is more complicated than what it needs to be.

pfeatherstone avatar Oct 30 '20 16:10 pfeatherstone