sozu
Drop old connections when the accept queue is full
If the accept queue is full, new connections can't be accepted. Could we close the least recently used connections to make some space for those new connections?
Maybe that could be behind a configuration flag. The default would be to close the oldest connections.
With regard to #643, we would still need to emit a metric when this happens.
right, this should be done soon
Related issues:
- #643 (resolved)
- #518
- #497 (closed)
Digging in the code, we found:
- the size of the `accept_queue` is directly limited by the size of the slab that contains sessions. To accept a connection from the queue, we need a token, i.e. an index into the slab, which implies there is free space in the slab.
- each time a connection is added to or removed from the slab, the size of the slab is checked.
This behaviour is normal and needs no change. If there are too many connections in the slab, you may increase `max_connections` or reduce `accept_queue_timeout`.
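To illustrate the coupling described above, here is a minimal sketch of a fixed-capacity slab in which a session only gets a token if a slot is free. `SessionSlab` and its methods are hypothetical names used for illustration; this is not sozu's actual implementation.

```rust
/// Hypothetical fixed-capacity slab: the token handed out is the slot index.
struct SessionSlab<T> {
    slots: Vec<Option<T>>,
}

impl<T> SessionSlab<T> {
    /// Pre-allocates `max_connections` empty slots; the slab never grows.
    fn with_capacity(max_connections: usize) -> Self {
        SessionSlab {
            slots: (0..max_connections).map(|_| None).collect(),
        }
    }

    /// Stores the session and returns its token, or hands the session
    /// back to the caller when every slot is taken.
    fn insert(&mut self, session: T) -> Result<usize, T> {
        match self.slots.iter().position(|slot| slot.is_none()) {
            Some(token) => {
                self.slots[token] = Some(session);
                Ok(token)
            }
            None => Err(session),
        }
    }

    /// Frees the slot behind `token`, making room for a new connection.
    fn remove(&mut self, token: usize) -> Option<T> {
        self.slots.get_mut(token).and_then(|slot| slot.take())
    }

    /// Number of occupied slots.
    fn len(&self) -> usize {
        self.slots.iter().filter(|slot| slot.is_some()).count()
    }
}
```

With a capacity of 2, a third `insert` returns `Err` until one of the first two sessions is removed, which is why a full slab stalls the accept queue.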
The issue was about closing unused connections in the event `max_connections` is hit. If it is, I think it would be better to free up some unused connections that are in keep-alive mode and doing nothing. This will allow accepting new connections again. Reducing the `accept_queue_timeout` will just increase the number of connections that are actually dropped if `max_connections` has been reached. Also, `max_connections` can't be changed at runtime (and we probably don't really want to).
@FlorentinDUBOIS should we reopen the issue then?
According to @Geal in pull request #714, we still have some changes to make related to this issue. So yes, we should reopen it.
This should be behind a config option: if someone tries to open a lot of new connections to kill the server, this behaviour would drop inactive but legitimate connections, which we might not want depending on the kind of server.
First, we should implement a "garbage collector" for keep-alive connections that are not in use, triggered when the number of active connections reaches `max_connections`. This garbage collector would close a dozen or so connections per pass (that number could be configurable); we do not need to close all idle connections in one go.
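A minimal sketch of that first step, assuming each session records its state and the time of its last activity. `Session`, `State`, `select_victims` and the `batch` parameter are hypothetical names for illustration, not existing sozu types.

```rust
use std::time::Instant;

#[derive(Debug, Clone, Copy, PartialEq)]
enum State {
    Active,
    KeepAliveIdle,
}

struct Session {
    token: usize,
    state: State,
    last_activity: Instant,
}

/// Returns the tokens of up to `batch` idle keep-alive sessions to close,
/// least recently active first. Returns nothing while the number of
/// sessions is still below `max_connections`.
fn select_victims(sessions: &[Session], max_connections: usize, batch: usize) -> Vec<usize> {
    if sessions.len() < max_connections {
        return Vec::new();
    }
    let mut idle: Vec<&Session> = sessions
        .iter()
        .filter(|s| s.state == State::KeepAliveIdle)
        .collect();
    // Oldest `last_activity` first, so the longest-idle sessions go first.
    idle.sort_by_key(|s| s.last_activity);
    idle.into_iter().take(batch).map(|s| s.token).collect()
}
```

Active sessions are never selected, and capping each pass at `batch` keeps the cost of a collection bounded.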
Secondly, we could put more intelligence in the garbage collector (gc), as @Geal asked. The gc could track source IP addresses and throttle the sessions opened by each of them, to mitigate attacks coming from any individual IP address.
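One simple form of this throttling is a per-address cap on open sessions. The sketch below assumes such a cap; `IpLimiter` and `max_per_ip` are hypothetical names, and a real implementation would also have to consider clients behind a shared NAT or proxy.

```rust
use std::collections::HashMap;
use std::net::IpAddr;

/// Hypothetical per-IP limiter: counts open sessions per source address.
struct IpLimiter {
    max_per_ip: usize,
    open: HashMap<IpAddr, usize>,
}

impl IpLimiter {
    fn new(max_per_ip: usize) -> Self {
        IpLimiter { max_per_ip, open: HashMap::new() }
    }

    /// Counts the new session and returns true if `ip` is under its limit;
    /// returns false (and counts nothing) otherwise.
    fn try_open(&mut self, ip: IpAddr) -> bool {
        let count = self.open.get(&ip).copied().unwrap_or(0);
        if count >= self.max_per_ip {
            return false;
        }
        self.open.insert(ip, count + 1);
        true
    }

    /// Releases one session for `ip`, dropping the entry when it reaches zero.
    fn close(&mut self, ip: IpAddr) {
        match self.open.get(&ip).copied() {
            Some(n) if n > 1 => {
                self.open.insert(ip, n - 1);
            }
            Some(_) => {
                self.open.remove(&ip);
            }
            None => {}
        }
    }
}
```

Entries are removed once their count drops to zero, so the map only holds addresses that currently have open sessions.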
The garbage collector analogy is good because the behaviour is similar: short-lived connections that recycle often, and long-lived connections with infrequent requests. There could be a kind of score depending on the idle time, the number of valid requests that went through, the total age of the connection, etc. The thorny issue here is that this score can be abused for a DoS. Example: if you know the score favors connections with lots of requests, it is easy to generate connections with lots of small requests, slowly pushing out the older ones.
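As a rough sketch of what such a score could look like: the weights, the `ConnStats` type and the cap on request credit below are all assumptions made for illustration. The cap is one possible mitigation for the abuse described above, not something decided in this thread.

```rust
use std::time::Duration;

/// Hypothetical per-connection statistics used to rank eviction candidates.
struct ConnStats {
    idle: Duration,
    age: Duration,
    valid_requests: u64,
}

/// Higher score means a better candidate for eviction.
/// The weights (1.0, 0.1, 0.5) and the cap (100) are illustrative only.
fn eviction_score(stats: &ConnStats) -> f64 {
    let idle = stats.idle.as_secs_f64();
    let age = stats.age.as_secs_f64();
    // Cap the credit earned from requests, so flooding a connection with
    // small requests cannot buy unlimited protection from eviction.
    let credit = stats.valid_requests.min(100) as f64;
    idle + 0.1 * age - 0.5 * credit
}
```

With these weights, a connection that has been idle for a minute scores higher than a busy one of the same age, and a connection with a million requests scores no better than one with a hundred.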