
Limiting the number of TCP connections

Open jedisct1 opened this issue 8 years ago • 19 comments

Hi zonyitoo,

I'd like to use coio in a public-facing service accepting anonymous TCP connections. If only to avoid file descriptor exhaustion, there has to be a limit on the maximum number of open connections. While simply closing a socket right after accepting it once we get close to the limit is an option, a better practice is to close the oldest connection instead. However, I didn't see any obvious way to do this when using coio. How would you do this kind of TCP connection reuse?

jedisct1 avatar Jan 19 '16 08:01 jedisct1

Hi!

I'm not quite sure what you'd like to achieve... Are you trying to artificially limit the number of open TCP connections to a safe value (by forcibly closing old ones), or are you trying to circumvent the problem of sockets in the TIME_WAIT state?

Solving the latter case is quite easy, because you can use TCP_TW_REUSE to control this. The former case is a bit more complicated, though. The reason is that coio uses suspend-down semantics and thus suspends execution within the calls to read(), write(), etc., instead of doing it outside of them via suspend-up (with something like the await syntax in C# and so on). Calling read(), for instance, will seemingly block just like any other traditional, actually blocking API. Thus the question becomes: "How do I break out of the read() call?" And while using e.g. nix::close(sock.as_raw_fd()); is possible, I believe there is currently no way to do this in coio yet, because it won't wake up the event loop (and thus the read() call won't return).

Maybe I'm wrong though and @zonyitoo has a better idea, but I'm currently working on a coio overhaul anyway, because of a similar issue (see #22). This will take another ~2 months though, because I'm currently in the middle of studying for my exams...

lhecker avatar Jan 19 '16 09:01 lhecker

Hmm, what would you do if you were using normal C APIs?

zonyitoo avatar Jan 19 '16 10:01 zonyitoo

If you use shutdown(socket_fd, SHUT_RDWR); you can make blocking read()/write() calls return with a return value of 0. The remaining problem here is that this won't wake up the event loop, I think... (Meaning: I don't believe this creates an EventSet::hup() event.)

lhecker avatar Jan 19 '16 10:01 lhecker

In C, I would indeed close() the oldest descriptor right after an accept().
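
For reference, here is a minimal sketch of that pattern in plain Rust with std::net rather than coio; the MAX_CONNS limit and the VecDeque bookkeeping are illustrative assumptions, not part of any coio API:

```rust
use std::collections::VecDeque;
use std::net::{TcpListener, TcpStream};

const MAX_CONNS: usize = 1024; // illustrative limit

fn main() -> std::io::Result<()> {
    let listener = TcpListener::bind("127.0.0.1:8080")?;
    // The oldest connection sits at the front of the queue.
    let mut conns: VecDeque<TcpStream> = VecDeque::new();

    loop {
        let (stream, _addr) = listener.accept()?;
        if conns.len() >= MAX_CONNS {
            // Dropping the TcpStream closes the underlying fd,
            // the moral equivalent of close() in C.
            drop(conns.pop_front());
        }
        conns.push_back(stream);
        // ... hand the new connection to a worker here ...
    }
}
```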

jedisct1 avatar Jan 19 '16 10:01 jedisct1

Ah, I see. So if one TcpStream is waiting for events, you cannot do anything with it until it receives some events, right? In that case, this is not something coio can do; it is because of Rust's ownership system.

Calling close() on an fd is equivalent to dropping the TcpStream object. So if you can find a way to drop it, then you can close it. But ...

  • A TcpStream is owned by somebody (A), and A is stuck in I/O, so no one else gets a chance to control this TcpStream, even if you want to call close() on it.

You may try one of the approaches below:

  1. Use try_clone to dup a new handle (fd) for the TcpStream; then you can call shutdown on that clone (see the sketch after this list). But this approach consumes one more fd.
  2. Use Arc to share the TcpStream between threads (coroutines).
  3. Use a raw pointer to do your dirty work. :)
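
As an illustration of option 1, here is a minimal sketch using plain std::net and OS threads as a stand-in; whether the blocked reader actually wakes up under coio's event loop is exactly the open question discussed below:

```rust
use std::io::Read;
use std::net::{Shutdown, TcpStream};
use std::thread;

fn demo(stream: TcpStream) -> std::io::Result<()> {
    // Duplicate the handle (this costs one extra fd, as noted above).
    let closer = stream.try_clone()?;

    let reader = thread::spawn(move || {
        let mut stream = stream;
        let mut buf = [0u8; 4096];
        // With plain blocking sockets, shutdown() makes this read return Ok(0).
        let n = stream.read(&mut buf);
        println!("read returned: {:?}", n);
    });

    // Later, from another thread/coroutine, force the reader out:
    closer.shutdown(Shutdown::Both)?;
    reader.join().unwrap();
    Ok(())
}
```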

zonyitoo avatar Jan 19 '16 10:01 zonyitoo

I don't think your 3 solutions will work, because if a coroutine is still stuck in I/O it will never be woken up after you call close() or shutdown() in another coroutine or thread. So I personally believe that this cannot be done currently until #22 is solved (I've updated that issue accordingly). In the meantime you might want to drop new connections instead (i.e. by not calling accept()). While this would not be an optimal solution, it might be acceptable for the next few months until #22 is resolved...

lhecker avatar Jan 19 '16 11:01 lhecker

Indeed, close() and shutdown() don't wake up coroutines. #22 indeed seems to be the only way to handle this.

Dropping new connections makes servers trivially vulnerable to DoS attacks, while closing previous connections somewhat mitigates them by giving some legitimate queries a chance to get in. This is one of the things all DNS resolvers and servers do in order to mitigate the constant attacks they face.

jedisct1 avatar Jan 19 '16 18:01 jedisct1

I have implemented an experimental version to solve this problem, please check this test.

This provides a way to set a timeout for an I/O object.
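
The experimental coio API itself isn't shown in this thread; as a rough analogy of what a per-object I/O timeout looks like, here is the equivalent call in std, where a blocking read fails with a WouldBlock/TimedOut error instead of hanging forever:

```rust
use std::io::Read;
use std::net::TcpStream;
use std::time::Duration;

fn read_with_timeout(mut stream: TcpStream) -> std::io::Result<Vec<u8>> {
    // Fail the read instead of blocking forever if the peer goes silent.
    stream.set_read_timeout(Some(Duration::from_secs(30)))?;

    let mut buf = vec![0u8; 4096];
    let n = stream.read(&mut buf)?; // on timeout: WouldBlock or TimedOut, depending on platform
    buf.truncate(n);
    Ok(buf)
}
```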

zonyitoo avatar Jan 24 '16 08:01 zonyitoo

This is awesome!

jedisct1 avatar Jan 24 '16 08:01 jedisct1

It is still in an experimental stage; please verify that it works well for you.

zonyitoo avatar Jan 24 '16 09:01 zonyitoo

Timeouts (issue #22) will surely help in making sure that clients do not use something like the Slowloris attack, but does this really solve this issue? Because this issue needs a solution for deterministically closing old connections (even if they are in use).

Also - and I think that's the "sad" part: As you know, @zonyitoo, I've been working on the eventloop-overhaul and large parts of your branch won't work with this (because I'm going to remove oneshot polling due to the huge performance impact it has). :confused:

lhecker avatar Jan 24 '16 09:01 lhecker

The only thing you have to ensure is API compatibility, so you can surely reimplement all the internals of coio as you wish.

Yes, removing oneshot polling is a good idea to improve performance, but it will surely add complexity to the implementation. You could just ignore my branch; it's just an experimental implementation to express my idea.

I haven't seen any other facilities to defeat a Slowloris attack. To prevent the server from being overloaded, one solution is to add a fast-reject ability: the server closes an accepted connection immediately after accept() whenever it is already overloaded. This fast-reject policy doesn't need to be implemented in coio; it could be implemented in any server implementation.
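
A minimal sketch of that fast-reject policy at the application level, again using plain std::net with threads as a stand-in; the connection counter and limit are illustrative assumptions:

```rust
use std::net::TcpListener;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

const MAX_CONNS: usize = 1024; // illustrative overload threshold

fn main() -> std::io::Result<()> {
    let listener = TcpListener::bind("127.0.0.1:8080")?;
    let open = Arc::new(AtomicUsize::new(0));

    loop {
        let (stream, _addr) = listener.accept()?;
        if open.load(Ordering::Relaxed) >= MAX_CONNS {
            // Overloaded: fast-reject by dropping (closing) the socket immediately.
            drop(stream);
            continue;
        }
        open.fetch_add(1, Ordering::Relaxed);
        let open = Arc::clone(&open);
        std::thread::spawn(move || {
            handle(stream);
            open.fetch_sub(1, Ordering::Relaxed);
        });
    }
}

fn handle(_stream: std::net::TcpStream) {
    // ... serve the connection ...
}
```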

zonyitoo avatar Jan 24 '16 10:01 zonyitoo

Yeah, this "fast-reject policy" is what I would have done too. But since @jedisct1 brought up this issue I've noticed how flawed it is... Because if an attacker manages to open a large number of connections to your server until it becomes exhausted, new connections will be dropped. Thus you've created a system which easily succumbs to a DoS. On the other hand, closing old connections is also bad, because if an attacker manages to rapidly create new connections to your server, old ones (which might be from good users) will be dropped too, before they can do anything meaningful. Thus you are a target for DoS... again. But I believe that the latter option is the better one, because it recovers faster from such attacks, exploiting it for an extended period of time is probably a bit harder, etc.

In the end I think that this issue might take some time... The problem is that coio uses, and should probably continue to use, suspend-down coroutines, although they make awaiting two different async things (for instance, simultaneously awaiting a "close" signal through a channel and a read() result) really hard.

Although I have an idea of how to concurrently abort reads/writes from another coroutine, it's rather... "simple". So... for the time being I think coio will probably only have read/write timeouts, and a "real" solution for this will come later.

lhecker avatar Jan 24 '16 10:01 lhecker

Well, looking forward to seeing that. I don't think there is a better solution for the two DoS attack situations: a large number of open connections, and rapid creation of new connections. I am very curious about how commercial servers solve this.

zonyitoo avatar Jan 24 '16 12:01 zonyitoo

@lhecker BTW, I think we should keep this crate simple and have it do only what it has to do. I am also planning to build a server framework on top of coio.

zonyitoo avatar Jan 24 '16 13:01 zonyitoo

Yeah, of course! (IMHO we should even take the coroutines out of coio, since they are also useful without the I/O... What's the difference between this and coroutine-rs? Why not merge both efforts into one project?)

lhecker avatar Jan 24 '16 13:01 lhecker

coroutine-rs was my first project, but I soon found that coroutines without I/O are of little use. It is also very hard to separate a coroutine's basic facilities from the coroutine scheduler, so I split the most important part out into context-rs. You can easily build a coroutine library with context-rs.

zonyitoo avatar Jan 24 '16 13:01 zonyitoo

Hey @jedisct1! Just a heads up: we developed a "coroutine" barrier that works similar to the well-known thread barriers. We are (more than likely) going to use those to wait for the completion of async I/O in the future. We could add an API that would allow you to set a "close" flag on a socket and signal all barriers of that socket so that all I/O operations return immediately. If the I/O method sees that "close" flag, it could return a matching io::Error which you could check for in the read or write loop of your socket and break out of it.
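
From the caller's side, such an API might be consumed roughly like this; the proposed "close" flag doesn't exist yet, so the error kind used to represent it here is purely an assumption:

```rust
use std::io::{ErrorKind, Read};

// Hypothetical consumer: `stream` is a socket whose pending I/O could be
// aborted by another coroutine setting the proposed "close" flag.
fn read_loop<S: Read>(mut stream: S) -> std::io::Result<()> {
    let mut buf = [0u8; 4096];
    loop {
        match stream.read(&mut buf) {
            Ok(0) => return Ok(()), // peer closed normally
            Ok(n) => {
                // ... process &buf[..n] here ...
                let _ = &buf[..n];
            }
            Err(ref e) if e.kind() == ErrorKind::Interrupted => continue,
            // Assumed error kind for the forced-close signal; adjust to
            // whatever the real API ends up reporting.
            Err(ref e) if e.kind() == ErrorKind::ConnectionAborted => return Ok(()),
            Err(e) => return Err(e),
        }
    }
}
```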

lhecker avatar Feb 20 '16 10:02 lhecker

This is fantastic news!

jedisct1 avatar Feb 20 '16 11:02 jedisct1