
mobc completely stops serving connections.

Open garrensmith opened this issue 2 years ago • 3 comments

Hey,

We use mobc as part of Prisma and are getting into a situation where mobc completely stops serving connections. To reproduce it, I create an HTTP server using hyper and a mobc pool via Quaint.

A repo with the reproduction can be found here https://github.com/garrensmith/mobc-error-example

I then use apache benchmark with a request like this:

ab -v 4 -c 200  -t 120 http://127.0.0.1:4000/

Once Apache Benchmark has stopped, the number of connections in Postgres drops either to far fewer than the number I configured or all the way to zero. If I log State from mobc, it reports 10 active connections, which is incorrect. If I then run Apache Benchmark again, it either runs a lot slower with fewer connections, or does not run at all because it cannot acquire a new connection from mobc.

I've tried a few things in the code but I cannot see why this is happening. I even tried https://github.com/importcjj/mobc/pull/60 but that didn't fix it.

Any help would be really appreciated.

garrensmith avatar Feb 20 '22 13:02 garrensmith

Hi @importcjj, have you had a chance to look at this issue? Any ideas or suggestions I can look at?

garrensmith avatar Feb 25 '22 07:02 garrensmith

I've been diving into this a bit more and I now understand why Mobc can reach a point of dropping connections and deadlocking.

The issue is happening over here: https://github.com/importcjj/mobc/blob/master/src/lib.rs#L664

First, some context. In our situation, we can have a lot of concurrent requests (over 1000) for a connection from the connection pool, while the pool itself only has a small number of connections, for example 10. Each waiting request has a oneshot channel created for it, with the Sender added to a queue over here: https://github.com/importcjj/mobc/blob/master/src/lib.rs#L542

What can then happen is that all those waiting requests are destroyed. This happens when the connection requests come from web requests that have been aborted. The conn_requests queue now holds over 1000 channel Senders whose Receivers have been dropped.

When an active connection is returned to the pool, mobc is supposed to go through the list of Senders and try to send the connection to a waiting Receiver. If the Receiver has been cancelled or dropped, the connection is handed back and another Sender is tried, until mobc finds a Sender with a waiting Receiver. This is the code I mentioned earlier: https://github.com/importcjj/mobc/blob/master/src/lib.rs#L664
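For anyone reading along, here is a minimal sketch of that intended hand-off. It is not mobc's actual code: it uses `std::sync::mpsc` as a blocking stand-in for the oneshot channels, and `Conn`, `Internals`, `put_conn` and `demo` are hypothetical names made up for illustration. The point it shows is that a failed `send` gives the connection back, so the loop can keep trying the next Sender and fall back to the free list.

```rust
use std::collections::VecDeque;
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical stand-in for a database connection.
#[derive(Debug, PartialEq)]
struct Conn(u32);

/// Hypothetical, simplified pool internals: queued waiters plus a free list.
struct Internals {
    conn_requests: VecDeque<Sender<Conn>>,
    free_conns: Vec<Conn>,
}

/// Hand a returned connection to the first waiter whose Receiver is still alive.
/// Returns how many dead waiters were skipped along the way.
fn put_conn(internals: &mut Internals, mut conn: Conn) -> usize {
    let mut skipped = 0;
    while let Some(tx) = internals.conn_requests.pop_front() {
        match tx.send(conn) {
            Ok(()) => return skipped,
            Err(err) => {
                // The Receiver is gone; the error carries the connection back.
                conn = err.0;
                skipped += 1;
            }
        }
    }
    // Nobody is waiting any more: keep the connection instead of losing it.
    internals.free_conns.push(conn);
    skipped
}

/// Build a queue with `dead` aborted waiters followed by `live` real ones.
fn demo(dead: usize, live: usize) -> (Internals, Vec<Receiver<Conn>>) {
    let mut internals = Internals {
        conn_requests: VecDeque::new(),
        free_conns: Vec::new(),
    };
    let mut live_receivers = Vec::new();
    for _ in 0..dead {
        let (tx, rx) = channel();
        drop(rx); // models an aborted web request
        internals.conn_requests.push_back(tx);
    }
    for _ in 0..live {
        let (tx, rx) = channel();
        internals.conn_requests.push_back(tx);
        live_receivers.push(rx);
    }
    (internals, live_receivers)
}

fn main() {
    let (mut internals, live) = demo(1000, 1);
    let skipped = put_conn(&mut internals, Conn(7));
    println!("skipped {} dead waiters", skipped);
    println!("live waiter received: {:?}", live[0].try_recv());
}
```

In this simplified form the connection is never lost, which is the behaviour the real code is meant to have.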

However, when a large number of Receivers have been dropped, this doesn't seem to work and the connection is accidentally dropped. internals.num_open is not decremented at this point, so mobc thinks it still has active connections when in fact it does not. As a result, it neither creates new connections nor has any to hand to new connection requests.

I have an idea to solve this, but it would involve replacing the channels with a Semaphore, similar to how deadpool works: https://github.com/bikeshedder/deadpool/ I've tested it and it works with mobc, but it would be quite a large change.

The reason a move to a Semaphore would be better is that when a connection is returned to the pool, there is no chance of it being dropped: it is simply added to the list of free_conns. The oneshot channels would be replaced by requests waiting for access to the semaphore. So if a request is cancelled, another request can grab the connection and there is no chance of it being accidentally lost.
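A rough sketch of the idea, under heavy assumptions: it is blocking rather than async, it uses a `Mutex` plus `Condvar` from the standard library as a stand-in for a real semaphore, and `Conn`, `Pool` and `simulate` are hypothetical names, not code from mobc or deadpool. What it illustrates is that returning a connection only ever touches the free list, so waiters that gave up cannot cause it to be lost.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for a database connection.
#[derive(Debug, PartialEq)]
struct Conn(u32);

/// Hypothetical pool: the free list is the single source of truth.
struct Pool {
    free_conns: Mutex<Vec<Conn>>,
    available: Condvar,
}

impl Pool {
    fn new(conns: Vec<Conn>) -> Self {
        Pool { free_conns: Mutex::new(conns), available: Condvar::new() }
    }

    /// Wait up to `timeout` for a free connection.
    /// `None` models a caller that gave up (an aborted request).
    fn get(&self, timeout: Duration) -> Option<Conn> {
        let guard = self.free_conns.lock().unwrap();
        let (mut guard, _timed_out) = self
            .available
            .wait_timeout_while(guard, timeout, |free| free.is_empty())
            .unwrap();
        let conn = guard.pop();
        conn
    }

    /// Returning a connection never depends on any particular waiter.
    fn put(&self, conn: Conn) {
        self.free_conns.lock().unwrap().push(conn);
        self.available.notify_one();
    }

    fn free(&self) -> usize {
        self.free_conns.lock().unwrap().len()
    }
}

/// Hold the only connection while `waiters` callers time out, then return it.
/// Yields the free count after the return and the result of the next `get`.
fn simulate(waiters: usize) -> (usize, Option<Conn>) {
    let pool = Arc::new(Pool::new(vec![Conn(1)]));
    let held = pool
        .get(Duration::from_millis(10))
        .expect("pool starts with one connection");
    let handles: Vec<_> = (0..waiters)
        .map(|_| {
            let pool = Arc::clone(&pool);
            thread::spawn(move || pool.get(Duration::from_millis(10)))
        })
        .collect();
    for handle in handles {
        // Every waiter gives up because the connection is still held.
        assert!(handle.join().unwrap().is_none());
    }
    pool.put(held);
    let free_after_return = pool.free();
    let next = pool.get(Duration::from_millis(10));
    (free_after_return, next)
}

fn main() {
    let (free_after_return, next) = simulate(50);
    println!("free after return: {}, next get: {:?}", free_after_return, next);
}
```

An async pool would wait on a semaphore permit instead of a condition variable, but the accounting is the same: the connection sits in the free list until some request that is still alive picks it up.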

garrensmith avatar Mar 07 '22 08:03 garrensmith

To wrap this up in case someone else comes across it: we are hosting a forked mobc with fixes for this issue over here: https://github.com/prisma/mobc

garrensmith avatar Mar 21 '22 08:03 garrensmith

Hello,

We are experiencing exactly the same issue with the Prisma connection pool.

The application is a backend api developed with NestJS.

Could someone please explain how to implement this fix with mobc?

Thanks.

w8ze-devel avatar Oct 23 '22 16:10 w8ze-devel

@w8ze-devel can you open a ticket on the Prisma repository to track this?

garrensmith avatar Nov 16 '22 12:11 garrensmith

The latest 0.8.1 release fixes this.

garrensmith avatar Jan 22 '23 08:01 garrensmith