sqlx-core pools optionally immediately return ConnectionRefused errors instead of hanging and returning PoolTimedOut
instead of hanging until the timeout and returning a PoolTimedOut error
Does your PR solve an issue?
fixes #744 this also would fix a bunch of duplicate / similar issues, didn't take the time to list them all.
Is this a breaking change?
following the last commit, no, it's a new options when creating the pool.
as mentioned by my comment in the issue :
if your db goes down you don't want users to have to wait a few seconds to get an error when they could get some 500 error immediately.
i understand that the pr may need some edits, but i wanted this issue to get some attention.
made it optional with the .return_con_refused() function that takes a bool.
this should no longer be a breaking change.
hey @abonander could this get a review ? thank you !
sorry to bug you @abonander but we are really waiting for this to get merged
@mehcode sorry for the bump but could this get a review ?
There's kind of two orthogonal issues here:
-
Pool::connect()et al should definitely return the connection error and not just eat it then eventually returnPoolTimedOut. That's a huge UX miss on our part, I'll readily admit that. I always thought we were at least logging these errors so you could see them, but clearly we're not. - You probably don't want to just immediately die on the first error, though. The Internet is messy and little hiccups happen all the time and are usually temporary. On startup, if your database server starts at the same time as your application, it's not going to be ready to accept connections right away. You don't want your deployment to be marked as failed just because of a little timing issue. And while running, if your database server does go down, you should ideally be having it run in high-availability mode with a hot standby ready to take over, in which case the next connection attempt should ideally succeed. I think most users would rather a request take a second and then succeed than immediately die with a 500.
I've been slowly working on #3582 which should bring massive improvements in this regard, including a separately settable connect_timeout. My goal is to have that make it into 0.9.0, at which point it would supercede this PR.
@abonander hey, thank you for the reply !
no, i don't necessarily want to die on the first error but i want to be able to manage it myself, there are services that if they can't connect first try, i don't want to have to wait for the whole timeout.
ie connecting to a local database when the database is down and i'm testing things, i rather have the service exit immediately than wait for no reason.
or i want to connect to another backup database if the first one failed, that kind of things.
there are also many case in which if the database actualy went offline, i want to return to the user immediately that there was an internal error and not make him wait for 30s.
i rather have the user retry than wait for nothing in many cases.
we know the connection failed, and thus i don't think that information should be hidden from the library user if he wants to be able to get it, no other databases library i know of does that.
either way, this behavior should be a choice, this is why i made it an option and i rather not have to maintain a fork just to be able to do that.
if i want to write my own retry and wait logic, or have none, i should be able to. also if the service gets a lot of throughput, i don't think it's a good idea to keep tons of open connections when they could just return a 5xx errors.
i want to return to the user immediately that there was an internal error and not make him wait for 30s.
You do know you can shorten the timeout, right? https://docs.rs/sqlx/latest/sqlx/pool/struct.PoolOptions.html#method.acquire_timeout
I wouldn't set it to exactly zero as that would break things, but you could set it to one second instead.
@abonander that's beside the point, i want no timeout not a small timeout (for con refused). also the timeout will affect acquire in general not just connection refused errors.
i may want to have a 15 to 30s timeout for some / most kind of network, latency or other issues / errors, because that may eventualy resolve, but have no timeout when the connection is refused.
we have the error, we should be able to deal with it manually if we want to, it should not just be swallowed or wait when it doesn't need to.
literaly no other sql engine that i know of does this. anyway, i'm fine with it not being the default behavior, but it has to at least be an option imo.
i rather not have to maintain a fork just for that trivial feature.
thank you for your understanding.
also that doesn't solve the issue of keeping many open connection to the http clients when they could immediately close because the database refused the connection.
2. You _probably_ don't want to just immediately die on the first error, though. ...
I also don't want my app to die on the first error without retrying, but that shouldn't be sqlx's responsibility. While convenient, this behavior should be optional. I prefer connect() to fail immediately on connection refused so my app, or even better, its orchestrator (e.g. kubernetes) takes care of retrying and exponential backoffs.
@ateijelo exactly, also the connection failing doesn't mean your app has to die, you can still use ? and match the error higher up the call stack then choose what you want to do from there instead of some exponential backoff, i rather be able to react to the connection refused error immediately than have a backoff that eventualy fail.
@abonander it's a tiny change, and it's entirely optional and would not be enabled by default, can't this or some version of this just be merged so that those that want the feature can be happy, having to maintain a fork for such a little change is definitely not the way i'd like to go forward.
if it's no longer compatible with upstream i'd be willing to redo the pr if you tell me that it'll be merged.