skunk icon indicating copy to clipboard operation
skunk copied to clipboard

Automatic reconnection

Open gvolpe opened this issue 5 years ago • 8 comments

Right now, if the connection to the server is lost (i.e. restart PostgreSQL server), the application needs to be restarted to get reconnected. I think this would be a nice feature to have :)

Here's the exception I get when I stop and then start the PostgreSQL server:

root[ERROR] java.lang.Exception: Fatal: EOF before 5 bytes could be read.Bytes
root[ERROR] 	at skunk.net.BitVectorSocket$$anon$2.$anonfun$readBytes$1(BitVectorSocket.scala:72)
root[ERROR] 	at cats.effect.internals.IORunLoop$.cats$effect$internals$IORunLoop$$loop(IORunLoop.scala:139)
root[ERROR] 	at cats.effect.internals.IORunLoop$RestartCallback.signal(IORunLoop.scala:355)
root[ERROR] 	at cats.effect.internals.IORunLoop$RestartCallback.apply(IORunLoop.scala:376)
root[ERROR] 	at cats.effect.internals.IORunLoop$RestartCallback.apply(IORunLoop.scala:316)
root[ERROR] 	at cats.effect.internals.IORunLoop$.cats$effect$internals$IORunLoop$$loop(IORunLoop.scala:136)
root[ERROR] 	at cats.effect.internals.IORunLoop$RestartCallback.signal(IORunLoop.scala:355)
root[ERROR] 	at cats.effect.internals.IORunLoop$RestartCallback.apply(IORunLoop.scala:376)
root[ERROR] 	at cats.effect.internals.IORunLoop$RestartCallback.apply(IORunLoop.scala:316)
root[ERROR] 	at cats.effect.internals.IORunLoop$.cats$effect$internals$IORunLoop$$loop(IORunLoop.scala:136)
root[ERROR] 	at cats.effect.internals.IORunLoop$RestartCallback.signal(IORunLoop.scala:355)
root[ERROR] 	at cats.effect.internals.IORunLoop$RestartCallback.apply(IORunLoop.scala:376)
root[ERROR] 	at cats.effect.internals.IORunLoop$RestartCallback.apply(IORunLoop.scala:316)
root[ERROR] 	at cats.effect.internals.IOShift$Tick.run(IOShift.scala:36)
root[ERROR] 	at cats.effect.internals.PoolUtils$$anon$2$$anon$3.run(PoolUtils.scala:51)
root[ERROR] 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
root[ERROR] 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
root[ERROR] 	at java.lang.Thread.run(Thread.java:748)

gvolpe avatar Oct 27 '19 21:10 gvolpe

I'm not too worried about that with single-use connections but I think I will probably remove that ability and only allow pooled connections. They will have a health check when they're returned to the pool but it might make sense to do a liveness check again before they're handed out to handle this case.

tpolecat avatar Oct 27 '19 22:10 tpolecat

FWIW this happened using a pool of connections, but maybe that's not fully implemented yet? I recall seeing some TODO comments around this code.

gvolpe avatar Oct 28 '19 03:10 gvolpe

maybe that's not fully implemented yet

Yeah that pooling code is bad. I have a WIP for a new implementation but it doesn't have the pre-checkout phase yet.

Also to a first approximation nothing is fully implemented! I mean, you discovered that INSERT wasn't supported yet because nobody had ever tried it ;-)

tpolecat avatar Oct 28 '19 13:10 tpolecat

No worries :smile:

I'll leave this issue open hoping it gets fixed someday, no rush :) Once your new pooling code is out, I can give it a try and maybe come back with a PR if it's not yet fixed.

gvolpe avatar Oct 28 '19 22:10 gvolpe

I've also encountered this issue.

An easy to use reproduction can be found in this repo, including the necessary scripts to start/restart the PSQL in docker. N.B. it only reproduces w/ pooled connections.

Now, I've hacked together quite a nice way to get pool reinitialization in userland, without full app restart (ping @gvolpe if you need it). But it's not a fully tested in terms of all the concurrency problems that can arise, but it works in production :joy:. A bunch of Ref, Semaphore and Resource.allocated :grimacing:. And not generic enough to handle restart when you wanted a stream in your query. But in case anyone wants it, it can be found in the same repo

@tpolecat, I'll gladly take a look at solving this problem in skunk itself if this is wanted. As the solution here is much simpler than the userland pool reinit hack.

lorandszakacs avatar Mar 09 '21 11:03 lorandszakacs

This should be fixed. Have you tried this with https://github.com/tpolecat/skunk/pull/388 ?

tpolecat avatar Mar 09 '21 15:03 tpolecat

Nice, @tpolecat please feel free to close this, I completely forgot about this issue! :smile:

gvolpe avatar Mar 09 '21 16:03 gvolpe

@tpolecat if that fix wound up in 0.0.24 then I did try it. And it's not fully fixed, not for the particular usecase in the repo I had linked.

lorandszakacs avatar Mar 09 '21 17:03 lorandszakacs