goworker
goworker copied to clipboard
Goworker does not recover from loss of connection to redis
If the connection to redis goes down (due to redis restart, netsplit, ...) the library does not recover, but only logs the error:
2014-03-10 19:07:23,682 DEBG 'my-worker' stdout output:
1394474843680784649 [Error] Error on my-worker-1:14479-poller:my_queue getting job from [my_queue]: use of closed network connection
Goworker should recognize the redis client instance is dead and try reconnecting.
I modified GetConn
to check connection status and perform redial. Here's the changes:
func GetConn() (*RedisConn, error) {
resource, err := pool.Get()
if err != nil {
return nil, err
}
+
+ // Performs simple connection check. Redial if needed
+ _, err = resource.(*RedisConn).Do("PING")
+ if err != nil {
+ resource.(*RedisConn).Close()
+ resource, err = redisConnFromUri(uri)
+ if err != nil {
+ return nil, err
+ }
+ }
+
return resource.(*RedisConn), nil
}
Not sure if this is recommended way when working with vitess' pools.
I can make a PR if this changes are OK :)
I know this is an old thread but maybe this will help someone like me who found it when trying to handle disconnects.
This solution does not work because the connection does not get put back in the pool. When the pool tries to close it hangs on draining the chan
that contains the connections in the pool.
I opted to handle the error outside of goworker. It required a patch to get it to actually throw an error though. See my PR here #60
The main issue is that on the poller.poll
it would only return an error if it happens before the goroutine, once the goroutine is lunched Work
will always return nil
and just finish.
The only way I can think of having an error be returned as everything is running in goroutines and the main Work
is blocked with ah sync.WaitGroup
would be to have a make(chan error, 0)
be part of the internal struct on the poller.newPoller
so we can "just" <- poller.err
and have the WaitGroup be also in the select (with a goroutine and a channel that i'll be closed once it's done).
waitCh := make(chan struct{})
go func() {
monitor.Wait()
close(waitCh)
}()
select {
case <-waitCh:
case err := <-poller.error:
// quite to gracefully shut down
quit <- true
// Return the poller error
return err
}
All the possible errors from the workers are also related to Redis and and they'll be closed using the quite
(well they'll fail as there will be no connection to Redis but that's expected hehe).
Would this be ok to open a PR for?