go-redis
context deadline not respected if backends are down
Expected Behavior
When a context deadline is provided and all backends are down, calls to ClusterClient.Get (and others) return at the deadline or shortly after, with an error.
Current Behavior
Calls to ClusterClient.Get block for N * D, where N is the number of backends configured, and D is the DialTimeout.
Steps to Reproduce
package main

import (
    "context"
    "log"
    "time"

    "github.com/go-redis/redis/v8"
)

func main() {
    rdb := redis.NewClusterClient(&redis.ClusterOptions{
        Addrs: []string{
            "203.0.113.91:999",
            "203.0.113.92:999",
            "203.0.113.93:999",
            "203.0.113.94:999",
            "203.0.113.95:999",
            "203.0.113.96:999",
        },
        DialTimeout:  time.Second,
        ReadTimeout:  time.Second,
        WriteTimeout: time.Second,
    })

    ctx, cancel := context.WithTimeout(context.Background(), 100*time.Millisecond)
    defer cancel()

    resp, err := rdb.Get(ctx, "example").Result()
    if err != nil {
        log.Fatal(err)
    }
    log.Print(resp)
}
$ time go run main.go
dial error: dial tcp 203.0.113.96:999: i/o timeout
dial error: dial tcp 203.0.113.95:999: i/o timeout
dial error: dial tcp 203.0.113.93:999: i/o timeout
dial error: dial tcp 203.0.113.94:999: i/o timeout
dial error: dial tcp 203.0.113.96:999: i/o timeout
dial error: dial tcp 203.0.113.91:999: i/o timeout
2022/07/06 11:31:31 context deadline exceeded
exit status 1
real 0m6.532s
user 0m0.925s
sys 0m0.281s
Expected: real time under ~500ms, given the 100ms context timeout. Actual: real time is 6.5s, roughly DialTimeout times the number of backends.
Note: the dial errors are present because I added a Println call to go-redis' dial code to make it clearer what's going on.
Context (Environment)
go version go1.18 linux/amd64
github.com/go-redis/redis/v8 v8.11.5
This is a simplified version of a problem we encountered when bringing up the OCSP responder for https://github.com/letsencrypt/boulder/ with a set of backends that could not be reached.
Possible Implementation
It's reasonable to keep trying to dial in the background, but the user-facing call should return once the context deadline expires.
Here's a stack trace of where the dial calls are coming from. Notably, one of the stack frames is in cmdsInfo, which loops through the available nodes without checking the context deadline between attempts: https://github.com/go-redis/redis/blob/9f5aacde23f23a9c45b59b675680e7d375c27de2/cluster.go#L1563-L1588
This is particularly an issue because cmdsInfo is called within an internal.Once.Do, which holds a mutex. Since mutexes can't obey context deadlines, all other goroutines block on that same mutex for longer than their intended deadlines.
goroutine 1 [running]:
runtime/debug.Stack()
/home/jsha/go1.18/src/runtime/debug/stack.go:24 +0x65
runtime/debug.PrintStack()
/home/jsha/go1.18/src/runtime/debug/stack.go:16 +0x19
github.com/go-redis/redis/v8/internal/pool.(*ConnPool).dialConn(0xc0000b6280, {0x68fe48, 0xc0000a2000}, 0x1)
/home/jsha/boulder/vendor/github.com/go-redis/redis/v8/internal/pool/pool.go:199 +0xc6
github.com/go-redis/redis/v8/internal/pool.(*ConnPool).newConn(0xc0000b6280, {0x68fe48?, 0xc0000a2000?}, 0x1)
/home/jsha/boulder/vendor/github.com/go-redis/redis/v8/internal/pool/pool.go:163 +0x6e
github.com/go-redis/redis/v8/internal/pool.(*ConnPool).Get(0xc0000b6280, {0x68fe48, 0xc0000a2000})
/home/jsha/boulder/vendor/github.com/go-redis/redis/v8/internal/pool/pool.go:279 +0x16b
github.com/go-redis/redis/v8.(*baseClient)._getConn(0xc0000d01a0, {0x68fe48, 0xc0000a2000})
/home/jsha/boulder/vendor/github.com/go-redis/redis/v8/redis.go:194 +0x38
github.com/go-redis/redis/v8.(*baseClient).getConn(0xc0000d01a0, {0x68fe48?, 0xc0000a2000?})
/home/jsha/boulder/vendor/github.com/go-redis/redis/v8/redis.go:182 +0x69
github.com/go-redis/redis/v8.(*baseClient).withConn(0xc0000d01a0, {0x68fe48?, 0xc0000a2000}, 0xc00009c780)
/home/jsha/boulder/vendor/github.com/go-redis/redis/v8/redis.go:274 +0x73
github.com/go-redis/redis/v8.(*baseClient)._process(0xc0000d01a0, {0x68fe48, 0xc0000a2000}, {0x6909d0?, 0xc0000b4190}, 0x0?)
/home/jsha/boulder/vendor/github.com/go-redis/redis/v8/redis.go:329 +0x114
github.com/go-redis/redis/v8.(*baseClient).process(0xc0000d01a0, {0x68fe48, 0xc0000a2000}, {0x6909d0, 0xc0000b4190})
/home/jsha/boulder/vendor/github.com/go-redis/redis/v8/redis.go:311 +0x79
github.com/go-redis/redis/v8.hooks.process({{0x0?, 0x10?, 0xc0000cf960?}}, {0x68fe48?, 0xc0000a2000?}, {0x6909d0, 0xc0000b4190}, 0x50?)
/home/jsha/boulder/vendor/github.com/go-redis/redis/v8/redis.go:54 +0x7a
github.com/go-redis/redis/v8.(*Client).Process(0xc0000cf9a8?, {0x68fe48?, 0xc0000a2000?}, {0x6909d0?, 0xc0000b4190?})
/home/jsha/boulder/vendor/github.com/go-redis/redis/v8/redis.go:596 +0x70
github.com/go-redis/redis/v8.cmdable.Command(0xc0000987a0, {0x68fe48?, 0xc0000a2000})
/home/jsha/boulder/vendor/github.com/go-redis/redis/v8/commands.go:467 +0xd7
github.com/go-redis/redis/v8.(*ClusterClient).cmdsInfo(0xc0000e8000, {0x68fe48, 0xc0000a2000})
/home/jsha/boulder/vendor/github.com/go-redis/redis/v8/cluster.go:1602 +0x13b
github.com/go-redis/redis/v8.(*cmdsInfoCache).Get.func1()
/home/jsha/boulder/vendor/github.com/go-redis/redis/v8/command.go:3341 +0x3f
github.com/go-redis/redis/v8/internal.(*Once).Do(0xc0000d0168, 0xc00009edd0?)
/home/jsha/boulder/vendor/github.com/go-redis/redis/v8/internal/once.go:54 +0xaa
github.com/go-redis/redis/v8.(*cmdsInfoCache).Get(0xc0000d0160, {0x68fe48?, 0xc0000a2000?})
/home/jsha/boulder/vendor/github.com/go-redis/redis/v8/command.go:3340 +0x5d
github.com/go-redis/redis/v8.(*ClusterClient).cmdInfo(0xc0000e8000, {0x62fc63, 0x3})
/home/jsha/boulder/vendor/github.com/go-redis/redis/v8/cluster.go:1618 +0x47
github.com/go-redis/redis/v8.(*ClusterClient).process(0xc0000e8000, {0x68fe80?, 0xc0000b2180}, {0x690f10, 0xc0000b21e0})
/home/jsha/boulder/vendor/github.com/go-redis/redis/v8/cluster.go:772 +0x75
github.com/go-redis/redis/v8.hooks.process({{0x0?, 0xc000098760?, 0x7ff15901dc40?}}, {0x68fe80?, 0xc0000b2180?}, {0x690f10, 0xc0000b21e0}, 0x60?)
/home/jsha/boulder/vendor/github.com/go-redis/redis/v8/redis.go:54 +0x7a
github.com/go-redis/redis/v8.(*ClusterClient).Process(0xc0000cfeb0?, {0x68fe80?, 0xc0000b2180?}, {0x690f10?, 0xc0000b21e0?})
/home/jsha/boulder/vendor/github.com/go-redis/redis/v8/cluster.go:768 +0x6d
github.com/go-redis/redis/v8.cmdable.Get(0xc000098740, {0x68fe80?, 0xc0000b2180}, {0x630463, 0x7})
/home/jsha/boulder/vendor/github.com/go-redis/redis/v8/commands.go:786 +0x122
main.main()
/home/jsha/boulder/redis-lookup/main.go:23 +0x10c
@jsha I'm having the same issue with the v8 client: context deadline exceeded. Any luck with your issue? I did check the redis docs and found this: https://blog.uptrace.dev/posts/go-context-timeout.html, but even with an individual ctx per call, no luck.
@jsha Mine got resolved when I corrected the URL and port from localhost to 0.0.0.0. In case anyone stumbles on the same thing, you could try this fix.
Well, you should try Ping() first before doing operations on redis. After I set up the client, I usually ping the redis connection, and if that fails, I panic().