go-redis icon indicating copy to clipboard operation
go-redis copied to clipboard

Context cancel not working on blocking operations in v9

Open FLAGLORD opened this issue 2 years ago • 23 comments

I noticed the https://github.com/redis/go-redis/issues/1196#issue-527462854, and it seems that maintainers have fixed this.

But i use the alomost the same code given in the https://github.com/redis/go-redis/issues/1196#issue-527462854:

client := redis.NewClient(&redis.Options{
		Addr: "localhost:6379",
	})

cancelCtx, cancelFn := context.WithCancel(context.Background())

wg := &sync.WaitGroup{}
wg.Add(1)

go func() {
	client.BRPop(cancelCtx, 0, "")
	wg.Done()
}()

time.Sleep(2 * time.Second)

cancelFn()
wg.Wait()

It still waits indefinitely due to the 0 timeout. I have no idea...

FLAGLORD avatar Apr 23 '23 04:04 FLAGLORD

Is it related to https://github.com/redis/go-redis/issues/2276 ?

FLAGLORD avatar Apr 23 '23 04:04 FLAGLORD

We made an attempt in #2455, to be honest, the experience was poor. It disrupted our usual process of executing commands. In my opinion, we need to put more thought into it.

monkey92t avatar Apr 23 '23 13:04 monkey92t

I feel that the cost of #2455 is too high, and I don't think it's a good solution, even though it solves the problem.

monkey92t avatar Apr 23 '23 13:04 monkey92t

Is there a reasonable solution to this problem other than setting a low timeout and checking the value of context manually? Seems like we don't have a choice but to poll which defeats the purpose of the blocking pop. It also means I have to wait up to a second for it to respond to a context being canceled.

Anyone have a nicer solution?

var value string
for {
	select {
	case <-ctx.Done():
		return nil, context.Canceled
	default:
	}

	result, err := client.BLPop(ctx, time.Second, key).Result()
	if err == nil {
		value = result[1]
		break
	}
}

agorman avatar Apr 24 '23 19:04 agorman

Is there a reasonable solution to this problem other than setting a low timeout and checking the value of context manually? Seems like we don't have a choice but to poll which defeats the purpose of the blocking pop. It also means I have to wait up to a second for it to respond to a context being canceled.

Anyone have a nicer solution?

var value string
for {
	select {
	case <-ctx.Done():
		return nil, context.Canceled
	default:
	}

	result, err := client.BLPop(ctx, time.Second, key).Result()
	if err == nil {
		value = result[1]
		break
	}
}

If you hope that it returns immediately when being canceled, code snippet below works.

var value string
doneChan := make(chan struct{})

go func() {
	for {
		select {
		case <-ctx.Done():
			return
		default:
		}
		result, err := client.BLPop(ctx, time.Second, key).Result()
		if err == nil {
			value = result[1]
			// Use select to avoid goroutine leak
			select {
			case doneChan <- struct{}{}:
			default:
			}
			return
		}
	}
}()

select {
case <-ctx.Done():
	return nil, context.Canceled
case <-doneChan:
}

However, it seems weird and tricky. I do not like it.

FLAGLORD avatar Apr 25 '23 12:04 FLAGLORD

Sadly my code snippet has side effects: If the program successfully read result from redis, but it hasn't notified by writing to donnChan. At this time, ctx is canceled. It will return context.canceled but has retrieved value from redis...

FLAGLORD avatar Apr 25 '23 12:04 FLAGLORD

It seems impossible that there could be a solution that has no side effects and responds to context cancel quickly. Considering that waiting for cancel and BLPop are independent from each other:

  • If the program responds to cancel immediately, we have to use a goroutine to do BLPop, and a cancel can be sent while blocking(it means side effects);
  • If we hope to avoid dropping BLPop results, we have to block in the main process and it means that we have to tolerate a delay in context cancel.

Currently we have to trade off according to our needs

FLAGLORD avatar Apr 25 '23 14:04 FLAGLORD

Because conn.Read(), conn.Write() do not depend on ctx, but only use timeout to control.

monkey92t avatar Apr 25 '23 14:04 monkey92t

We still want go-redis to maintain high performance and do not want to use multiple goroutines to pass signals through channels, which would make it more complex. We want the API to be simple like this:

client.Get() -> write cmd -> read response data.

monkey92t avatar Apr 25 '23 14:04 monkey92t

Because conn.Read(), conn.Write() do not depend on ctx, but only use timeout to control.

You could simply using context.AfterFunc to set the deadline:

done := make(chan struct{})
stop := context.AfterFunc(ctx, func() {
	conn.SetReadDeadline(time.Now())
	close(done)
})
n, err = conn.Read(b)
if !stop() {
	<-done // Wait for AfterFunc to complete.
	conn.SetReadDeadline(time.Time{})
	// Return `conn` to pool.
	return n, ctx.Err()
}

This might not work for calls with side-effects but would work nicely for things like XRead and it's a lot more lightweight than the solution in the PR above.

It could also be further optimised by using a sync.Pool for for the channels that are created for waiting.

abemedia avatar Aug 27 '23 00:08 abemedia

(Talking about state at v9.4.0)

We're also affected by the lack context cancellation awareness in our use-case: Long running BLPOP & PSUBSCRIBE calls. There are obviously workarounds - but they're not ideal:

  1. For PSUBSCRIBE, use Channel instead of ReceiveMessage/Receive
    1. :green_circle: The channel responds to PubSub.Close()
    2. :red_circle: Two extra routines are used: initHealthCheck & init(All|Msg)Chan
      • The health check is obviously of use if one wants automatic re-connections (but this retry behaviour is only available for pubsub, not other blocking commands)
  2. For BLPOP (and any other blocking commands which support a Redis command timeout parameter) specify a short timeout
    1. :green_circle: If context is cancelled, the connection continue to block only up to the configured timeout.
    2. :red_circle: The connection still blocks until the timeout
    3. :red_circle: A low timeout increases unnecessary repeated polling and extra application logic (a while-true loop is needed)

We still want go-redis to maintain high performance and do not want to use multiple go-routines to pass signals through channels, which would make it more complex. We want the API to be simple like this:

client.Get() -> write cmd -> read response data.

@monkey92t, how about exposing an optional hook to allow users, who wish to have fully context-aware behaviour, to choose to implement the "set deadline on context end" in one of the multiple ways they see fit, e.g.:

  1. With go 1.21+, as suggested above, using context.AfterFunc()
  2. Using a separate routine to await context to complete
  3. Some other approach

This should avoid having the authors/maintainers having to reason about increased complexity and/or reduced performance in the library. (E.g. like so, maybe in conjunction with pubsub context awareness)

vtermanis avatar Jan 17 '24 16:01 vtermanis

@monkey92t I can definitely understand a desire to keep the API simpler, and performant. However this is a go-specific project, is it not? By not using the context deadlines you are going against the grain. This makes the calling code more complex, or at minimum, less idiomatic (goth) and ergo with more mental overhead for your users. The tradeoff is not a good one IMO. Esp when the API for BLMove() (eg) still takes a ctx arg but ignores it. I might go so far as to call this API hostile.

I'm not following along very closely, so excuse any misunderstandings on my part.

notrobpike avatar Feb 08 '24 02:02 notrobpike

I just ran into the same issue with a blocking XREADGROUP command. After reading through this issue, I wasn't quite satisfied with the workarounds offered here. After further investigation and a little digging into the source of this library, I resorted to a solution that wasn't yet suggested here or elsewhere. I'll outline the idea below:

I resorted to use a dedicated connection for blocking commands and use the connection ID to unblock the client whenever my context closes.

Code example:

var rdb *redis.Client

ctx, cancel := context.WithTimeout(context.Background(), 2 * time.Second)
defer cancel()

conn := rdb.Conn()
connectionID := conn.ClientID(ctx).Val()

stop := ctx.AfterFunc(ctx, func()  {
    // uses one of the pooled connections of the redis client to unblock the blocking connection
    rdb.ClientUnblock(context.Background(), connectionID)
})
defer stop()

// imagine this to be run in a for{} loop which is left once ctx.Done() is closed.
// to check if the channel was closed, XReadGroup must return prematurely if that happens
result, err := conn.XReadGroup(ctx, &redis.XReadGroupArgs{
    Group:    "mygroup",
    Consumer: "myclient",
    Streams:  []string{"mystream", ">"},
    Block:    10 * time.Second,
}).Result()

// returns immediately when the group has new messages
// also returns immediately when ctx.Done() is closed (because of the issued `CLIENT UNBLOCK` command)
// blocks otherwise until timeout (10s) is reached or new messages arrive

schnz avatar Apr 03 '24 12:04 schnz

In my case, all I wanted from a canceled context was graceful termination. So I worked around this problem with the following:

go func() {
  <-ctx.Done()
  rdcli.Close()
}()

This rudely closes the underlying net Conn and causes something like BLMove to return err set to context.Canceled.

It remains possible that the blocking operation had just completed its operation on the redis server and the result was on its way to be processed but the socket was shut and the response was lost. Since I've designed protections for abnormal process termination already, this was not an issue for me.

jimsnab avatar May 16 '24 18:05 jimsnab