redis-async-rs

Reconnect pubsub

Open elwerene opened this issue 6 years ago • 4 comments

Should pubsub reconnect? I tried, but even after a minute it didn't try to reconnect. I assume it should reconnect after 30 seconds?

Log:

DEBUG  dropping I/O source: 0

elwerene avatar Dec 20 '18 16:12 elwerene

Hello,

Yes, PubSub connections will reconnect, but any subscription streams you have will error and you'll need to resubscribe.

The reason for this is trying to mirror the low-level Redis functionality, and since the Redis server manages the subscriptions, a dropped connection means re-subscribing to the topics.

In a non-open-source project I've been working on, we do just that. We use PubSub subscriptions for notification purposes, where higher-level code tries to re-subscribe until it works. For example, if the Redis server disappears and takes a second or two to re-appear, it will try to re-subscribe at regular intervals. But this code isn't built in to this library as I didn't want to presume high-level behaviours that clients may require.

It might be clearer if I explain how reconnects work on the regular PairedConnection, then explain the difference to PubSub because the underlying mechanism is essentially the same:

A PairedConnection will reconnect if the underlying connection drops, but will only do so when the application tries to use it. This means that the first call on a broken connection will error, but subsequent calls (assuming the reconnection attempt works) will succeed. So an application can decide what to do: if it's using Redis as a cache, then it might decide to not bother retrying; on the other hand, if it's important, the application should retry or return its own error. The failing call doesn't wait for the connection to be re-established.
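As a toy model of that behaviour (for illustration only: `MockConnection`, `CallError` and `send_important` are hypothetical names, not part of this library's API, and the reconnect here always succeeds instantly):

```rust
// Toy model of lazy reconnection; this is NOT the redis-async API.

#[derive(Debug, PartialEq)]
pub enum CallError {
    /// The link was down when the call was made; a reconnect has been triggered.
    ConnectionLost,
}

pub struct MockConnection {
    connected: bool,
}

impl MockConnection {
    pub fn new() -> Self {
        MockConnection { connected: true }
    }

    /// Simulate the server going away.
    pub fn drop_link(&mut self) {
        self.connected = false;
    }

    /// Send a command. A call on a broken link fails immediately, but it also
    /// triggers the reconnection, so that a later call can succeed.
    pub fn send(&mut self, cmd: &str) -> Result<String, CallError> {
        if !self.connected {
            // Assumption for this sketch: the reconnect succeeds instantly.
            self.connected = true;
            return Err(CallError::ConnectionLost);
        }
        Ok(format!("OK {}", cmd))
    }
}

/// A caller for whom the command is important: retry once after a failure.
pub fn send_important(conn: &mut MockConnection, cmd: &str) -> Result<String, CallError> {
    conn.send(cmd).or_else(|_| conn.send(cmd))
}
```

A cache-style caller would just call `send` and ignore the error; a caller that cares would use something like `send_important`, or return its own error.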

The PubSub connection is similar, in that it needs the application to attempt to re-subscribe to topics in order to re-establish the underlying connection. This means applications need to retry the re-subscription until it works, or give up after a number of failed attempts and report an upstream problem.
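A minimal sketch of such a retry loop, assuming the subscription attempt is wrapped in a closure. The names `resubscribe_with_retry` and `GaveUp` are made up for this example, and it blocks the thread with `sleep` rather than using futures, purely to keep it self-contained:

```rust
use std::thread::sleep;
use std::time::Duration;

/// Returned when every attempt failed; the caller can treat this as an upstream problem.
#[derive(Debug, PartialEq)]
pub struct GaveUp {
    pub attempts: u32,
}

/// Call `try_subscribe` until it succeeds, waiting `delay` between attempts,
/// and give up after `max_attempts` failures.
/// `try_subscribe` is a hypothetical stand-in for whatever performs the real
/// subscription call.
pub fn resubscribe_with_retry<S, E, F>(
    mut try_subscribe: F,
    max_attempts: u32,
    delay: Duration,
) -> Result<S, GaveUp>
where
    F: FnMut() -> Result<S, E>,
{
    for attempt in 1..=max_attempts {
        match try_subscribe() {
            Ok(stream) => return Ok(stream),
            // Wait before the next attempt, but not after the last one.
            Err(_) if attempt < max_attempts => sleep(delay),
            Err(_) => {}
        }
    }
    Err(GaveUp { attempts: max_attempts })
}
```

The attempt limit and delay are left to the caller because, as noted above, the right policy depends on what the application uses the subscription for.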

Short answer: it will reconnect, but it needs the application to do things to trigger the reconnection.

Does this help with the issue you're seeing?

One thing I've been thinking about, for PubSub connections specifically, is implementing within the library functionality similar to the code that already exists in the non-open-source project I've been working on. For each subscription created by calling .subscribe(topic) on the PubSubConnection, it would return a stream that, instead of being a stream of type T, is a stream of type SubValue<T>, where SubValue is an enum of SubValue::Connected(T) vs. SubValue::NotConnected; there will be one of the latter for each attempt at reconnecting. NotConnected needs to be surfaced because messages on the topic may have been missed while a reconnection attempt happens. This way the application knows that there was a period where messages on the given topic would not have got through, and can error instead if needed.

This can be implemented by tokio::spawning a future to do the resubscribing I described above, then connecting the low-level stream to the high-level stream. Applications will just have one stream to deal with and don't need to worry about re-subscribing or re-connections.
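To make the idea concrete, here is a rough sketch of that shape. It is not the code from the other project and not a proposed API: it uses a plain thread and an `mpsc` channel as a stand-in for `tokio::spawn` and streams, models the low-level stream as an iterator of `Result<T, ()>`, and the `supervised` function and its `max_failures` parameter are assumptions made for the example. It also retries immediately, where real code would wait between attempts.

```rust
use std::sync::mpsc::{channel, Receiver};
use std::thread;

/// What the high-level stream yields, following the enum described above.
#[derive(Debug, PartialEq)]
pub enum SubValue<T> {
    Connected(T),
    NotConnected,
}

/// Spawn a worker that keeps (re)subscribing and forwards everything into one
/// channel. `subscribe` is a hypothetical stand-in for a call to
/// `.subscribe(topic)`: `Ok(iter)` is a live low-level stream, `Err(())` a
/// failed attempt. The worker stops after `max_failures` consecutive failures.
pub fn supervised<T, I, F>(mut subscribe: F, max_failures: u32) -> Receiver<SubValue<T>>
where
    T: Send + 'static,
    I: Iterator<Item = Result<T, ()>> + 'static,
    F: FnMut() -> Result<I, ()> + Send + 'static,
{
    let (tx, rx) = channel();
    thread::spawn(move || {
        let mut failures = 0;
        while failures < max_failures {
            match subscribe() {
                Ok(low_level) => {
                    failures = 0;
                    for item in low_level {
                        match item {
                            Ok(msg) => {
                                if tx.send(SubValue::Connected(msg)).is_err() {
                                    return; // the consumer dropped the receiver
                                }
                            }
                            Err(()) => break, // stream errored: resubscribe
                        }
                    }
                }
                Err(()) => failures += 1,
            }
            // The stream ended or the attempt failed: messages may have been missed.
            if tx.send(SubValue::NotConnected).is_err() {
                return;
            }
        }
    });
    rx
}
```

The application then only reads from the one receiver, and each `NotConnected` marks a gap during which messages on the topic may have been lost.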

The downside is this moves the library further away from being a representative low-level Redis client. The upside is the Pub/Sub logic is pretty high-level anyway, and adding re-subscription would make other things easier.

benashford avatar Dec 20 '18 21:12 benashford

Wow, thanks for your thorough answer!

How can I know that a stream is no longer connected, so that I can return a NotConnected? I started with your example and never got an error when Redis was running at launch and then stopped.

elwerene avatar Dec 20 '18 22:12 elwerene

The current stream returned by the future returned from subscribe on the PubSubConnection should end in one of two ways, either:

  • it is dropped by the application when it's no longer needed, in which case it will unsubscribe itself.
  • or it errors. At present the error type is () which isn't particularly helpful, but is done with the intention of avoiding any leaky abstractions.

It is, however, safe to assume that any stream that errors needs to resubscribe. The call to subscribe would produce an error rather than a stream if anything else was the problem. If this changes in any future version I'd consider it a breaking change, and would almost certainly change the type of the stream so that it's obvious such a change had happened.

However... at present... I think it might be possible for the stream to end naturally, when the connection is dropped at the server-side which is not ideal. If this happens, and we didn't call unsubscribe, we should try and reconnect too. But this is confusing behaviour and we should consider it a bug.

You can probably work around it by attempting to reconnect after any unexpected end to the stream. But we should leave this issue open in the meantime anyway, until I make sure that the stream actually errors properly.
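A sketch of that workaround, again modelling the subscription stream as an iterator of `Result<T, ()>` rather than using the library's real types. `StreamEnd`, `drain` and `should_resubscribe` are illustrative names only:

```rust
/// How a subscription stream finished.
#[derive(Debug, PartialEq)]
pub enum StreamEnd {
    /// The stream yielded an error.
    Errored,
    /// The stream finished without an error.
    Ended,
}

/// Pass every message to `on_message` and report how the stream finished.
pub fn drain<T, I, H>(stream: I, mut on_message: H) -> StreamEnd
where
    I: Iterator<Item = Result<T, ()>>,
    H: FnMut(T),
{
    for item in stream {
        match item {
            Ok(msg) => on_message(msg),
            Err(()) => return StreamEnd::Errored,
        }
    }
    StreamEnd::Ended
}

/// An error always means resubscribe. A natural end means resubscribe too,
/// unless the application itself unsubscribed.
pub fn should_resubscribe(end: StreamEnd, app_unsubscribed: bool) -> bool {
    match end {
        StreamEnd::Errored => true,
        StreamEnd::Ended => !app_unsubscribed,
    }
}
```

The point is only that the application tracks whether it asked for the unsubscribe, and treats every other kind of ending as a reason to subscribe again.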

benashford avatar Dec 20 '18 23:12 benashford

The most recent 0.5.0 release moves this issue along, but doesn't fully resolve it.

The pattern continues to be that a client app should always re-subscribe to a topic if its stream ends unexpectedly, but if that reconnect fails the client app will now get a more concrete error message.

It's still an open issue how best to communicate with subscribers as to why the stream has failed.

benashford avatar Jul 19 '19 10:07 benashford