PubSub in Redis - any plans?
Hi,
do you plan on providing a pubsub feature for the redis backend? I would like to use twemproxy, but this feature is crucial for me.
BTW, thanks for a great project.
@mthenw could you describe in detail the scenario you are attempting to solve with pubsub support in twemproxy?
I'm using redis as a queue with persistence, and I would like to scale (shard) it easily without using language-dependent client libraries. On one side of the queue I produce messages (from multiple instances) by publishing on a channel and pushing to a list (the history). On the other side I have consumers that process some of them and, from time to time, read the history (from the list).
Just to be clear, do I understand correctly that you are publishing messages by announcing them in a pub channel, and then adding them to a single redis list?
If so, twemproxy will not scale that for you with or without pub sub. All your items in a single key will still reside on a single shard. Twemproxy partitions keys based on their name, not their contents.
You could possibly change to storing a key/value pair for each item, then "announce" it by adding its key to the left side of a list, while your consumers pop from the right side. The single list would of course reside in a single key, but the data itself could be partitioned, splitting it among the N nodes in the cluster.
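For illustration, a minimal redis-py sketch of that layout (the twemproxy host/port and key names here are made up):

```python
import json
import uuid

import redis

# Both sides talk to twemproxy, which shards the per-item keys.
# Host/port are assumptions; 22121 is a common twemproxy port.
r = redis.Redis(host="127.0.0.1", port=22121)

def produce(payload):
    # Store each item under its own key so it can be partitioned...
    item_key = "item:%s" % uuid.uuid4()
    r.set(item_key, json.dumps(payload))
    # ...and "announce" it by pushing only the key name onto a single
    # list. The list lives on one shard, but holds key names, not data.
    r.lpush("announce", item_key)

def consume():
    # Pop the oldest announcement and fetch the item it points at.
    item_key = r.rpop("announce")
    if item_key is None:
        return None
    data = r.get(item_key)
    return json.loads(data) if data is not None else None
```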
Ultimately, the question to be asked is how much data you are trying to solve for. Twemproxy or even Redis may not be the right tool for the job.
I wasn't clear enough :) I'm completely aware that I need to use multiple lists and multiple channels if I want to scale across multiple servers. I also know that I could use other tools (RabbitMQ or Apache Kafka, probably), but I would like to do this with Redis.
I've thought about this a little bit, and here's more or less the reasoning/conclusions I came to.
SUBSCRIBE requires an open active connection to redis to get channel messages. This sort of breaks twemproxy's "one connection per redis instance" groove.
You can get around this (sort of) by having twemproxy PSUBSCRIBE * to each of its connected instances and manage subscriptions itself locally, based on parsing incoming messages.
From here it's pretty straightforward: you can route PUBLISH messages to their respective nodes using the usual hashing method, and twemproxy could look at received messages and route them to the correct subscribers.
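As a toy sketch of that relay idea (plain redis-py and Python threads standing in for twemproxy internals; the backend list and the hash are made up, not twemproxy's actual ones):

```python
import threading
from collections import defaultdict

import redis

# Hypothetical backend shards the proxy is connected to.
BACKENDS = [("10.0.0.1", 6379), ("10.0.0.2", 6379)]

# channel name -> set of local subscriber callbacks
subscribers = defaultdict(set)

def route_publish(channel, message):
    # PUBLISH routes like any other command: hash the channel name
    # to pick a backend (Python's hash() stands in for the real one).
    host, port = BACKENDS[hash(channel) % len(BACKENDS)]
    redis.Redis(host=host, port=port).publish(channel, message)

def relay(host, port):
    # PSUBSCRIBE * pins one connection per backend open forever;
    # every message published anywhere flows back through here.
    pubsub = redis.Redis(host=host, port=port).pubsub()
    pubsub.psubscribe("*")
    for msg in pubsub.listen():
        if msg["type"] == "pmessage":
            for callback in subscribers[msg["channel"].decode()]:
                callback(msg["data"])

for host, port in BACKENDS:
    threading.Thread(target=relay, args=(host, port), daemon=True).start()
```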
However, if you step back and think about it a bit, this isn't really saving you anything. You're introducing a lot of overhead pretty much just to spread out which redis instances are dealing with the underlying channels, but those channels all just point back at twemproxy, which would then handle directing messages to the right subscribers. It's all just overhead, at least with my naive implementation. You'd be better off just spinning up another redis instance whose sole job is to deal with pub/sub.
It's a little less elegant, but it's easy.
Well, we have a setup where each application server (meaning a server that runs many instances of our application) has its own local twemproxy instance that proxies to a shared pool of Redis servers. With pubsub support in twemproxy as you described, we could easily leverage Redis' pubsub functionality for some lightweight message passing in a highly available manner. Setting up a separate Redis instance would introduce new availability problems that twemproxy currently solves for us (unless we set up a Redis cluster, of course).
If it doesn't impact twemproxy performance for regular key/value traffic, it would be a nice-to-have feature.
Variables
S = number of subscribing connections
P = number of published messages
N = number of twemproxy/Nutcracker instances
R = number of redis instances
A = number of application servers
F = number of redis instance up/down notifications
Single Redis Instance Overhead
If you use a single redis instance for all of your pub/sub notifications, you could think of the performance overhead as O(S*P). That obviously isn't great, but it's honestly not much different from the naive pub/sub on twemproxy I described.
Naive Pub/Sub Overhead
The naive implementation I proposed doesn't scale well either. The overhead on twemproxy is O(R + S/N * P), which isn't great, because you still have to handle every published message. The overhead on each redis instance is great, O(S/R * P/R), but adding a new redis instance gets you nothing here for scalability, because twemproxy will be the clear bottleneck (in fact, it adds more load to each twemproxy instance).
Depending on your pub/sub load, this could have a significant impact on twemproxy's performance. Hashing and handling requests already puts twemproxy at a higher CPU load than redis instances even with a single instance connected. My best guess is that the load from handling a published message per subscriber will be about the same as handling a key hash.
Due to these factors, I think it's unlikely this naive implementation of Pub/Sub would make it upstream. I'd still argue that this is worse than the single redis instance approach because of the rather excessive overhead I believe will be added to each twemproxy instance. You do get high availability, but it's at a great cost.
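To make the comparison concrete, here are those formulas with some made-up numbers plugged in:

```python
# Illustrative numbers, all assumptions: 10k subscribers, 50k published
# messages, 20 twemproxy instances, 8 redis instances.
S, P, N, R = 10_000, 50_000, 20, 8

single_redis = S * P              # O(S*P), everything on one instance
naive_proxy = R + (S / N) * P     # O(R + S/N * P), per twemproxy
naive_redis = (S / R) * (P / R)   # O(S/R * P/R), per redis instance

# Each twemproxy still sees ~1/N of the total delivery work, so adding
# redis instances doesn't help: the proxies stay the bottleneck.
print(f"single redis instance: {single_redis:,.0f} deliveries")
print(f"naive, per twemproxy:  {naive_proxy:,.0f}")
print(f"naive, per redis:      {naive_redis:,.0f}")
```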
A More Scalable Approach
A more scalable solution would be to add subscribe functionality to twemproxy for redis node up/down notifications, along with the ability to ask twemproxy which redis instance to connect to for a given key. Then, if/when that connection dies or comes back to life, you would get a notification and could invalidate it and re-ask Nutcracker for a new one.
That way twemproxy has one subscriber per application server connected to it (in your case 1), giving it O(A/N * F) overhead (essentially none, because F will be very small), and each redis instance would again have O(S/R * P/R) overhead, which scales perfectly.
The downside here is that it necessitates a client-side implementation to handle reconnecting to up/downed nodes, and it makes twemproxy's drop-in replacement capability less clear. I don't think this implementation would ever make it upstream into twemproxy because of that.
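For what it's worth, the client-side half could look something like the sketch below. The shard list and the up/down notification hook are hypothetical (neither API exists), and a real client would have to match the hash and distribution configured in twemproxy exactly; FNV-1a is just one of the hashes twemproxy supports.

```python
import redis

# Hypothetical: in the proposed design the client would ask twemproxy
# for this mapping instead of hard-coding the shard list.
SHARDS = [("10.0.0.1", 6379), ("10.0.0.2", 6379), ("10.0.0.3", 6379)]

def fnv1a_64(key):
    # Standard 64-bit FNV-1a; must match whatever hash twemproxy is
    # actually configured with for the mapping to agree.
    h = 0xcbf29ce484222325
    for byte in key.encode():
        h = ((h ^ byte) * 0x100000001b3) & 0xFFFFFFFFFFFFFFFF
    return h

def subscribe(channel, callback):
    # Bypass the proxy: connect straight to the shard that owns this
    # channel and hold the subscription open there.
    host, port = SHARDS[fnv1a_64(channel) % len(SHARDS)]
    pubsub = redis.Redis(host=host, port=port).pubsub()
    pubsub.subscribe(channel)
    for msg in pubsub.listen():
        if msg["type"] == "message":
            callback(msg["data"])
    # On a node up/down notification from twemproxy (not shown, since
    # that API doesn't exist), drop the connection, re-ask for the
    # owning shard, and resubscribe.
```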
The last (scalable) approach proposed by @tejacques makes sense and might become a viable solution given the constraints of pub-sub. Anyone want to take a stab at solving pub-sub in twemproxy as a fun experimental project? :) I can create a twemproxy_pubsub branch for this experimental feature.
Any update on this front?
Well, like @mthenw, I also need this, because of Spring Session Redis. The Session Event feature requires PUB/SUB functionality from redis.
I believe someone resurrected this thread then deleted their comment.
For the benefit of anyone running into this in the future: Redis cluster with the stream data type works the way I described earlier in the thread. See the following for additional information:
https://github.com/antirez/redis/issues/2672
https://redis.io/topics/streams-intro
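For reference, the original queue-with-history pattern looks roughly like this with streams and redis-py (key and field names are illustrative):

```python
import redis

r = redis.Redis()  # or a cluster client; stream keys shard by name

# Producer: XADD both announces the message and persists it, so the
# PUBLISH + LPUSH pair from the original question collapses into one
# command.
r.xadd("events", {"payload": "hello"})

# Consumer: block waiting for new entries, like a subscription...
for stream, entries in r.xread({"events": "$"}, block=5000):
    for entry_id, fields in entries:
        print(entry_id, fields)

# ...and read back the full history at any time, like the list did.
history = r.xrange("events", "-", "+")
```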