nsqd: per-topic mem-queue-size
It would be nice to have the ability to set a max queue size per-topic, especially for ephemeral topics that have no channels/consumers.
An even better situation would be for ephemeral topics to not queue messages at all when there are no channels/consumers.
You can set it nsqd-wide with --mem-queue-size, but not per-topic/channel.
It is (more) difficult to come up with a reasonable configuration mechanism on a per-topic/channel basis because they are all created at runtime.
There are a number of issues focused on losing more messages faster... perhaps they should be pointed towards some other solution, like UDP packets.
(is there anything more modern similar to https://github.com/bitly/simplehttp/tree/master/pubsub ?)
lol @ploxiln
@ploxiln Good one. This has less to do with message integrity than with prioritizing memory usage across topics of unequal importance and unequal consumption speeds/ratios. Mostly, though, I feel that an ephemeral topic should always drop messages when it has no channels, rather than queuing them in memory. That way the memory quota is not consumed by something less important while another topic is processing tens of thousands of messages a second and working through a backlog; hence the idea of per-topic quotas.
Also, suggesting the use of multiple pub sub systems to workaround a legitimate use case gap is counter-productive to nsq.
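A minimal sketch of the two behaviours described above: an ephemeral topic with no channels drops messages outright, and every topic has its own in-memory quota. The `topicQueue` type, its fields, and the per-topic `limit` are hypothetical and are not nsqd's actual `Topic` type. For simplicity this sketch also drops when a non-ephemeral topic hits its quota; nsqd would instead spill non-ephemeral overflow to its disk-backed queue, which is omitted here.

```go
package main

import "fmt"

// topicQueue is a hypothetical, simplified stand-in for a topic's in-memory
// queue; it is not nsqd's actual Topic type.
type topicQueue struct {
	name        string
	ephemeral   bool
	numChannels int
	limit       int      // hypothetical per-topic mem-queue-size
	mem         [][]byte // messages buffered in memory
	dropped     int      // messages discarded instead of buffered
}

// put buffers msg in memory if the policy allows it, otherwise drops it.
// It reports whether the message was kept.
func (t *topicQueue) put(msg []byte) bool {
	// An ephemeral topic with no channels has nobody to deliver to,
	// so drop immediately rather than consuming shared memory.
	if t.ephemeral && t.numChannels == 0 {
		t.dropped++
		return false
	}
	// Enforce the per-topic quota so a low-priority topic cannot crowd out others.
	// (A real broker might spill to disk here instead of dropping.)
	if len(t.mem) >= t.limit {
		t.dropped++
		return false
	}
	t.mem = append(t.mem, msg)
	return true
}

func main() {
	logs := &topicQueue{name: "debug#ephemeral", ephemeral: true, limit: 100}
	orders := &topicQueue{name: "orders", numChannels: 1, limit: 2}

	for i := 0; i < 3; i++ {
		logs.put([]byte("log line"))
		orders.put([]byte(fmt.Sprintf("order %d", i)))
	}
	fmt.Println(len(logs.mem), logs.dropped)     // 0 3
	fmt.Println(len(orders.mem), orders.dropped) // 2 1
}
```

Keeping the "no channels" check ahead of the quota check means an idle ephemeral topic never touches memory at all, which is the behaviour argued for above.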
To ease the configuration headache, perhaps you could only expose this functionality through the explicit topic/channel creation endpoints?
So the default for all channels would be the global mem-queue-size, but explicit create_topic or create_channel calls could override it (or fail if the topic/channel already exists)?
Rationale
We just have some topics we'd like to be very careful with (0 or 1 mem-queue-size) and other topics where we'd like to let them be #ephemeral (without instantly dropping messages). I think the best solution for us would be to use nsq_to_file for durability instead of relying on an insanely low mem-queue-size, but that's non-zero operational overhead.
Update: Ah, actually #302 (TTLs) might accomplish our use case better. We'd just never use #ephemeral topics and would instead use TTLs as a kind of GC.
@schmichael I really like this idea!
There are still some tricky questions to answer around scaling out, i.e. what happens when you add a new nsqd node to the cluster that will "produce" said topic... how do we ensure that it is created with the same properties as the existing topic on other nodes (metadata in nsqlookupd, and eventually via gossip, isn't guaranteed to be consistent)?
It might be sufficient to require that upon topic creation the nsqd queries all nsqlookupd for metadata and "makes the best choice"?
@mreiferson Perhaps v1 could leave configuration consistency up to users? That's already the case with mem-queue-size, different peers in a cluster could be configured differently and nsqd won't try to stop you.
Seems like a reasonable place to start would be to expect people to treat topic creation w/configuration as part of their operations like config management, such that they create any non-default topics explicitly when provisioning a new node.
I could see v2 of this feature using some sort of reliable broadcast to try to maintain consistent topic configurations, but throw partitions and all the usual consensus issues into the mix and it seems like way too much to get right for an initial release.
(Obviously if I could get enforced topic configuration consistency "For Free" and not have to deal with the operational hassle of manually creating topics that'd be great, but even in our brave new world of high quality raft implementations I can't imagine distributed consensus ever being "free")
@schmichael great points, SGTM
It might be worth mentioning that (as of the last time I used nsqd very heavily) a mem-queue-size of 0 or 1 for a topic won't really work with the current code. This is because (again, this might be out of date) messages queue in the topic before they're distributed to channels, and the topic can receive multiple messages before the first one has been distributed to the channel queues. So you need a sort of "scheduling buffer" of messages queued in the topic, IIRC.
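The "scheduling buffer" point can be illustrated with a common Go pattern, assuming (as a simplification, not copied from nsqd) that an in-memory queue is a buffered channel written with a non-blocking send that falls back to an overflow path such as a disk queue. When a burst of messages arrives before any consumer goroutine has drained the buffer, a capacity of 0 keeps nothing in memory and a capacity of 1 keeps only one message. `tryEnqueue` and `countInMemory` are hypothetical helpers.

```go
package main

import "fmt"

// tryEnqueue attempts a non-blocking send into an in-memory buffer.
// If the buffer is full (or unbuffered with no receiver ready), the caller
// must take the overflow path instead, e.g. a disk queue or a drop.
func tryEnqueue(mem chan []byte, msg []byte) (inMemory bool) {
	select {
	case mem <- msg:
		return true
	default:
		return false
	}
}

// countInMemory publishes n messages in a burst, before any consumer has
// had a chance to drain the buffer, and counts how many stay in memory.
func countInMemory(capacity, n int) int {
	mem := make(chan []byte, capacity)
	kept := 0
	for i := 0; i < n; i++ {
		if tryEnqueue(mem, []byte("msg")) {
			kept++
		}
	}
	return kept
}

func main() {
	fmt.Println(countInMemory(0, 5)) // 0: unbuffered, nobody receiving yet
	fmt.Println(countInMemory(1, 5)) // 1
	fmt.Println(countInMemory(3, 5)) // 3
}
```

With a real consumer goroutine running, some sends would succeed at capacity 0 depending on scheduling, which is exactly why a tiny topic buffer makes in-memory delivery unpredictable.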
Can someone suggest an optimal --mem-queue-size for a server with 8 GB of RAM? Currently I am using --mem-queue-size=0 and persisting all data to disk. Now my disk is completely full and I'm seeing disk latency, while I always have about 6.75 GB of free RAM. What should I set --mem-queue-size to?
It depends on the typical size of your messages. For small messages, the default --mem-queue-size of 10000 is a fine choice for many situations.
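As a rough back-of-envelope check, --mem-queue-size applies per topic and per channel, so the memory a full topic can hold scales with its number of channels. The sketch below assumes a worst case where the topic queue and every channel queue are full and each message costs about `msgBytes` (body plus whatever overhead you budget for); it ignores Go runtime and GC overhead, and the figures in `main` are illustrative, not measurements. `worstCaseBytes` and `maxMemQueueSize` are hypothetical helpers.

```go
package main

import "fmt"

// worstCaseBytes estimates memory used if every in-memory queue for one topic
// is full. Assumptions (rough, not measured): the topic and each of its
// channels hold up to memQueueSize messages of about msgBytes each.
func worstCaseBytes(memQueueSize, msgBytes, numChannels int) int64 {
	queues := int64(1 + numChannels) // one topic queue plus one per channel
	return queues * int64(memQueueSize) * int64(msgBytes)
}

// maxMemQueueSize inverts the estimate: the largest per-queue size that keeps
// the worst case for this topic within budgetBytes.
func maxMemQueueSize(budgetBytes int64, msgBytes, numChannels int) int64 {
	perSlot := int64(1+numChannels) * int64(msgBytes)
	return budgetBytes / perSlot
}

func main() {
	// 1 KiB messages, one topic with 2 channels, default --mem-queue-size=10000.
	fmt.Println(worstCaseBytes(10000, 1024, 2)) // 30720000 (about 29 MiB)

	// Largest size that stays within a 4 GiB budget for the same topic.
	fmt.Println(maxMemQueueSize(4<<30, 1024, 2)) // 1398101
}
```

With several topics, sum the per-topic estimates and leave headroom for the rest of the process rather than budgeting all free RAM.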