Increased memory usage with many short-lived producers
We have a pathological case where RSS can skyrocket: a large number of short-lived producers.
Such a case is typical of forking clients. Resque, for example, spawns a process per job and kills it once the job is done. Assuming a short-lived job that also produces a single message to Rafka, we may end up with hundreds or even thousands of producers that are spawned only to produce one message and then die.
Meanwhile, confluent-kafka-go producers are costly, since each of them pre-allocates two 1M buffers:
- the `p.events` channel, accessible via `p.Events()`
- the `p.produceChannel` channel, accessible via `p.ProduceChannel()`
The situation gets even worse because of https://github.com/golang/go/issues/16930.
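To get a feel for the cost of those two buffers, here is a rough back-of-the-envelope sketch. It assumes the channels are a `chan Event` (an interface value) and a `chan *Message` (a pointer), each with capacity 1,000,000; the `Event` and `Message` types below are hypothetical stand-ins, not the library's definitions. The result is an estimate of the buffer space requested at `make()` time, not a measurement of RSS.

```go
package main

import (
	"fmt"
	"unsafe"
)

// Event and Message are hypothetical stand-ins for the element types of the
// two channels; the real confluent-kafka-go types may differ.
type Event interface{}

type Message struct {
	Value []byte
}

// chanBufferBytes estimates the size of a buffered channel's element buffer:
// room for all `capacity` elements is requested when the channel is made.
func chanBufferBytes(elemSize uintptr, capacity int) uint64 {
	return uint64(elemSize) * uint64(capacity)
}

// perProducerBytes estimates the combined buffer cost of one producer that
// owns a `chan Event` and a `chan *Message`, both with the given capacity.
func perProducerBytes(capacity int) uint64 {
	var e Event
	var m *Message
	return chanBufferBytes(unsafe.Sizeof(e), capacity) +
		chanBufferBytes(unsafe.Sizeof(m), capacity)
}

func main() {
	const capacity = 1000000 // the "1M" buffer size mentioned above

	per := perProducerBytes(capacity)
	fmt.Printf("one producer:   ~%d MB of channel buffers\n", per/1000000)

	// Scale up to the "hundreds or even thousands" of producers case.
	for _, n := range []uint64{100, 1000} {
		fmt.Printf("%4d producers: ~%d MB\n", n, n*per/1000000)
	}
}
```

Under these assumptions, the per-producer cost scales linearly with the channel capacity, which is why shrinking the buffers is the cheapest remedy.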
Proposal
This could be fixed by re-architecting Rafka to have an N:M model (N=client producers, M=librdkafka producers), but that would require significant changes and would make Rafka usage more complex. We want to keep the 1:1 model if possible because it is simple.
However, we can remedy the issue in some ways:
- [x] set the buffer size of `produceChannel` to 0: This channel is completely unnecessary since we use the function-based producer (https://github.com/skroutz/rafka/commit/6ef4bf248be773945135cf184d20ea47426413e8)
- [x] decrease the buffer of the `events` channel
  - [x] we can set this to a much more sensible default for our use-case. For this, we submitted https://github.com/confluentinc/confluent-kafka-go/pull/90, which is now merged
- [ ] ~~Allow clients to control their producer configuration: https://github.com/skroutz/rafka/issues/40~~ (deferred)
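As a rough illustration of the first two items, the settings might look like the sketch below. It assumes the channel sizes are controlled through the `go.produce.channel.size` and `go.events.channel.size` configuration keys of confluent-kafka-go; a plain map stands in for the library's `kafka.ConfigMap` so that the example is self-contained. The helper name `producerConfig`, the broker address, and the events buffer size of 1000 are all hypothetical, not Rafka's actual values.

```go
package main

import "fmt"

// producerConfig is a hypothetical helper that builds producer settings with
// small channel buffers. The key names are assumed to be the ones
// confluent-kafka-go reads; the values are illustrative only.
func producerConfig(brokers string, eventsChanSize int) map[string]interface{} {
	return map[string]interface{}{
		"bootstrap.servers": brokers,
		// The produce channel is unused with the function-based producer,
		// so it does not need a buffer at all.
		"go.produce.channel.size": 0,
		// A much smaller events buffer than the 1M default.
		"go.events.channel.size": eventsChanSize,
	}
}

func main() {
	cfg := producerConfig("localhost:9092", 1000)
	for _, key := range []string{
		"bootstrap.servers",
		"go.produce.channel.size",
		"go.events.channel.size",
	} {
		fmt.Printf("%s = %v\n", key, cfg[key])
	}
}
```

In a real build, the same key/value pairs would be passed to the library's producer constructor instead of a plain map.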
We should also state in the README that Rafka, like librdkafka itself, is optimized for a few long-lived producers rather than bursty usage patterns (i.e. many short-lived producers).
https://github.com/golang/go/issues/16930 is now closed. We should verify that rafka built with Go 1.13 no longer suffers from this issue, and close this.
Also relevant: https://github.com/golang/go/issues/30333.
After testing with go tip (65ef999), the situation is pretty much the same as before. So I'm leaving this open as a known issue.