Increased memory usage with many short-lived producers

Open agis opened this issue 7 years ago • 2 comments

We have a pathological case where RSS can skyrocket: a large number of short-lived producers.

Such a case is typical of forking clients. Resque, for example, spawns a process per job and kills it once the job is done. Assuming a short-lived job that also produces one message to Rafka, we may end up with hundreds or even thousands of producers that are spawned only to produce a single message and then die.

Meanwhile, confluent-kafka-go producers are costly, since each of them pre-allocates two 1M buffers:

  • the p.events channel, accessible via p.Events()
  • the p.produceChannel channel, accessible via p.ProduceChannel()

The situation gets even worse because of https://github.com/golang/go/issues/16930.

Proposal

This could be fixed by re-architecting Rafka to have an N:M model (N=client producers, M=librdkafka producers), but that would require significant changes and would make Rafka usage more complex. We want to keep the 1:1 model if possible because it is simple.

However, we can remedy the issue in some ways:

  • [x] set the buffer size of produceChannel to 0: This channel is completely unnecessary since we use the function-based producer (https://github.com/skroutz/rafka/commit/6ef4bf248be773945135cf184d20ea47426413e8)
  • [x] decrease the buffer of the events channel
    • [x] we can set this to a much more sensible default for our use-case. For this, we submitted https://github.com/confluentinc/confluent-kafka-go/pull/90, which is now merged
    • [ ] ~~Allow clients to control their producer configuration: https://github.com/skroutz/rafka/issues/40~~ (deferred)

We should also state in the README that Rafka, like librdkafka itself, is optimized for a few long-lived producers rather than bursty usage patterns (i.e. many short-lived producers).

agis avatar Aug 25 '17 12:08 agis

https://github.com/golang/go/issues/16930 is now closed. We should verify that rafka built with Go 1.13 no longer suffers from this issue, and close this.

Also relevant: https://github.com/golang/go/issues/30333.

agis avatar May 23 '19 09:05 agis

After testing with Go tip (65ef999), the situation is pretty much the same as before, so I'm leaving this open as a known issue.

agis avatar May 23 '19 12:05 agis