
Address bottleneck on offset committer thread

Open mauliksoneji opened this issue 4 years ago • 0 comments

Problem

Currently there is only one offset committer (Acknowledger) thread that acknowledges successful consumption back to Kafka. In the beast architecture, the Consumer, BQ Worker, and Acknowledger threads run independently and are connected by blocking queues.
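For context, here is a minimal sketch of that wiring; the class names, queue element type, and queue sizes are purely illustrative assumptions, not beast's actual code. Any number of BQ worker threads drain the read queue, but a single Acknowledger drains the commit queue.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Illustrative wiring only; names and queue sizes are assumptions, not beast's code.
public class PipelineWiringSketch {
    public static void main(String[] args) {
        // Consumer -> read queue -> BQ workers -> commit queue -> Acknowledger
        BlockingQueue<List<String>> readQueue = new LinkedBlockingQueue<>(100);
        BlockingQueue<List<String>> commitQueue = new LinkedBlockingQueue<>(100);

        // Any number of BQ worker threads can be spawned to drain the read queue
        // and hand finished batches over for offset acknowledgement.
        for (int i = 0; i < 8; i++) {
            new Thread(() -> {
                try {
                    while (true) {
                        List<String> batch = readQueue.take(); // a batch of Kafka messages
                        // ... push the batch to BigQuery ...
                        commitQueue.put(batch);                // hand off for offset commit
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }).start();
        }

        // A single Acknowledger thread drains the commit queue and acknowledges
        // offsets back to Kafka; this is where the bottleneck appears.
        new Thread(() -> {
            try {
                while (true) {
                    List<String> acked = commitQueue.take();
                    // ... commit the offsets of this batch back to Kafka ...
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }).start();
    }
}
```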

The push operation that places batches of Kafka messages onto these blocking queues is not indefinitely blocking; instead, a timeout bounds how long the producing thread waits for a free slot on the queue.
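A hedged sketch of that timed push, reusing the illustrative names above; the queue capacity, method name, and 10-second timeout are assumptions, not beast's actual configuration:

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Sketch only: capacity, timeout, and names are illustrative assumptions.
public class TimedPushSketch {
    private final BlockingQueue<List<String>> commitQueue = new LinkedBlockingQueue<>(100);

    void pushToCommitQueue(List<String> batch) throws InterruptedException {
        // offer(...) waits at most the given timeout for a free slot,
        // unlike put(...), which would block indefinitely.
        boolean pushed = commitQueue.offer(batch, 10, TimeUnit.SECONDS);
        if (!pushed) {
            // No slot freed up within the timeout: the push fails and the
            // application treats this as fatal instead of waiting forever.
            throw new IllegalStateException("timed out waiting for a free slot on the commit queue");
        }
    }
}
```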

Since we can spawn any number of BQ Workers, the commit queue processed by the Acknowledger fills up, and even with sufficiently high timeouts it stays full because the single Acknowledger cannot keep up with the load of acknowledgements.

We need a mechanism to increase the processing capacity of the Acknowledger thread so that it does not become the bottleneck of the application.

Approaches

  1. Wait indefinitely for adding the batch to the commit queue

Currently we wait only a bounded time for a slot in the commit queue; if the queue is still full after the timeout, the process exits. One idea is to wait indefinitely when pushing data to the commit queue. That way, even when the queue fills up, the process does not restart. (A sketch is shown after the disadvantages below.)

Disadvantages: we push data into the queue synchronously, so if the push to the commit queue takes a long time we are bottlenecked on it and are effectively using only one thread to push data to BigQuery. This results in a significant performance degradation and diverges from beast's philosophy of scaling out.
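A minimal sketch of approach 1, reusing the illustrative names above: the timed `offer(...)` is replaced with `put(...)`, which blocks until a slot frees up, so a full commit queue stalls the calling thread instead of killing the process.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch only: names are illustrative. put(...) blocks as long as the queue is full,
// which avoids the restart but stalls the caller while the Acknowledger catches up.
public class IndefinitePushSketch {
    private final BlockingQueue<List<String>> commitQueue = new LinkedBlockingQueue<>(100);

    void pushToCommitQueue(List<String> batch) throws InterruptedException {
        commitQueue.put(batch); // waits indefinitely for a free slot
    }
}
```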

  2. Batch commits

Currently we send one acknowledgement (offset commit) per batch. The idea is to club the acknowledgements together for a certain period of time and then send a single acknowledgement. (A sketch follows below.)

With this batch-commit approach, we need to make sure that there is no data loss.
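A rough sketch of approach 2, with several stated assumptions: the class and field names and the 5-second window are illustrative, acknowledgements arrive as per-partition offset maps, batches for a partition reach the commit queue in order, and access to the KafkaConsumer is coordinated with the polling thread (KafkaConsumer itself is not thread-safe).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

// Sketch only: instead of one commitSync per batch, acknowledgements are
// accumulated for a time window and committed together.
public class BatchCommitSketch implements Runnable {
    private final BlockingQueue<Map<TopicPartition, OffsetAndMetadata>> commitQueue;
    private final KafkaConsumer<byte[], byte[]> consumer;
    private final long windowMillis = 5_000; // assumed window, not beast's value

    BatchCommitSketch(BlockingQueue<Map<TopicPartition, OffsetAndMetadata>> commitQueue,
                      KafkaConsumer<byte[], byte[]> consumer) {
        this.commitQueue = commitQueue;
        this.consumer = consumer;
    }

    @Override
    public void run() {
        Map<TopicPartition, OffsetAndMetadata> pending = new HashMap<>();
        long windowStart = System.currentTimeMillis();
        try {
            while (!Thread.currentThread().isInterrupted()) {
                Map<TopicPartition, OffsetAndMetadata> acked =
                        commitQueue.poll(100, TimeUnit.MILLISECONDS);
                if (acked != null) {
                    // Keep only the highest offset seen per partition. This avoids data
                    // loss only if batches for a partition are acknowledged in order
                    // (or gaps are tracked separately).
                    acked.forEach((tp, om) -> pending.merge(tp, om,
                            (a, b) -> a.offset() >= b.offset() ? a : b));
                }
                if (!pending.isEmpty() && System.currentTimeMillis() - windowStart >= windowMillis) {
                    consumer.commitSync(pending); // one commit for the whole window
                    pending.clear();
                    windowStart = System.currentTimeMillis();
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```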

mauliksoneji, Apr 06 '20 06:04