parallel-consumer

docs: Comparison to Spring Cloud Stream Kafka binder

astubbs opened this issue 3 years ago • 2 comments

https://github.com/spring-cloud/spring-cloud-stream-binder-kafka

Threading model, offset handling, failure, rebalancing, etc

astubbs · Apr 20 '22 14:04

Spring Cloud Stream - with Kafka Binder vs Parallel Consumer

Ultimately the two frameworks have different goals.

Spring Cloud Stream aims to abstract message processing from the underlying transport in a very configuration-driven way; most Kafka-specific concerns are handled by the Spring Cloud Stream Kafka Binder and the underlying Spring Kafka framework. Without building an additional parallelisation layer, scalability is limited by the number of Kafka topic partitions.

Parallel Consumer, on the other hand, is built specifically to ease parallel processing beyond the topic partition limit - it splits a batch of messages into multiple sub-streams for processing (with different strategies for different ordering guarantees), handles back-pressure, and commits offsets out of order. To the best of my knowledge it has no Spring integration built.
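To make the sub-stream idea concrete, here is a minimal pure-Java sketch of key-based splitting: records polled from one partition are grouped by message key, so each sub-stream stays in order while different sub-streams can be handed to different threads. The `Record` type and `splitByKey` method here are illustrative stand-ins, not Parallel Consumer's actual API.

```java
import java.util.*;
import java.util.stream.*;

// Illustrative sketch of the KEY-ordering strategy: one polled batch is
// split into per-key sub-streams. Within a key, records keep their
// offset order; the sub-streams themselves can run in parallel.
class SubStreamSplit {
    record Record(String key, long offset, String value) {}

    static Map<String, List<Record>> splitByKey(List<Record> batch) {
        // LinkedHashMap preserves the encounter order of keys;
        // within a key, records keep their order from the poll.
        return batch.stream().collect(Collectors.groupingBy(
                Record::key, LinkedHashMap::new, Collectors.toList()));
    }

    public static void main(String[] args) {
        List<Record> batch = List.of(
                new Record("a", 0, "x"),
                new Record("b", 1, "y"),
                new Record("a", 2, "z"));
        Map<String, List<Record>> streams = splitByKey(batch);
        System.out.println(streams.get("a").size()); // 2 records share key "a"
        System.out.println(streams.size());          // 2 sub-streams: "a", "b"
    }
}
```

With KEY ordering, the effective parallelism is bounded by the number of distinct keys rather than the number of partitions, which is the core of the scalability argument above.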

Message processing / threading, offset handling, back-pressure control comparison

Spring Cloud Stream with Kafka binder:

  • Processing of messages is synchronous by default - separate consumer threads are created up to the maximum specified by the 'concurrency' configuration parameter.
  • Offsets are committed in sequence - per batch (the default), per message, or manually.
  • Failure handling is configured using a combination of Spring Cloud Stream and Kafka features. Spring Cloud Stream allows retry handling to be configured via a RetryTemplate (back-off, retry counts, etc.) and a strategy - drop the message, send it to a DLQ, or route it via message channels: messages can be routed to in-memory error channels for handling, or to a transport-backed channel (similar to a DLQ).
  • Back-pressure - there is no back-pressure control, so it is up to the application developer to ensure that messages are processed within the poll timeout period, to configure that timeout appropriately, or to implement pause/resume logic to prevent the Consumer from timing out.
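For reference, the settings above map roughly onto the following Spring Cloud Stream properties. This is a sketch from memory of the documented property names; the binding name `input` and the topic names are illustrative.

```yaml
spring:
  cloud:
    stream:
      bindings:
        input:
          destination: orders
          group: order-service
          consumer:
            concurrency: 3                  # up to 3 consumer threads
            max-attempts: 4                 # RetryTemplate retry count
            back-off-initial-interval: 1000 # retry back-off in ms
      kafka:
        bindings:
          input:
            consumer:
              enable-dlq: true              # route exhausted messages to a DLQ
              dlq-name: orders-dlq          # illustrative DLQ topic name
```

Note that `concurrency` beyond the partition count of the destination topic buys nothing, which is exactly the scalability ceiling discussed above.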

I am not fully clear on how Reactor-based processing works between the Spring Cloud Stream framework and the Kafka binder - there is nothing in the Spring Cloud Stream Kafka Binder documentation explaining this - but judging by some of the issues on the Spring Cloud Stream repository, acknowledgements need to be performed manually, back-pressure needs to be handled manually, etc.

Parallel Consumer:

  • Processing of messages is parallelised by handing off either to a processing thread pool or to the Reactor or Vert.x frameworks. Messages returned in a batch from a single poll / partition are split into sub-streams based on the specified ordering strategy.
  • Offsets are committed preserving the status of out-of-order processed messages - in addition to the highest sequentially processed offset, any offsets processed with gaps in the sequence are stored as well, allowing already-processed messages to be skipped on re-consumption after a rebalance or restart.
  • Failure handling - retry and back-off parameters are configurable.
  • Back-pressure - handled out of the box by buffering messages between the consuming and processing stages, with configurable and auto-tuned buffer sizes based on the rate of processing.
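The out-of-order offset commit point can be illustrated with a small pure-Java sketch: given which offsets have finished processing, the committable offset is the one just past the contiguous prefix, and completed offsets beyond a gap are remembered so they are not reprocessed after a rebalance or restart. This mirrors the idea described above, not Parallel Consumer's actual implementation.

```java
import java.util.*;

// Sketch of gap-aware offset tracking. "committed" is the highest offset
// known to be fully processed with no gaps below it; completed offsets
// above a gap are kept aside until the gap fills in.
class OffsetTracker {
    private final TreeSet<Long> completed = new TreeSet<>();
    private long committed = -1; // -1: nothing contiguously processed yet

    void markComplete(long offset) {
        completed.add(offset);
        // Advance the contiguous watermark through any filled-in gaps.
        while (completed.contains(committed + 1)) {
            completed.remove(committed + 1);
            committed++;
        }
    }

    long committableOffset() { return committed + 1; } // next offset to commit
    Set<Long> processedAboveGap() { return Set.copyOf(completed); }

    public static void main(String[] args) {
        OffsetTracker t = new OffsetTracker();
        t.markComplete(0);
        t.markComplete(2); // offset 1 is still a gap
        System.out.println(t.committableOffset()); // 1
        t.markComplete(1); // gap filled; watermark jumps past 2
        System.out.println(t.committableOffset()); // 3
    }
}
```

On restart, a consumer that seeks to the committable offset and skips everything in the gap set never reprocesses completed work, which is the behaviour the bullet above describes.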

rkolesnev · May 17 '22 14:05

Consider adding this to the docs, or doing a comparative blog post. cc @rkolesnev

astubbs · May 17 '22 16:05

Closing Issue

johnbyrnejb · Jul 07 '23 14:07