RADAR-Backend
RADAR-Backend copied to clipboard
How to scale up partitions avoiding inconsistency
How to scale up partitions
Current streams implementation is sensitive to partition number changes. It assumes that all messages for a given patient goes through the same partition for all the time.
Kafka spreads messages across partitions computing hash(key) % partition_number
. If at runtime we change the partition_number
(for instance at time t
), then messages previously going to partition x
, at time t+1
may now go to partition y
. These two partitions may be assigned to two different streams. Meaning that: there will be a transient during which two streams concurrently update the hot storage view for the given patient. They will manage two time windows that may overlap, creating an inconsistent data for few seconds.
To scale up/down partitions avoiding this issue, we can set up three layers of topics:
- device topic
- temp topic
- business topic
Level 1 will be scale up/down according to load. A consumer groups will forward messages from level 1 to level 2. Streams will consume level 2. To scale up/down level 2, we need to
- stop the consumer group, between level 1 and 2
- stop streams, between level 2 and 3
- scale up/down level 2
- restart consumer group and streams from the early offset. This change is transparent to the users since, level 1 will be always on-line to serve their requests.
This approach entails a small overhead that should be worth to paying compared with the enhancement.
This issue does not involve the Cold Storage since it consumes the device topic.