Storage: Segment validation should not block SC Dispatcher Loop

Open sehz opened this issue 2 years ago • 0 comments

Large data sets can produce many segments. For example, this longevity test generates many segments after running for a few hours:

nohup ./target/release/fluvio-test longevity --timeout 90000  -- --runtime-seconds=72000 --producers 300 --consumers 0 > /tmp/test.out 2> /tmp/test.err &

The run produces ~200 segments. When the SPU restarts, it has to load them. Unfortunately, it loads the segments as part of the SC dispatch loop:


2022-01-25T22:57:09.892085Z DEBUG sc_dispatch_loop{socket=10}:handle_update_replica_request{sc_sink=fd(10)}:apply_replica_actions{actions=1}:add_leader_replica{replica=longevity-0}:create_or_load: fluvio_storage::segments: reading segments at: ReadDir(
    "/Users/sehyo/.fluvio/data/spu-logs-5001/longevity-0",

Depending on the amount of data and the number of segments, loading could take a few minutes. During that time, a consumer might get a misleading error message:

$ fluvio  consume  longevity --tail 10 
Consuming records starting 10 from the end of topic 'longevity'
Error: 
   0: Dataplane error: the given SPU is not the leader for the partition
   1: the given SPU is not the leader for the partition

Actions to take:

  • Run segment validation in a separate background loop instead of the SC dispatch loop.
  • While validation is running, mark the replica as "busy" for consumers and producers.
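
A minimal sketch of the two actions above, assuming a simplified model. It is not the actual SPU code: `Replica`, `ReplicaState`, `FetchError`, `validate_segments` and this version of `add_leader_replica` are hypothetical names for illustration, and it uses plain `std::thread` where the SPU would use its async runtime. The point is that the caller (standing in for the dispatch loop) returns immediately, and clients see an explicit "busy" state instead of "not the leader" while validation runs:

```rust
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

/// Hypothetical replica state; `Busy` means segments are still being validated.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ReplicaState {
    Busy,
    Ready,
}

/// Hypothetical error returned to consumers/producers while the replica is busy.
#[derive(Debug, PartialEq, Eq)]
enum FetchError {
    ReplicaBusy,
}

struct Replica {
    state: Mutex<ReplicaState>,
}

impl Replica {
    /// A replica starts out busy until its segments have been validated.
    fn new_loading() -> Arc<Self> {
        Arc::new(Replica {
            state: Mutex::new(ReplicaState::Busy),
        })
    }

    /// Stand-in for a consumer fetch: rejected with `ReplicaBusy` during validation.
    fn fetch(&self) -> Result<&'static str, FetchError> {
        match *self.state.lock().unwrap() {
            ReplicaState::Busy => Err(FetchError::ReplicaBusy),
            ReplicaState::Ready => Ok("records"),
        }
    }
}

/// Stand-in for the expensive per-segment validation (here just a short sleep).
fn validate_segments(segment_count: usize) -> usize {
    for _ in 0..segment_count {
        thread::sleep(Duration::from_millis(1));
    }
    segment_count
}

/// Registers the replica right away and validates in a background thread,
/// so the caller is not blocked for the duration of the validation.
fn add_leader_replica(segment_count: usize) -> (Arc<Replica>, thread::JoinHandle<usize>) {
    let replica = Replica::new_loading();
    let background = Arc::clone(&replica);
    let handle = thread::spawn(move || {
        let validated = validate_segments(segment_count);
        // Flip to Ready only after every segment has been checked.
        *background.state.lock().unwrap() = ReplicaState::Ready;
        validated
    });
    (replica, handle)
}

fn main() {
    let (replica, handle) = add_leader_replica(200);
    // The call above returned immediately; a fetch at this point is
    // expected (but not guaranteed) to still see the Busy state.
    println!("during validation: {:?}", replica.fetch());
    let validated = handle.join().unwrap();
    println!("validated {} segments, now: {:?}", validated, replica.fetch());
}
```

With this shape, a consumer that connects during startup would get a "busy" error it can retry on, rather than the "not the leader" error shown above.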

sehz • Jan 25 '22 23:01