sonic-swss-common icon indicating copy to clipboard operation
sonic-swss-common copied to clipboard

Refine publish notifications

Open qiluo-msft opened this issue 6 years ago • 2 comments

This is a competing PR to https://github.com/Azure/sonic-swss-common/pull/258

I reverted https://github.com/Azure/sonic-swss-common/pull/214 since it exposed unnecessary implementation details.

Problem statement is same as https://github.com/Azure/sonic-swss-common/pull/258, so I shamelessly copied it here.

The typical use case that this PR is trying to deal with is related to route flapping.

Assuming such a scenario: <1> 4K routes learned on an interface.

<2> interface down, 4k routes withdraw. <3> fpmsyncd inserts 4k route keys into producerStateTabk keyset and generates 4k redis publish notifications

<4> Interface goes up

<4.A> Interface may go up and learn the 4k routes back before orchagent has chance to pick up any notification and process the keyset. That is fine, change in https://github.com/Azure/sonic-swss-common/pull/257 handled this case, no new redis publish notification will be created since the route keys already exist in producerStateTabk keyset and are to be processed.

<4.B> Interface may go up and learn the 4k routes back after orchagent has processed a few notifications but not all the redis notifications (say, 1k notification processed), with the first notification, consumer in orchagent batch popped all the 4k route keys from producerStateTabk keyset, (orchagent has set popBatchSize as 8192), for the other 999 notifications processed, orchagent just call consumer_state_table_pops.lua 999 times without seeing any real data. 4k - 1k = 3k more redis notifications pending in redis queue now.

<5> Interface goes down <5.A> If <4.A> was hit during previous interface up, good, no new redis publish notification will be generated

<5.B> If <4.B> was hit during previous interface up. 4K more redis notifications put into the redis queue. <6> Interface goes up, loop back to <4>

The exact scenarios vary, but from <4> to <6>, number of pending notifications keeps increasing.

The issue is that orchagent may be abled to handle all the route changes with the batch processing (popBatchSize = 8192), redis publish notification was generated for each key and orchagent has to respond to each of the notifications with current select framework though most of them bring no value.

The implementation proposed here follows some design principles.

  1. Generate one and only one Redis publish notification (channel message) when there is one data in the key set. So the number of messages and keys are the same, and we can ensure to decrease the same amount during pop/pops()
  2. The only exception to above is when producer calls clear(). The channel messages could not be recalled.
  3. Some users will prefer pop(), and others may prefer pops() to pop a batch. But no one will use both for the same consumer table
  4. Consumer will never publish to channel, ie. create channel message

qiluo-msft avatar Mar 10 '19 08:03 qiluo-msft

@jipanyang @zhenggen-xu As we offline discussed I submit this PR. Please help review. Thanks for your solution and inspiration. #Closed

qiluo-msft avatar Mar 10 '19 08:03 qiluo-msft

retest this please

lguohan avatar May 24 '19 08:05 lguohan