ot-br-posix The MLR update max delay should depend on the network scale

Is your feature request related to a problem? Please describe.

I ran into a problem while testing multicast traffic on Thread devices. If I boot the OTBR first, attach a Thread child, register a multicast group on the child, and then send multicast traffic to it from a PC, everything works fine. After I reboot the OTBR, however, multicast communication stops working. As far as I could determine, the reason is that the OTBR re-registers the multicast table according to the BBR dataset parameter called Reregistration Delay. By default it is set to 1200 s, so the OTBR draws a random re-registration time from the 0 to 1200 s range. I understand that this accounts for big networks and collision avoidance, but my network has only 1 OTBR and 1 Thread child, so waiting for, e.g., 15 minutes before re-registering does not seem reasonable. Once that time has passed, multicast communication starts working again.
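To put numbers on the wait described above, here is a minimal sketch. It assumes the re-registration time is a uniform draw over 0 to the configured maximum, which is how the behaviour is described in this report; it is not taken from the OpenThread source. The function names are hypothetical.

```python
import random

DEFAULT_REREG_DELAY_S = 1200  # default Reregistration Delay quoted in this issue


def draw_reregistration_wait(max_delay_s, rng):
    """Assumed model: uniform draw over [0, max_delay_s] seconds."""
    return rng.uniform(0, max_delay_s)


def mean_wait(max_delay_s, samples=100_000, seed=1):
    """Estimate the average wait by simulation."""
    rng = random.Random(seed)
    total = sum(draw_reregistration_wait(max_delay_s, rng) for _ in range(samples))
    return total / samples


def probability_wait_exceeds(max_delay_s, threshold_s):
    """Closed form for a uniform draw: share of the range above the threshold."""
    if threshold_s >= max_delay_s:
        return 0.0
    return 1.0 - max(threshold_s, 0) / max_delay_s


if __name__ == "__main__":
    print("mean wait [s]:", mean_wait(DEFAULT_REREG_DELAY_S))
    print("P(wait > 15 min):", probability_wait_exceeds(DEFAULT_REREG_DELAY_S, 15 * 60))
```

Under this assumed model the average wait is half of the configured maximum, so lowering the maximum shortens the expected outage after a reboot proportionally.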

Describe the solution you'd like

I think it would be convenient to calculate this max delay from the network size, which could be measured as, e.g., the number of Thread devices belonging to the OTBR network.
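One way this could look, purely as an illustration: scale the max delay linearly with the device count and clamp it to the current default. The per-device constant, the floor, and the function name are hypothetical and not part of OpenThread or ot-br-posix.

```python
MAX_REREG_DELAY_S = 1200  # current default upper bound quoted in this issue
MIN_REREG_DELAY_S = 1     # hypothetical floor so the range is never empty
SECONDS_PER_DEVICE = 5    # hypothetical tuning constant


def scaled_max_reregistration_delay(device_count):
    """Return a max delay that grows with network size, clamped to sane bounds."""
    if device_count < 0:
        raise ValueError("device_count must be non-negative")
    scaled = device_count * SECONDS_PER_DEVICE
    return max(MIN_REREG_DELAY_S, min(scaled, MAX_REREG_DELAY_S))


if __name__ == "__main__":
    for count in (1, 20, 500):
        print(count, "devices ->", scaled_max_reregistration_delay(count), "s")
```

With these assumed constants a network with a single child would use a range of a few seconds, while a large network would still fall back to the 1200 s default.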

Describe alternatives you've considered

For now I work around this in my setup by invoking bbr config delay 30, which lowers the max range and provides a better user experience when the OTBR reboots.

Jul 15 '22 08:07 kkasperczyk-no

Interesting, related to #1459.

Jul 15 '22 13:07 dtodor

Interesting, related to #1459.

@dtodor right, I also spotted this problem while testing Matter multicast traffic.

Jul 18 '22 05:07 kkasperczyk-no

Reregistration delay is used for DUA and MLR reregistration, whether periodic or event-triggered, and yes, an appropriate value should be chosen to avoid possible reregistration flooding, especially when triggered by a network-wide event (e.g. seqno change after BR reboot).

1200 s is a default constant value in OT; however, the Thread Specification relies on the vendor to choose an appropriate value:

// Thread 1.3.0 Specification 5.21.3.3
The Reregistration Delay value is vendor-specific and can be chosen by the BBR either as a fixed (configured) value or a dynamic value based on detected network size or circumstances

Here are some possible options I can see to improve the experience:

1) Update OT to allow configuring the re-registration delay at build time, so that BR vendors can choose an appropriate value with awareness of the possible side effects of their choice (e.g. possible traffic flooding in a large-scale network if it is too short).
2) Add a mechanism that updates the re-registration delay dynamically according to network scale.
3) Introduce an experimental optimization for 1.2/1.3 Routers: when re-registration is event-triggered, re-register MLR using a smaller random range (possibly adaptive to the current number of active routers) if the re-registration delay is large and device vendors can't tolerate the interruption to multicast communication. Unlike DUA, which is unique per device, for MLR it can generally be assumed that a batch of devices will subscribe to the same multicast address, and a 1.2+ Router aggregates the multicast addresses registered by its children and incorporates multiple addresses in one MLR.req, so MLR.req traffic should be much lower than DUA.req traffic when responding to a network-wide event.
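The aggregation argument in the last option can be illustrated with a small sketch. It only models the idea that a router collects its children's multicast addresses into one de-duplicated list, plus one hypothetical way to shrink the random range by active router count. All names and the seconds_per_router constant are assumptions, not OpenThread APIs.

```python
from ipaddress import IPv6Address


def aggregate_mlr_addresses(child_subscriptions):
    """Collapse per-child multicast subscriptions into one unique, sorted list.

    child_subscriptions maps a child identifier to an iterable of IPv6
    multicast address strings.
    """
    unique = set()
    for addresses in child_subscriptions.values():
        for text in addresses:
            address = IPv6Address(text)
            if not address.is_multicast:
                raise ValueError(f"{text} is not a multicast address")
            unique.add(address)
    return sorted(unique)


def event_triggered_max_delay(reregistration_delay_s, active_router_count,
                              seconds_per_router=2):
    """Hypothetical smaller range: grows with router count, capped by the BBR value."""
    scaled = active_router_count * seconds_per_router
    return max(1, min(reregistration_delay_s, scaled))


if __name__ == "__main__":
    children = {
        "child-1": ["ff05::abcd"],
        "child-2": ["ff05::abcd"],
        "child-3": ["ff05::abcd", "ff05::1234"],
    }
    # Four subscriptions from three children collapse into two addresses.
    print([str(a) for a in aggregate_mlr_addresses(children)])
    print(event_triggered_max_delay(1200, active_router_count=4))
```

Because the router sends the aggregated list rather than one request per child, the number of requests after a network-wide event scales with the number of routers instead of the number of end devices.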

Personally I think 1) and 3) might be doable now, and we may consider 2) in the future.

Thoughts? @jwhui @simonlingoogle

Jul 21 '22 07:07 librasungirl

@librasungirl @jwhui I think option 1 should be the simplest way to go. Configuring it to something like 10 seconds should be more suitable for most home networks.

For option 2, I don't think the reregistration delay should be proportional to the number of nodes in the network. It really depends on the number of nodes within channel collision range, which is information not available in OpenThread.

Aug 09 '22 08:08 simonlingoogle

Resolved by https://github.com/openthread/openthread/pull/7996

Aug 25 '22 23:08 jwhui
