connectedhomeip
connectedhomeip copied to clipboard
current MRP IDLE retry interval default value (5000ms) does not meet the latest SPEC definition (300ms)
Problem
Accessory devices are sharing MRP configuration during mDNS query and PASE/CASE creation. Idle retry interval = 5000ms
Thread network heavily relies on Application retries for congestion environments, this creates a bad customer experience because any application retry would take >5s.
The default setting needs to be 300ms so all Thread REED, Leader, Routers, FTD CHILDs will use it. Any change for sleepy devices should be extra modification product-specific.
Proposed Solution
Change the default to match spec default = 300ms.
src/messaging/ReliableMessageProtocolConfig.h
#ifndef CHIP_CONFIG_MRP_LOCAL_IDLE_RETRY_INTERVAL #define CHIP_CONFIG_MRP_LOCAL_IDLE_RETRY_INTERVAL (300_ms32) #endif // CHIP_CONFIG_MRP_LOCAL_IDLE_RETRY_INTERVAL
@turon
CHIP_CONFIG_MRP_LOCAL_IDLE_RETRY_INTERVAL is actually setting the value a node will advertise for itself. This, by design, can be a value appropriate for the node. Interestingly, this value varies widely by platform currently:
$ git grep -e define.*MRP.*INTERVAL
examples/chef/efr32/include/CHIPProjectConfig.h:#define CHIP_CONFIG_MRP_LOCAL_ACTIVE_RETRY_INTERVAL (2000_ms32)
examples/light-switch-app/efr32/include/CHIPProjectConfig.h:#define CHIP_CONFIG_MRP_LOCAL_ACTIVE_RETRY_INTERVAL (2000_ms32)
examples/lighting-app/efr32/include/CHIPProjectConfig.h:#define CHIP_CONFIG_MRP_LOCAL_ACTIVE_RETRY_INTERVAL (2000_ms32)
examples/lock-app/efr32/include/CHIPProjectConfig.h:#define CHIP_CONFIG_MRP_LOCAL_ACTIVE_RETRY_INTERVAL (2000_ms32)
examples/ota-provider-app/linux/include/CHIPProjectAppConfig.h:#define CHIP_CONFIG_MRP_LOCAL_ACTIVE_RETRY_INTERVAL (2000_ms32)
examples/thermostat/efr32/include/CHIPProjectConfig.h:#define CHIP_CONFIG_MRP_LOCAL_ACTIVE_RETRY_INTERVAL (2000_ms32)
examples/window-app/efr32/include/CHIPProjectConfig.h:#define CHIP_CONFIG_MRP_LOCAL_ACTIVE_RETRY_INTERVAL (2000_ms32)
src/messaging/ReliableMessageProtocolConfig.h:#define CHIP_CONFIG_MRP_LOCAL_ACTIVE_RETRY_INTERVAL (300_ms32)
src/messaging/ReliableMessageProtocolConfig.h:#define CHIP_CONFIG_MRP_LOCAL_IDLE_RETRY_INTERVAL (5000_ms32)
src/platform/Darwin/CHIPPlatformConfig.h:#define CHIP_CONFIG_MRP_DEFAULT_INITIAL_RETRY_INTERVAL (15000)
src/platform/Darwin/CHIPPlatformConfig.h:#define CHIP_CONFIG_MRP_LOCAL_ACTIVE_RETRY_INTERVAL (2000_ms32)
The default value a sender will use to communicate with a node when the parameters are omitted from dns-sd / case / pase discovery are set as follows:
ReliableMessageProtocolConfig GetDefaultMRPConfig()
{
// Default MRP intervals are defined in spec <2.11.3. Parameters and Constants>
static constexpr const System::Clock::Milliseconds32 idleRetransTimeout = 4000_ms32;
static constexpr const System::Clock::Milliseconds32 activeRetransTimeout = 300_ms32;
return ReliableMessageProtocolConfig(idleRetransTimeout, activeRetransTimeout);
}
Because a sender may choose a default timeout that exceeds the specification, we plan to NOT FIX this for v1.0. The current timeout is accounting for delays to the initial message beyond MRP:
- Platforms that send the standalone ack later than the mandated 200ms (due to imperfect concurrency implementations or slow SoC)
- Cases where DNS-SD address discovery is required before the initial message can be sent.
@turon , I agree, the CHIP_CONFIG_MRP_LOCAL_IDLE_RETRY_INTERVAL is the value that the device advertises for itself int he mDNS service as SII. This is the value that other devices will use prior to trigger a MRP retry. So consider the the following scenario:
- matter controller is paired to a light-bulb. Both have a valid CASE session. light-bulb is advertising SII=5000
- light-bulb gets power-cycled by the user (an easy mistake), the CASE session is erased in the light-bulb.
- matter controller which still has a valid CASE session, attempts to toggle the light-bulb. The 3 retries will fail ~15 to 20 seconds. Now the controller knows the CASE session is broken, and attempt to fix it.
- over all latency will be >20s. which is unacceptable for user experience.
We need MRP SII = 300 as default for matter 1.0 targeting all mains-powered (non-sleepy) devices.
CC: @chrisdecenzo
what would be the problem to use different default values for PASE, CASE and mDNS-SD ?
- if PASE happens off-network (over BLE), the accessory device is in commissioning mode, which is a non-sleepy mode.
- mDNS-SD is not an issue because Thread devices advertise using the BR as proxy.
- why is CASE session using different setting than SII ? at the time CASE session is required, the device is in-network and we discovered the IP using mDNS-sd, which includes SII.
non-sleepy devices (e.g. light-bulb, smart-plugs,..) should advertise SII = 300 as described in the spec.
#if CHIP_DEVICE_CONFIG_ENABLE_SED
#define CHIP_CONFIG_MRP_LOCAL_IDLE_RETRY_INTERVAL <polling period>
#else
#define CHIP_CONFIG_MRP_LOCAL_IDLE_RETRY_INTERVAL 300
#endif
@turon , do you support this change?
@AlanLCollins the CHIP_CONFIG_MRP_LOCAL_IDLE_RETRY_INTERVAL
is a product-specific decision. I agree that SEDs should use their product-specific polling interval, and light bulbs should use a low latency value such as 300. As such, the default in the SDK shouldn't matter much because each product will tune that value to one that makes sense for them.
@turon , sdk default matters because developers that are not using SED are not paying attention to changing the value. The SDK should either use low latency default value or force developers to change it.
This was fixed in https://github.com/project-chip/connectedhomeip/pull/23344