
CLIENT_BASE: Act like ROUTER_LATE for fav'd nodes, instead of like ROUTER

Open korbinianbauer opened this issue 1 month ago • 5 comments

As requested by @NomDeTom on discord in #transport-layer

Hey, could you do a favour here and make the changes to firmware to make client_base act like router_late for favourite nodes, please? We're discussing it on the contributor-hangout, and it's agreed that this is the correct thing to do.

🤝 Attestations

  • [ ] I have tested that my proposed changes behave as described.
  • [X] I have tested that my proposed changes do not cause any obvious regressions on the following devices:
    • [ ] Heltec (Lora32) V3
    • [X] LilyGo T-Deck
    • [ ] LilyGo T-Beam
    • [X] RAK WisBlock 4631
    • [ ] Seeed Studio T-1000E tracker card
    • [ ] Other (please specify below)

korbinianbauer avatar Nov 06 '25 13:11 korbinianbauer

@compumike Would like to know your thoughts on this also.

GUVWAF avatar Nov 08 '25 11:11 GUVWAF

@GUVWAF I didn't hear / don't see any record of the #contributor-hangout conversation, so I don't have context of what problem this change is trying to solve.

But in general I'm okay with this change in behavior! 👍

Net effects of moving CLIENT_BASE to act in the ROUTER_LATE window:

  • 🔴 higher latency
  • 🔴 higher total airtime / channel congestion (because a message may be rebroadcasted first by a CLIENT and then later by the CLIENT_BASE)
  • 🟢 slightly higher reliability of message deliverability
  • 🟢 better resistance to misconfiguration (unintended favorites)

Which is a real tradeoff, but also is a totally reasonable set of tradeoffs to make!

compumike avatar Nov 15 '25 13:11 compumike

@korbinianbauer are you able to address the comments above?

NomDeTom avatar Dec 02 '25 23:12 NomDeTom

@compumike you're right about the tradeoffs. To add some context on a problem this issue is helping to solve:

In sparse meshes with a "rooftop base + intermediate relay + distant node" topology, the current CLIENT_BASE (acting as ROUTER for favorites) can actually reduce message propagation to favorited nodes:

  1. Indoor node sends message
  2. CLIENT_BASE roof node rebroadcasts early (ROUTER priority)
  3. Critical intermediate CLIENT node (not favorited, ~1km away) hears the CLIENT_BASE rebroadcast
  4. Intermediate node cancels its own rebroadcast (managed flooding: "someone already rebroadcast this")
  5. Message never reaches the wider mesh because the critical hop was suppressed
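The steps above can be sketched as a toy model of one flooding round. This is not firmware code: the `Relay` struct, the `whoRebroadcasts` function, and the delay values are hypothetical, and the only rule modelled is "cancel your own rebroadcast if you already heard someone else's".

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified relay description (not a firmware type).
struct Relay {
    std::string name;
    unsigned delayMs;               // scheduled rebroadcast time after reception
    bool cancelsOnDuplicate;        // true for CLIENT-like behaviour
    std::vector<std::string> hears; // relays whose rebroadcast this node can hear
};

// Returns the names of the relays that actually transmit, in time order.
std::vector<std::string> whoRebroadcasts(std::vector<Relay> relays) {
    // Earlier scheduled delays transmit first.
    std::stable_sort(relays.begin(), relays.end(),
                     [](const Relay &a, const Relay &b) { return a.delayMs < b.delayMs; });

    std::vector<std::string> sent;
    for (const Relay &r : relays) {
        bool suppressed = false;
        if (r.cancelsOnDuplicate) {
            // Managed flooding: drop our rebroadcast if we heard an earlier one.
            for (const std::string &s : sent) {
                if (std::find(r.hears.begin(), r.hears.end(), s) != r.hears.end()) {
                    suppressed = true;
                }
            }
        }
        if (!suppressed) {
            sent.push_back(r.name);
        }
    }
    return sent;
}
```

With the roof node at an early delay (say 100 ms) and the intermediate CLIENT at 500 ms, only the roof node transmits. Move the roof node to a later delay (say 900 ms) and both transmit, intermediate first.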

In my testing, ROUTER_LATE on the roof consistently solved this, while CLIENT_BASE performed worse than having no roof node at all in some cases.

However, ROUTER_LATE on the rooftop is advised against by official docs because it rebroadcasts every packet it hears, significantly increasing airtime usage and potentially degrading the mesh. But since it obviously works well for this specific topology, having it replace ROUTER behavior in CLIENT_BASE (for favorited nodes only) would be ideal - giving us the late rebroadcast timing without the airtime concerns of a full ROUTER_LATE deployment.

geirgp avatar Dec 03 '25 00:12 geirgp

@korbinianbauer are you able to address the comments above?

Fully agree with the comments by @compumike from my side.

I also think the trade-off is absolutely worth it. The latency and ChUtil downsides are something that can and should be addressed by A) using an appropriate SF and B) throttling traffic at the origin, not by waiving reliability.

korbinianbauer avatar Dec 03 '25 11:12 korbinianbauer

This PR looks good and I think it can be merged 👍


Reviewing the larger section of ROUTER_LATE behavior, I have a question for @GUVWAF :

Should ROUTER_LATE rebroadcast late always, or only after receiving dupe?

With regards to @geirgp 's "critical intermediate node" example:

My understanding is that ROUTER_LATE (and after this PR, CLIENT_BASE) will initially receive a packet and schedule a rebroadcast during the normal CLIENT window. They'll only move the rebroadcast to the later clampToLateRebroadcastWindow once they hear that packet a second time! (Because clampToLateRebroadcastWindow is only called from perhapsCancelDupe.)

This means there's still a random race between the ROUTER_LATE roof node and the CLIENT intermediate node. ~50% of the time, the roof node will still win, suppressing the intermediate node.
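A rough way to sanity-check the ~50% figure, assuming (as a simplification, not something taken from the firmware) that both nodes independently pick a uniformly random slot from the same window and that ties count for neither side. `roofWinsFraction` is a hypothetical helper that enumerates every slot pair:

```cpp
#include <cassert>

// Fraction of (roof, intermediate) slot pairs in which the roof node
// transmits strictly first, when both draw from the same `slots`-wide window.
double roofWinsFraction(unsigned slots) {
    assert(slots > 0);
    unsigned long roofFirst = 0;
    for (unsigned roof = 0; roof < slots; ++roof) {
        for (unsigned mid = 0; mid < slots; ++mid) {
            if (roof < mid) {
                ++roofFirst;
            }
        }
    }
    return static_cast<double>(roofFirst) /
           (static_cast<double>(slots) * static_cast<double>(slots));
}
```

Under those assumptions the fraction is (N-1)/(2N) for N slots, which approaches one half as the window grows.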

Instead, what if ROUTER_LATE (and CLIENT_BASE, when from/to favorite) always enqueued their rebroadcasts with clampToLateRebroadcastWindow, even when they've only heard a packet once?

Slightly higher latency, but this would make it so that in this example, the intermediate node would never be suppressed by the roof node.

I can imagine:

  • bool RadioInterface::shouldRebroadcastLateLikeRouterLate(meshtastic_MeshPacket *p)
  • modify RadioInterface::getTxDelayMsecWeighted to check it and add delay accordingly
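A minimal sketch of that idea, using stand-in types rather than the real `RadioInterface` and `meshtastic_MeshPacket`. The `Role` enum, `PacketInfo` struct, the free functions, and the `lateOffsetMs` parameter are all hypothetical names for illustration. How the firmware actually tracks favorites and computes the late window is not shown here.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in types: hypothetical simplifications, not firmware definitions.
enum class Role { Client, ClientBase, Router, RouterLate };

struct PacketInfo {
    bool fromFavorite; // sender is a favorited node
    bool toFavorite;   // destination is a favorited node
};

// Mirrors the intent of the proposed shouldRebroadcastLateLikeRouterLate().
bool shouldRebroadcastLate(Role role, const PacketInfo &p) {
    if (role == Role::RouterLate) {
        return true;
    }
    if (role == Role::ClientBase) {
        return p.fromFavorite || p.toFavorite;
    }
    return false;
}

// baseDelayMs: the delay the node would normally pick.
// lateOffsetMs: assumed offset that pushes the rebroadcast into the late window.
uint32_t txDelayMs(Role role, const PacketInfo &p, uint32_t baseDelayMs, uint32_t lateOffsetMs) {
    if (shouldRebroadcastLate(role, p)) {
        return baseDelayMs + lateOffsetMs; // always late, even on first reception
    }
    return baseDelayMs;
}
```

The point of the design is that the late delay is applied when the rebroadcast is first scheduled, not only after a duplicate is heard.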

(I don't think this question should delay merging this PR.)

compumike avatar Dec 19 '25 05:12 compumike

@compumike

Slightly higher latency, but this would make it so that in this example, the intermediate node would never be suppressed by the roof node.

Note that the "Late" Window is only guaranteed to be later than the Client window at similar SNR.

Even if Router_Late were to always rebroadcast in the late window, there is a good chance it will still suppress a Client.

See Point 4 in #8431
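To illustrate the caveat, here is a toy model in which the end of a node's Client window depends on the SNR at which it received the packet, and the Late window starts where the Client window for that same SNR ends. The linear mapping, the direction of the scaling (longer window for stronger reception), and every constant below are assumptions made up for the example, not the firmware's values.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Toy model: end of the Client window as a function of received SNR.
uint32_t clientWindowEndMs(float snrDb) {
    const float kMinSnr = -20.0f;
    const float kMaxSnr = 10.0f;
    const uint32_t kMinEndMs = 200;  // invented constant
    const uint32_t kMaxEndMs = 2000; // invented constant
    float clamped = std::min(std::max(snrDb, kMinSnr), kMaxSnr);
    float frac = (clamped - kMinSnr) / (kMaxSnr - kMinSnr);
    return kMinEndMs + static_cast<uint32_t>(frac * static_cast<float>(kMaxEndMs - kMinEndMs));
}

// Assumed: the Late window begins right after the Client window for the same SNR.
uint32_t lateWindowStartMs(float snrDb) {
    return clientWindowEndMs(snrDb);
}

// True if a Late node hearing the packet at lateSnrDb could still transmit
// before a Client hearing it at clientSnrDb has finished its own window.
bool lateCanPreemptClient(float lateSnrDb, float clientSnrDb) {
    return lateWindowStartMs(lateSnrDb) < clientWindowEndMs(clientSnrDb);
}
```

In this model, a Late node that received the packet at a much lower SNR than the Client can still transmit first. At equal SNR it cannot.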

korbinianbauer avatar Dec 19 '25 06:12 korbinianbauer

Ah, thanks @korbinianbauer -- I had not seen that in #8431!

compumike avatar Dec 19 '25 12:12 compumike