[Bug]: Traceroutes via ROUTER_LATE node don't end up in the TX queue after modification (to add self hop in rebroadcast) on return path!

Open Talie5in opened this issue 11 months ago • 8 comments

Category

Other

Hardware

Linux Native

Firmware Version

2.5.20

Description

When doing a traceroute via a ROUTER_LATE node, the traceroute is seen leaving and coming back, but the node does not modify the response (to add its own hop) and put it into the TX queue for rebroadcast. The traceroute therefore never gets returned to the source node, so this is a return-path-only issue.

Captured in the DEBUG log on a node set up as ROUTER_LATE:

TR_Repose_NoRequeue.txt

This is a relatively isolated test environment. If I take the source node for a walk to where it does in fact have direct line of sight to another CLIENT node, or to the destination node directly, the traceroute works from a RAK4631 in CLIENT_MUTE and it receives the response.

Relevant log output


Talie5in avatar Jan 28 '25 12:01 Talie5in

Thanks @Talie5in - I will try to tackle this tomorrow or Thursday. I need to dig into the traceroute code anyway for #5534, so this is good additional motivation for me to do so!

erayd avatar Jan 28 '25 12:01 erayd

Have managed to replicate the lack of traceroute response via ROUTER_LATE. Now I just need to figure out why it's happening...

erayd avatar Jan 28 '25 15:01 erayd

@Talie5in Which exact commit were you using when testing this? I can't find the string "Incoming msg will be filtered, from" from your log in the source code. So I'm not sure where that is coming from, but a bit later it mentions cancelSending id=0x827fe923, removed=1 meaning it removed it from the Tx queue.

GUVWAF avatar Jan 30 '25 16:01 GUVWAF

It seems to be coming from your modified firmware: https://github.com/Talie5in/mt-device-firmware/blob/02b2ee8883663618a0c6319fc37fe137a6d2ac25/src/mesh/Router.cpp#L574

I believe this is your issue. You're canceling a packet in the Tx queue when another arrives. For ROUTER_LATE this is more likely to happen as it delays the rebroadcast.
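
To illustrate the mechanism, here is a minimal toy model. It is a sketch only, not the Meshtastic code: `ToyTxQueue`, `QueuedPacket`, `simulate`, and the timing values are all hypothetical names and numbers made up for this example. It assumes a node that queues a rebroadcast with some delay and, in the "modified" case, cancels the queued copy whenever a duplicate of the same packet is heard.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model only: these types and names are hypothetical, not Meshtastic's API.
struct QueuedPacket {
    uint32_t from;     // original sender
    uint32_t id;       // packet id
    uint32_t sendAtMs; // earliest time the rebroadcast may go out
};

class ToyTxQueue {
  public:
    void enqueue(const QueuedPacket &p) { packets.push_back(p); }

    // Remove any queued packet matching (from, id); returns how many were removed.
    size_t cancel(uint32_t from, uint32_t id) {
        const size_t before = packets.size();
        packets.erase(std::remove_if(packets.begin(), packets.end(),
                                     [&](const QueuedPacket &p) { return p.from == from && p.id == id; }),
                      packets.end());
        return before - packets.size();
    }

    // "Transmit" every packet whose delay has expired; returns how many were sent.
    size_t sendDue(uint32_t nowMs) {
        const size_t before = packets.size();
        packets.erase(std::remove_if(packets.begin(), packets.end(),
                                     [&](const QueuedPacket &p) { return p.sendAtMs <= nowMs; }),
                      packets.end());
        return before - packets.size();
    }

  private:
    std::vector<QueuedPacket> packets;
};

// Returns how many rebroadcasts actually leave the node.
// A duplicate of the queued packet is assumed to be heard 100 ms after queuing.
size_t simulate(bool cancelOnDuplicate, uint32_t rebroadcastDelayMs) {
    ToyTxQueue queue;
    const uint32_t from = 0x1234, id = 0x827fe923;
    const uint32_t duplicateHeardAtMs = 100;

    // t = 0: traceroute response received and queued for delayed rebroadcast.
    queue.enqueue({from, id, rebroadcastDelayMs});

    // t = 100: anything already due goes out before the duplicate is processed.
    size_t sent = queue.sendDue(duplicateHeardAtMs);

    // t = 100: a duplicate arrives; the modified behaviour cancels the queued copy.
    if (cancelOnDuplicate)
        queue.cancel(from, id);

    // Later: whatever is still queued goes out once its delay has expired.
    sent += queue.sendDue(duplicateHeardAtMs + rebroadcastDelayMs);
    return sent;
}
```

In this model, `simulate(true, 50)` still sends the packet because the short delay expires before the duplicate is heard, while `simulate(true, 2000)` sends nothing: the longer the rebroadcast delay, the wider the window in which a duplicate can cancel it.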

GUVWAF avatar Jan 30 '25 16:01 GUVWAF

I wonder what it was that I was reproducing then? Because I can get the behaviour to recur here.

erayd avatar Jan 30 '25 19:01 erayd

@GUVWAF Yup, that appears to be the culprit in those logs. I just got around to retesting this (back on 2.5.20.4c97351) and I do eventually get the TR back (which is valid and in line with ROUTER_LATE). I was switching between that build and the original Meshtastic release while testing things, and didn't realise I hadn't done it on the right build at the time.

Apologies for the delayed response.

However, I am still getting some that just never make it back (I do see them hit the device in the debug logs, they just never make it to the source device). I'm curious whether that's just hitting some kind of "took too long for a result, so I stopped tracking the traceroute" condition.

@erayd Not sure if you've come across anything further?

If I can find more time in the coming week, I'll tail logs on both the ROUTER_LATE node on the roof and the node on my desk, and see if I can line them up for a submission.

Talie5in avatar Feb 03 '25 12:02 Talie5in

> Not sure if you've come across anything further?

Not yet, but I haven't yet had the opportunity to watch the logs of a ROUTER_LATE in a location where there's no other path back to my test node. Downside of having a mesh with quite good coverage.

It's easy enough to engineer the no-response thing by just going to one of the infill areas. But I can't watch the logs at the same time. Need to find a time to enlist help I think. Get someone else to run the traces while I sit up at the RL site and watch the logs.

erayd avatar Feb 03 '25 12:02 erayd

Just thinking out loud: I wonder if you could reproduce it at home with a three node test setup. The three nodes on their own frequency slot, tx power turned down, with nodes A and C placed far enough apart to ensure that they hop through B.

todd-herbert avatar Feb 06 '25 06:02 todd-herbert

@Talie5in have you been able to reproduce this with stock firmware? I'm seeing some issues on 2.5.20 with a client sending a traceroute through a ROUTER_LATE where I don't get the traceroute responses on my client, but the router node updates its nodedb with the traceroute target almost immediately - so I assume it is seeing the response, just not rebroadcasting it to my client.

Unfortunately my router node is in a place that I can't tail logs from, so it's difficult to diagnose this further.

noahhaon avatar Mar 25 '25 09:03 noahhaon