[Master] FC lock-up after FS RX recovery

Open Jetrell opened this issue 4 months ago • 15 comments

@breadoven I was out today running some tests with a combination of the later merges, when I also decided to run a FS test. Normally I only do this with the FS mode, so the RF link stays active. But today I thought I'd test shutting off the radio transmitter and initiating a real FS test. This is something I hadn't done for over a year, from memory. But I had tested it this way many times in years past.

The video shows the outcome. Most of the OSD locked up at the moment the RF signal from the radio TX returned. I cross referenced this with the CRSF telemetry, and the link to the RX did return at that moment. It was also at that moment that the FC log recording terminated. It showed nothing unexpected before that point. However the interesting part is the link quality element on the OSD is still actively blinking! It almost seems like the program was stuck in a loop that locked out anything not related directly to FS.

This test was done with nav_rth_trackback_mode = FS and safehome_usage_mode = RTH... I only enabled Trackback recently to test it again, because I had not tested it since it was given its own .c directory a year back. If it helps to know, the safehome was about 95m farther on, at the end of the main runway. And the point where FS was activated was within range of nav_rth_trackback_distance = 450.

https://github.com/user-attachments/assets/c0753bfa-1554-42a2-9d69-f3754275f9cd

If you get some time. Could you run the setting and FS trigger for this model with the SITL ?

Setting Diff.txt

  • INAV version string: 9.0 master (21-8-2025)
  • MATEK F765SE

Jetrell avatar Aug 25 '25 10:08 Jetrell

Hmmm that's unfortunate. Should be able to test it with HITL to see if it's repeatable in some way. Will let you know.

breadoven avatar Aug 25 '25 22:08 breadoven

Tested using HITL and it isn't reproduced. This is with FS set to RTH, Trackback set to FS and Safehome set to RTH. It would appear it was still back tracking when the Rx signal came back causing the lock up so probably not related to Safehomes.

Hard to say what went wrong other than the FC locking up or failing in some other way.

breadoven avatar Aug 26 '25 20:08 breadoven

Thanks for giving it a test. It was an unusual set of conditions, based on specific distances. It might not have been a good idea in hindsight. I'll lay it out for you.

I'd just launched in auto launch, and manually disengaged it at 35m altitude. I then let the plane continue to climb until 80m altitude, then switched off the TX to start the FS RTH Trackback. This was 242m from the home point. So it started to trackback to the home point, as it should... With nav_rth_trackback_distance = 450. Comparing the OSD and RC logs, the link was re-established somewhere between 57m and 5m from the home location. So it was effectively right above the arming location.

It would appear it was still back tracking when the Rx signal came back causing the lock up so probably not related to Safehomes.

I'd agree.. It would seem that it didn't like leaving Trackback when within nav_rth_trackback_distance, after it had reached the home location. Because it essentially had nowhere to track to, after only leaving there at launch/arming 242m earlier.

Could this be a case in the logic that hasn't been accounted for ?
As rare as it would be, it could still happen if the RX plug got disconnected under the force of launch.

I thought of another case that the logic may not have accounted for. Being that Trackback will not follow altitude changes recorded below the altitude it is enabled... Which is how it worked on the return trip. This may also have led to open-ended logic when it got back to the location where the first track marker was recorded (practically at the launch altitude, which was the beginning of the flight). A condition I assume you originally wouldn't have taken into account.

If I had not re-established the RX link, I logically assume it should have entered RTH and flown back to the safehome ?

Jetrell avatar Aug 26 '25 22:08 Jetrell

It's possible there's some bug related to trackback being cancelled during FS but it seems unlikely. Would need a trawl through the code to see if it's possible. I assume you hadn't cancelled FS by moving the sticks when the FC locked up ?

breadoven avatar Aug 27 '25 11:08 breadoven

I assume you hadn't cancelled FS by moving the sticks when the FC locked up ?

I always have failsafe_stick_threshold = 0 as seen in my settings, which I assume you loaded in your sim test.. But once the plane started heading towards the ground, the sticks were certainly deflected by me. So at the time the link was restored -

  • FS was not cancelled.. Yet RTH did not continue to fly the plane to the Safehome 95m away.
  • RC control was not returned.
  • The motor shut down.
  • The OSD locked up (Except for the link quality indicator, which was still flashing)
  • RX telemetry from the FC was lost

This image from the CRSF radio log shows the loss of signal and telemetry when the TX was turned off. Then it recovered, as can be seen by both Uplink and especially Downlink. But no RX telemetry was received from the FC at the time of link recovery.

Image

If I had to sum up what I see from this. Especially the fact the OSD link quality indicator is the only element that keeps flashing. And the FC didn't reboot. It would seem the code execution hit an impasse in its handling of the RX, that it couldn't return from. Yet it can still access the timer force_sw_blink.

Some other things that stood out are the proximity to the launch/arming location and the fact that the FS was terminated.
And concerning the OSD. I have osd_failsafe_switch_layout = ON for this plane.. I noticed that the OSD did not switch back from the default OSD page to Alt 1 page, when FS recovered.

This is an awkward one. Because it can't be tested in flight without possibly the same consequences... Most conditions can be tested with some risk and I can always work around it... However if the FC locks up or gets stuck and takes away RC control, the model is toast. I've had many crashes in testing before. But this one is far more of a concern.

Jetrell avatar Aug 28 '25 03:08 Jetrell

I didn't use your settings other than the ones related to RTH and basic FS settings. I can try again with more settings that are relevant or in fact all those that aren't incompatible with the HITL test plane. I also realised I didn't have Geo Fencing compiled in the firmware so that definitely needs including for a retest.

As I understand it so long as the OSD chip is powered it will just keep overlaying the last held state, including flashing characters etc, until it's updated by the firmware so the fact the flashing continued doesn't mean much since it's done by the OSD chip (display_force_sw_blink was off in this case). I guess it does show the FC remained powered though.
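A toy model of the point above, for anyone following along: if blinking is an attribute handled by the OSD chip itself, a blinking element keeps blinking after the firmware stops writing to it. This is only a sketch in Python. The class and method names are hypothetical and it does not model any particular OSD chip or INAV's driver.

```python
class OsdChipModel:
    """Toy stand-in for an OSD chip with its own character memory (hypothetical)."""

    def __init__(self):
        self.cells = {}   # position -> (text, blink flag), written by the firmware
        self.frame = 0    # video frame counter, advanced by the chip itself

    def write(self, pos, text, blink=False):
        """Firmware-side update of one screen position."""
        self.cells[pos] = (text, blink)

    def render_frame(self, blink_period=30):
        """Chip-side rendering: needs no help from the firmware."""
        self.frame += 1
        blink_visible = (self.frame // blink_period) % 2 == 0
        return {pos: text for pos, (text, blink) in self.cells.items()
                if not blink or blink_visible}


chip = OsdChipModel()
chip.write((0, 0), "LQ", blink=True)   # blinking link quality element
chip.write((0, 1), "ALT 80")           # steady element

# From here on the "firmware" is locked up: no more write() calls.
frames = [chip.render_frame() for _ in range(60)]
lq_visible = [(0, 0) in frame for frame in frames]
```

In this model the steady element stays frozen at its last value while the blinking one keeps toggling, which matches what the DVR showed.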

Does seem like the FC simply locked up, probably stuck in a doom loop, or an overflow condition although that usually doesn't kill things dead, it just causes odd behaviour. String handling for the OSD has often caused lockups when testing before so maybe it's related to that, changing messages when the FS was cancelled ? Or it could just be something related to handling the Rx signal I guess.

osd_failsafe_switch_layout = ON didn't appear to do anything, the OSD screen remained the same before and during the Failsafe. Should it have changed ?

Obviously needs resolving if possible, lockups are a complete no no.

breadoven avatar Aug 28 '25 08:08 breadoven

@Jetrell Tested with Geozones and your settings (where possible) and no lockups. Not going to be easy to work out what happened unless it's possible to find something in the code that might explain it.

breadoven avatar Aug 28 '25 14:08 breadoven

Tested with Geozones and your settings (where possible) and no lockups. Not going to be easy to work out what happened unless it's possible to find something in the code that might explain it.

Thanks for trying. If it is RX code related, were you able to HITL test it with CRSF, either TBS or ELRS ?

When you were sim testing under the conditions my issue occurred.. I'd still be interested to know how the code should react, for peace of mind. Because it wasn't clear to me. i.e. Tracking back to the home location straight after launch (obviously auto launch finished). Either Trackback FS or RTH. It will trackback to the home location as it did in my case, and run out of track markers before nav_rth_trackback_distance is exceeded.

  • Does trackback continue to a safehome if available. Adjusting altitude according to nav_rth_home_altitude ?
  • Or without a safehome. How does RTH react in this case, when the home location has already been made ? Being that it wasn't designed to work this way.

It was something I or Marc never thought to test with the changes you made in 8.0. And I'm sure I never tested this originally back in INAV 5 when you implemented it.. I remember tracking back past the home location in some of my tests. But all those conditions occurred when nav_rth_trackback_distance fell well within a considerable traveled distance after takeoff.

Jetrell avatar Aug 28 '25 22:08 Jetrell

It was using ELRS for the test.

The first trackback point is recorded 50m from home so that's where trackback will end if you don't exceed the nav_rth_trackback_distance first. At that point it simply switches back to normal RTH starting from the beginning. In your case it should have climbed/headed home given you have nav_min_rth_distance set to 0.
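The exit rule described above can be sketched roughly like this. It is a minimal Python model, not INAV's code: the class, field and mode names are hypothetical, and the four 48m legs are made-up numbers chosen only so the track adds up to roughly the 192m between the FS point (242m from home) and the first trackpoint (50m from home).

```python
from dataclasses import dataclass


@dataclass
class TrackbackModel:
    """Toy model of the trackback exit rule described in this thread (hypothetical)."""
    trackpoints_remaining: int      # recorded trackpoints still ahead of the aircraft
    max_distance_m: float           # stands in for nav_rth_trackback_distance
    distance_flown_m: float = 0.0   # distance covered since trackback started

    def step(self, leg_length_m: float) -> str:
        """Fly one leg to the next trackpoint and report the resulting mode."""
        if self.trackpoints_remaining == 0:
            return "RTH"            # nothing left to follow
        self.distance_flown_m += leg_length_m
        self.trackpoints_remaining -= 1
        if self.distance_flown_m >= self.max_distance_m:
            return "RTH"            # trackback distance limit reached
        if self.trackpoints_remaining == 0:
            return "RTH"            # last trackpoint reached before the limit
        return "TRACKBACK"


# Made-up legs approximating the flight in this issue.
model = TrackbackModel(trackpoints_remaining=4, max_distance_m=450.0)
modes = [model.step(48.0) for _ in range(4)]
```

Under these assumptions the model leaves trackback on the last leg with 192m flown, well short of the 450m limit, so the hand-over to normal RTH is triggered by running out of trackpoints rather than by distance.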

Surely the log will tell you what was happening at the point the log stopped, whether it was still back tracking or doing the normal RTH ?

breadoven avatar Aug 29 '25 08:08 breadoven

Surely the log will tell you what was happening at the point the log stopped, whether it was still back tracking or doing the normal RTH ?

Interestingly, the log stopped recording exactly 50m from the arming location. The safehome is 95m from that point NNE.

Image

It was in FS according to the log. MWP shows RTH... With the log showing the signal never returned, which wasn't the case according to the radio logs. If the FC had seen the signal return from a FS, manual control would have been given back to me. p.s. I can't sync both logs due to the radio's RTC not being the same as the FC's GNSS time.

Image
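One possible way around the clock mismatch is to line the two logs up on an event both of them recorded, such as the link loss when the TX was switched off. The Python sketch below only shows the arithmetic. The function names are hypothetical and every timestamp is made up, and since the FC declares failsafe slightly after the link actually drops, the result is only approximate.

```python
def clock_offset_s(fc_event_s: float, radio_event_s: float) -> float:
    """Offset to add to radio-log times to put them on the FC-log timebase."""
    return fc_event_s - radio_event_s


def radio_to_fc_time(radio_time_s: float, offset_s: float) -> float:
    """Convert a radio-log timestamp to the FC-log timebase."""
    return radio_time_s + offset_s


# Made-up timestamps, in seconds, for illustration only.
fc_link_loss_s = 95.0          # FS start as seen in the FC log
radio_link_loss_s = 36012.0    # link loss as seen in the radio log
radio_link_back_s = 36047.5    # link recovery as seen in the radio log
fc_log_end_s = 130.0           # last sample in the FC log

offset = clock_offset_s(fc_link_loss_s, radio_link_loss_s)
link_back_on_fc_clock = radio_to_fc_time(radio_link_back_s, offset)
gap_s = link_back_on_fc_clock - fc_log_end_s   # > 0: link came back after the log ended
```

With real timestamps, the sign and size of the gap would indicate whether the link came back before or after the FC log stopped.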

Jetrell avatar Aug 29 '25 08:08 Jetrell

Would you be more inclined to post logs @Jetrell if your location was hidden by some means such as the PR submitted before that offset the Lat, Lon values ? Just asking because it's much easier to check things looking at actual logs although it's easy to understand why people want to maintain privacy given the attitude of the authorities these days.

navState 38 is Trackback so it was still back tracking. There is home distance at the end of the logs and you can also check how far actual x/y position was from the target x/y position to understand if it had arrived at the final trackpoint when the log stopped. If it had then it does suggest there's something odd happening at that point, possibly only if FS RTH gets cancelled at that exact time.
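The position check suggested above could be done on a decoded log along these lines. This is a Python sketch under assumptions: the column names (`navState`, `navPos[0]`, `navPos[1]`, `navTgtPos[0]`, `navTgtPos[1]`) and centimetre units are assumed and should be checked against the header of the decoded CSV, the 10m arrival radius is an arbitrary placeholder, and the rows are made up.

```python
import math

TRACKBACK_STATE = 38  # navState value identified as Trackback in this thread


def distance_to_target_m(row: dict) -> float:
    """Horizontal distance between actual and target position, in metres."""
    dx = float(row["navTgtPos[0]"]) - float(row["navPos[0]"])
    dy = float(row["navTgtPos[1]"]) - float(row["navPos[1]"])
    return math.hypot(dx, dy) / 100.0   # assumed centimetres -> metres


def last_row_summary(rows, arrival_radius_m=10.0):
    """Summarise the final logged sample before the log stopped."""
    last = rows[-1]
    dist = distance_to_target_m(last)
    return {
        "in_trackback": int(last["navState"]) == TRACKBACK_STATE,
        "distance_to_target_m": dist,
        "at_target": dist <= arrival_radius_m,
    }


# Made-up rows for illustration; real rows would come from csv.DictReader.
rows = [
    {"navState": "38", "navPos[0]": "9000", "navPos[1]": "0",
     "navTgtPos[0]": "5000", "navTgtPos[1]": "0"},
    {"navState": "38", "navPos[0]": "5300", "navPos[1]": "400",
     "navTgtPos[0]": "5000", "navTgtPos[1]": "0"},
]
summary = last_row_summary(rows)
```

If the last sample is still in Trackback and sits within the arrival radius of its target, that points at the hand-over at the final trackpoint rather than at the Rx recovery.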

breadoven avatar Aug 29 '25 15:08 breadoven

Looking at the log @Jetrell it appears the FC locked up at the point it reached the last trackback waypoint. Guess it's just a case of trying to work out why that caused a problem which might be difficult given it doesn't happen HITL testing.

breadoven avatar Aug 30 '25 10:08 breadoven

Looking at the log, it appears the FC locked up at the point it reached the last trackback waypoint

It certainly looks that way. Maybe nothing to do with FS recovery at all ?.. I'm trying to remember back to the moment I switched the transmitter back on. After watching the DVR again and thinking back hard to the moment, I have a sense that I left it until the plane started to turn. Because I actually thought it was going into an RTH loiter.. It wasn't until it started to dive that I knew something was wrong, then I gave stick input and pulled the throttle back and disarmed.. Even though I could hear the motor had shut down and was only freewheeling, I didn't pick up that the OSD had locked up until I later looked at the DVR playback.

Guess it's just a case of trying to work out why that caused a problem which might be difficult given it doesn't happen HITL testing.

Yes. I guess it's a bit of a needle in a haystack type thing.

I have a request.. I remember you saying you compiled F411 builds for one of your models ? The reason I ask is because I have an old Dart 250G I used to use for more critical testing. But I never got around to changing out the MATEKF411_SFTSRL2 for an F405 of sorts.. This plane, being so light, could handle a crash with little to no damage. If I could get a 9.0 build of the aforementioned target, I might be able to run some less risky flight tests again. I'd even thought of adding a second parallel RX without model match, so that I could use an unused channel to reboot the FC's supply via a MOSFET switch. And give myself a few hundred meters altitude when returning with trackback, leaving enough reboot time if another lockup did occur.

Jetrell avatar Aug 30 '25 11:08 Jetrell

The HITL test plane is using an F411 board so no problem compiling for an F411 build.

Would rebooting the board in flight work ? I'd have thought you would have problems with the calibration checks if it's moving around although you can set init_gyro_cal = OFF to avoid this I guess.

breadoven avatar Aug 31 '25 11:08 breadoven

Thought I posted this before but perhaps I forgot to hit comment. MATEKF411_SFTSRL2 hex @Jetrell based on current master if you want to test again.

inav_9.0.0_MATEKF411_SFTSRL2.hex.zip

breadoven avatar Sep 02 '25 17:09 breadoven