
Backports to release/1.15

Open davids5 opened this issue 1 year ago • 20 comments

Hi @davids5, @PetervdPerk-NXP

i'm getting hardfaults on a new "Generic Octo-X" airframe test setup: fault_1970_01_01_00_17_00.log fault_2024_05_13_13_46_05.log px4_fmu-v6xrt_default_elf.zip

i'm in the configuration process. Hardfault occurred after Actuator setup. Current state of parameter settings (downloaded after crashdump file removed): ID20_Octo-X_20240513.zip Radio and Flightmodes are not yet set up.

nsh: sysinit: fopen failed: No such file or directory

NuttShell (NSH) NuttX-11.0.0
nsh> ver all
HW arch: PX4_FMU_V6XRT
HW type: V6XRT
HW FMUM ID: 0x000
HW BASE ID: 0x003
PX4 git-hash: 0928731839608768fcfa1c5e967da7c0feef13e2
PX4 version: 1.15.0 80 (17760384)
PX4 git-branch: pr-bp-release/1.15
OS: NuttX
OS version: Release 11.0.0 (184549631)
OS git-hash: 4be592dd2114d2c07505b143493a3cfa6dc9c239
Build datetime: May 10 2024 11:34:47
Build uri: localhost
Build variant: default
Toolchain: GNU GCC, 9.3.1 20200408 (release)
PX4GUID: 000900000000000000008292807c2c2a800e
MCU: i.MX RT1170 rB0, rev. 0
nsh>
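
The numbers in parentheses after the two version strings look like the same version packed into one 32-bit word, with one byte each for major, minor, patch and a release-type tag. A minimal Python sketch, assuming that byte layout (it is inferred from the values in the `ver all` output above, not taken from the PX4 source; `decode_version` is a hypothetical helper):

```python
def decode_version(packed: int) -> tuple[int, int, int, int]:
    """Split a 32-bit value into (major, minor, patch, type) bytes.

    Assumed layout, inferred from the `ver all` output:
    bits 31-24 major, 23-16 minor, 15-8 patch, 7-0 release type.
    """
    return (
        (packed >> 24) & 0xFF,
        (packed >> 16) & 0xFF,
        (packed >> 8) & 0xFF,
        packed & 0xFF,
    )


px4 = decode_version(17760384)     # "PX4 version: 1.15.0 80 (17760384)"
nuttx = decode_version(184549631)  # "OS version: Release 11.0.0 (184549631)"
print(px4, nuttx)
```

Under this assumption 17760384 decodes to 1.15.0 with a type byte of 0x80, which matches the `80` printed next to the PX4 version, and 184549631 decodes to 11.0.0 with a type byte of 0xFF.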

FMUM-RT7 on new NXP baseboard without IO processor.

dk7xe avatar May 13 '24 14:05 dk7xe

peter@NXL04520:~/src/Firmware$ arm-none-eabi-addr2line -e build/px4_fmu-v6xrt_default/px4_fmu-v6xrt_default.elf 0x00002411
/home/peter/src/Firmware/src/lib/matrix/matrix/Matrix.hpp:110
peter@NXL04520:~/src/Firmware$ arm-none-eabi-addr2line -e build/px4_fmu-v6xrt_default/px4_fmu-v6xrt_default.elf 0x0000244d
/home/peter/src/Firmware/src/modules/control_allocator/ControlAllocator.cpp:439
peter@NXL04520:~/src/Firmware$ # 1970 crash
peter@NXL04520:~/src/Firmware$ arm-none-eabi-addr2line -e build/px4_fmu-v6xrt_default/px4_fmu-v6xrt_default.elf 0x00002421
/home/peter/src/Firmware/src/modules/control_allocator/ControlAllocator.cpp:438
peter@NXL04520:~/src/Firmware$ arm-none-eabi-addr2line -e build/px4_fmu-v6xrt_default/px4_fmu-v6xrt_default.elf 0x0000245d
/home/peter/src/Firmware/src/modules/control_allocator/ControlAllocator.cpp:442
peter@NXL04520:~/src/Firmware$ # 2024 crash

Looks to be ControlAllocator writing into wrong memory. @dk7xe you sure you've got a clean parameter config?
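
The four lookups above can be done in a single `addr2line` call, since the tool accepts several addresses at once. A small Python sketch that builds such a command; the helper names are hypothetical, and `-f` (function names) and `-C` (demangle) are added here for readability, they were not used in the original session:

```python
import subprocess


def addr2line_cmd(elf: str, addresses: list[int],
                  tool: str = "arm-none-eabi-addr2line") -> list[str]:
    """Build one addr2line invocation that resolves every address at once."""
    # -e selects the ELF file, -f prints function names, -C demangles C++ names
    return [tool, "-f", "-C", "-e", elf] + [f"0x{a:08x}" for a in addresses]


def resolve(elf: str, addresses: list[int]) -> list[str]:
    """Run the tool and return its output lines (needs the ARM toolchain installed)."""
    result = subprocess.run(addr2line_cmd(elf, addresses),
                            capture_output=True, text=True, check=True)
    return result.stdout.splitlines()


# Addresses from the 1970 crash in the session above
cmd = addr2line_cmd("build/px4_fmu-v6xrt_default/px4_fmu-v6xrt_default.elf",
                    [0x00002411, 0x0000244D])
print(" ".join(cmd))
```

The ELF has to come from exactly the build that produced the fault log, otherwise the resolved lines are meaningless.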

PetervdPerk-NXP avatar May 13 '24 14:05 PetervdPerk-NXP

Did a reset to FW default

dk7xe avatar May 13 '24 14:05 dk7xe

Flight testing today. GPD10 testdrone: Holybro S500 frame with Pixhawk6X-RT (NXP baseboard eval batch) and PX4 build based on 0928731 from https://github.com/PX4/PX4-Autopilot/pull/23110 . Pixhawk4 GPS on GPS1, DroneCAN M8N GPS and BMS772 connected to CAN2, ESC60Apro connected to CAN1 and PWM1-4 (Dshot600), uAvionics ADSB RX connected to Telemetry2, UWB board connected to GPS2, Tflite connected to Telemetry3 but not enabled in SW.

Test1 - first takeoff manual mode with CAN and PWM connected, BMS not connected. Mission until low battery https://logs.px4.io/plot_app?log=936a2894-882a-4cb8-92ee-aafb75dfe1ea Navigator task low on stack (180 bytes left)

Test2 - mission mode https://logs.px4.io/plot_app?log=5cac4d62-c1ce-482f-be3d-edbb5de36e6b

Test3 - mission mode https://logs.px4.io/plot_app?log=a487acda-35f2-4f8e-9226-646355876038

Test4 - mission mode https://logs.px4.io/plot_app?log=e581b04e-8a91-4eca-a121-2a7bfc512d3c mission termination due to helicopter approaching

Test 5 - mission mode (continuation of Test4) https://logs.px4.io/plot_app?log=7aebcecc-2d5b-4b9d-a0b9-52143fa025fd Successful test of RC loss failsafe.

Test6 - BMS and GPS connected to CAN2 https://logs.px4.io/plot_app?log=cae4442c-79ba-44a6-85ef-ead59ae9b3ab Very wobbly. BMS disconnected afterwards

Test7 - mission mode, BMS disconnected again https://logs.px4.io/plot_app?log=15540362-a5f2-4b8d-aaf5-67f08c655f25

NXD20 testdrone: Holybro S500 frame in Octo-X configuration with Pixhawk6X-RT (NXP baseboard release batch) and PX4 build based on 0928731 from https://github.com/PX4/PX4-Autopilot/pull/23110 . Pixhawk4 GPS on GPS1, ESC60Apro connected to CAN1, CAN2 and PWM1-8 (Dshot600), Holybro 433MHz telemetry on Telem1

Test1 - first takeoff manual mode with PWM ESC control active only. Position hold. https://logs.px4.io/plot_app?log=9cd3c383-b87f-4a2b-bd1d-dd14ed69d812 While being in position hold for ~1min hardfault midair.

fault_2024_05_14_09_07_45.log

dk7xe avatar May 14 '24 09:05 dk7xe

With GPD10 i had at Test8 at 1:06 a "Failsave:" ??? that resulted in crash. No further tests possible https://logs.px4.io/plot_app?log=eda13eee-29b0-49dc-ac9d-b41502eb8323 @PetervdPerk-NXP @davids5

dk7xe avatar May 14 '24 13:05 dk7xe

Further tests with NXD20 testdrone Holybro S500 frame in Octo-X configuration with Pixhawk6X-RT (NXP baseboard release batch) and PX4 build based on https://github.com/PX4/PX4-Autopilot/commit/0928731839608768fcfa1c5e967da7c0feef13e2 from https://github.com/PX4/PX4-Autopilot/pull/23110 . Pixhawk4 GPS on GPS1, ESC60Apro connected to CAN1, CAN2 and PWM1-8 (Dshot600), Holybro 433MHz telemetry on Telem1

Test2 - takeoff manual mode, position hold, mission https://logs.px4.io/plot_app?log=5876f700-1994-4dae-ad64-11e0568aee04 Crash while in mission mode. Battery fell off. No hardfault logging.

Test3 - flying around in poshold https://logs.px4.io/plot_app?log=78cc06d8-07ea-43b6-9cf7-b0748bb8efa4 No issues

Test4 - flying around in poshold https://logs.px4.io/plot_app?log=d236f150-2ad4-4a62-bb46-a8f795fa096c No issues

Afterwards hardfault occurred see fault_2024_05_14_14_52_45.log @davids5 @PetervdPerk-NXP

dk7xe avatar May 14 '24 15:05 dk7xe

Further test with NXD20 testdrone Holybro S500 frame in Octo-X configuration with Pixhawk6X-RT (NXP baseboard release batch) and PX4 build based on https://github.com/PX4/PX4-Autopilot/commit/0928731839608768fcfa1c5e967da7c0feef13e2 from https://github.com/PX4/PX4-Autopilot/pull/23110 . Pixhawk4 GPS on GPS1, ESC60Apro connected to CAN1, CAN2 and PWM1-8 (Dshot600), Holybro 433MHz telemetry on Telem1

Test5 - mission mode, dronecan enabled https://logs.px4.io/plot_app?log=0c94f603-e5b0-49c1-96cc-8138e100dcff Hardfault occurred.

See fault_2024_05_14_17_24_14.log fault_2024_05_14_17_24_21.log @PetervdPerk-NXP @davids5

dk7xe avatar May 14 '24 17:05 dk7xe

@davids5 @PetervdPerk-NXP another finding is that several attempts are needed to get actuator signal changed to Dshot600.

dk7xe avatar May 17 '24 05:05 dk7xe

Looking at faults:

Further test with NXD20 testdrone Holybro S500 frame in Octo-X configuration with Pixhawk6X-RT (NXP baseboard release batch) and PX4 build based on 0928731 from #23110 . Pixhawk4 GPS on GPS1, ESC60Apro connected to CAN1, CAN2 and PWM1-8 (Dshot600), Holybro 433MHz telemetry on Telem1

Test5 - mission mode, dronecan enabled https://logs.px4.io/plot_app?log=0c94f603-e5b0-49c1-96cc-8138e100dcff Hardfault occurred.

fault_2024_05_14_17_24_14.log in /imxrt_flexcan.c:539 R0 is invalid

 538                        mb = flexcan_get_mb(priv, mbi);
3019e4ae: 21 46         mov     r1, r4
3019e4b0: 28 46         mov     r0, r5
3019e4b2: ff f7 ef ff   bl      0x3019e494 <flexcan_get_mb>
 539                        if (mb->cs.code != CAN_TXMB_DATAORREMOTE)
PC - 3019e4b6: 03 68         ldr     r3, [r0, #0]
3019e4b8: c3 f3 03 63   ubfx    r3, r3, #24, #4
3019e4bc: 0c 2b         cmp     r3, #12
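
The faulting `ldr` reads the first word of the mailbox through R0, which is the pointer returned by `flexcan_get_mb`. The next two instructions then pull a 4-bit field starting at bit 24 out of that word and compare it with 12 (0xC), which is the `mb->cs.code != CAN_TXMB_DATAORREMOTE` test on line 539. A small Python model of the `ubfx` step, as a sketch; the sample word is made up for illustration:

```python
def ubfx(value: int, lsb: int, width: int) -> int:
    """Unsigned bit-field extract, like ARM `ubfx Rd, Rn, #lsb, #width`."""
    return (value >> lsb) & ((1 << width) - 1)


cs_word = 0x0C48_0000        # made-up mailbox CS word with code field = 0xC
code = ubfx(cs_word, 24, 4)  # same operands as `ubfx r3, r3, #24, #4`
print(hex(code), code == 12)
```

With an invalid R0 the load itself faults, so the field extraction is never reached.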

fault_2024_05_14_17_24_21.log

Call from (LR) __ZN4uORB12Subscription4copyEPv_veneer> in SubscriptionInterval.hpp to garbage (PC)
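
The LR symbol is a linker veneer around a C++ mangled name. A minimal Python sketch that decodes only the nested-name part of such a symbol, assuming the plain Itanium-ABI form `_ZN<len><name>...E` (no templates or substitutions); `demangle_nested` is a hypothetical helper, and a real tool such as `c++filt` should be preferred:

```python
import re


def demangle_nested(symbol: str) -> str:
    """Decode the nested-name part of an Itanium-ABI mangled symbol.

    Handles only the simple `_ZN <len><name>... E` form; templates,
    substitutions and parameter types are ignored in this sketch.
    """
    # Drop a linker veneer suffix and any leading underscores.
    s = re.sub(r"_veneer$", "", symbol).lstrip("_")
    if not s.startswith("ZN"):
        raise ValueError("not a simple nested name")
    i, parts = 2, []
    while s[i] != "E":
        m = re.match(r"\d+", s[i:])
        if m is None:
            raise ValueError("unsupported mangling")
        n = int(m.group())        # length prefix of the next component
        i += len(m.group())
        parts.append(s[i:i + n])  # the component itself
        i += n
    return "::".join(parts)


print(demangle_nested("__ZN4uORB12Subscription4copyEPv_veneer"))
```

This yields `uORB::Subscription::copy`; the trailing `Pv` in the mangled name encodes a single `void *` parameter.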

fault_2024_05_14_14_52_45.log Called from

MavlinkStreamAttitudeQuaternion::send() at ATTITUDE_QUATERNION.hpp:77 0x30fc0	

bool send() override
	{
		vehicle_attitude_s att;

		if (_att_sub.update(&att)) {
			vehicle_angular_velocity_s angular_velocity{};
			_angular_velocity_sub.copy(&angular_velocity);

			vehicle_status_s status{};
			_status_sub.copy(&status);

			mavlink_attitude_quaternion_t msg{};

LR:			msg.time_boot_ms = att.timestamp / 1000;
			msg.q1 = att.q[0];
			msg.q2 = att.q[1];
			msg.q3 = att.q[2];
			msg.q4 = att.q[3];
			msg.rollspeed = angular_velocity.xyz[0];
			msg.pitchspeed = angular_velocity.xyz[1];
			msg.yawspeed = angular_velocity.xyz[2];

PC 0x30022512 is constant manipulation

image

but looking back in memory the disassembly looks odd: Note the odd addresses

image

fault_2024_05_14_09_07_45.log LR is 0x00001e98 in orb and Odd PC is 0x300d6bbe which is loading constant.

davids5 avatar May 17 '24 07:05 davids5

I agreed with @PetervdPerk-NXP on the following steps on the NXD20 testdrone Holybro S500 frame in Octo-X configuration:

  • do an mtd erase
  • reboot
  • do param reset_all
  • start configuration from scratch

I hope i can still manage testing today

dk7xe avatar May 17 '24 10:05 dk7xe

nsh: sysinit: fopen failed: No such file or directory

NuttShell (NSH) NuttX-11.0.0
nsh> mtd erase
Erasing /fs/mtd_params
Erased 4096 bytes
Erasing /fs/mtd_waypoints
Erased 4096 bytes
nsh> reboot

All settings were gone after the reboot. connect again

nsh: sysinit: fopen failed: No such file or directory

NuttShell (NSH) NuttX-11.0.0
nsh> ver all
HW arch: PX4_FMU_V6XRT
HW type: V6XRT
HW FMUM ID: 0x000
HW BASE ID: 0x003
PX4 git-hash: 0928731839608768fcfa1c5e967da7c0feef13e2
PX4 version: 1.15.0 80 (17760384)
PX4 git-branch: pr-bp-release/1.15
OS: NuttX
OS version: Release 11.0.0 (184549631)
OS git-hash: 4be592dd2114d2c07505b143493a3cfa6dc9c239
Build datetime: May 10 2024 11:34:47
Build uri: localhost
Build variant: default
Toolchain: GNU GCC, 9.3.1 20200408 (release)
PX4GUID: 000900000000000000008292807c2c2a800e
MCU: i.MX RT1170 rB0, rev. 0
nsh> param reset_all
nsh>

start configuring drone from scratch

  • Airframe set to Generic Octo X (image)
  • Compass calibrated (image)
  • reboot
  • Gyro calibrated (image)
  • Accelerometer calibrated (image)
  • Level Horizon calibrated (image)
  • Radio calibrated (image)
  • FlightModes set (image)
  • Power - Number of cells set to 6 (no further changes) (image)
  • reboot and reconnect again

dk7xe avatar May 17 '24 15:05 dk7xe

  • Actuators - AUX1 set to DShot600 (image)
  • reboot
  • continued with the same process for AUX2 - AUX8 after the last reboot (image)
  • setting AUX1 - AUX8 Function to Motor 1 - Motor 8 (image)
  • reboot and reconnect
  • setting DSHOT_MIN to 0.10% since FETtec ESC need a Min Command value of ~1070 (image)
  • connecting 5S battery (who will find the error made before?)
  • Actuator Testing (image)

RESULT: MOTORS do spin!

@PetervdPerk-NXP @davids5 i don't know if it is the mtd erase or the sequence i had set the Actuator Outputs. But at least DShot is working now on 8 Motors.

dk7xe avatar May 17 '24 15:05 dk7xe

20240517_175721.jpg

Will give it a try with DShot only first and afterwards enable CAN bus.

dk7xe avatar May 17 '24 15:05 dk7xe

Result of first indoor test with DShot600 only:

  • Motors did not spin when arming - https://logs.px4.io/plot_app?log=427cd051-cc32-45df-a528-dc0f8cf572a5
  • Motors started - https://logs.px4.io/plot_app?log=3e145339-9409-4789-ab45-4e3c4377857a
  • 6min hovering - https://logs.px4.io/plot_app?log=7c418c77-2b4f-49f3-a0d6-f7ccadb74dd1
  • Short hovering - https://logs.px4.io/plot_app?log=810e51a2-4be0-4a63-9e74-5098b69b29b3

Again i had it at least once at the beginning that there was no output signal and motors did not arm. After reconnecting battery everything was ok.

dk7xe avatar May 17 '24 16:05 dk7xe

Enabling CANbus, besides DShot600 (ESC's do failover if CAN signal is lost) (image). Reboot, connecting battery, connecting via USB. Setting Actuators: hardfault occurred at that stage after assigning Motor 3 (image).

fault_2024_05_17_16_37_39.log

@davids5 @PetervdPerk-NXP

dk7xe avatar May 17 '24 16:05 dk7xe

deleted fault log, reconnected via USB, no battery connected, continuing with CAN motor assignment (image). Reboot, connecting via USB, afterwards connecting battery. Motortest successful (image). Disconnecting battery, disconnecting USB.

dk7xe avatar May 17 '24 16:05 dk7xe

Successful testflight indoors until battery empty. ESC's controlled via droneCAN with DShot failover enabled https://logs.px4.io/plot_app?log=373c4ae0-4a46-48e8-a649-d32d77c46cdb

dk7xe avatar May 17 '24 17:05 dk7xe

Hardfault occurred 10sec after connecting battery for 2nd testflight. ESC's controlled via droneCAN with DShot failover enabled. fault_2024_05_17_17_05_51.log It's again LPWORK. That's what i have seen previously already: LPWORK stack needs to be increased to prevent hardfaults when droneCAN is enabled.

After erasing the fault log file from SD i did the 2nd successful testflight. DShot signal was not issued on all motors at the first arming attempt; after that i emptied the 2nd battery without issues. Unfortunately stupid me forgot to insert the SDcard before the flight ;(

For reference my parameter settings of NXD20 testdrone Holybro S500 frame in Octo-X configuration with Pixhawk6X-RT (NXP baseboard release batch) and PX4 build based on 0928731 From https://github.com/PX4/PX4-Autopilot/pull/23110 . Pixhawk4 GPS on GPS1, ESC60Apro connected to CAN1, CAN2 and PWM1-8 (Dshot600), Holybro 433MHz telemetry on Telem1 ID20_Octo-X_20240517.zip

dk7xe avatar May 17 '24 17:05 dk7xe

Enabling CANbus, besides DShot600 (ESC's do failover if CAN signal is lost) (image). Reboot, connecting battery, connecting via USB. Setting Actuators: hardfault occurred at that stage after assigning Motor 3 (image).

fault_2024_05_17_16_37_39.log

@davids5 @PetervdPerk-NXP

This is faulting in Ekf::fuseDrag

Can you swap the FMUM and see if the issue persists?

davids5 avatar May 21 '24 15:05 davids5

Is this ready to come in @davids5 ? otherwise let's mark it as a DRAFT please

mrpollo avatar May 22 '24 15:05 mrpollo

~~Waiting on https://github.com/PX4/PX4-Autopilot/tree/pr-fix-px4_fmu-v6xrt-bootloader~~

davids5 avatar May 28 '24 12:05 davids5

All the commits needed from https://github.com/PX4/PX4-Autopilot/tree/pr-fix-px4_fmu-v6xrt-bootloader are in.

davids5 avatar May 29 '24 15:05 davids5

Did update my GPD10 testdrone (Tarot720 quad) to 482d590 Did NOT run through 'mtd erase' and 'parameter reset' this time.

nsh: sysinit: fopen failed: No such file or directory

NuttShell (NSH) NuttX-11.0.0
nsh> ver all
HW arch: PX4_FMU_V6XRT
HW type: V6XRT
HW FMUM ID: 0x000
HW BASE ID: 0x003
PX4 git-hash: 482d590dd9cf2df48ad8f29418a3d10b1c7b0f40
PX4 version: 1.15.0 80 (17760384)
PX4 git-branch: pr-bp-release/1.15
OS: NuttX
OS version: Release 11.0.0 (184549631)
OS git-hash: 6fbb26eb521999844f099ac93974fdc7ccca6016
Build datetime: May 31 2024 16:26:51
Build uri: localhost
Build variant: default
Toolchain: GNU GCC, 9.3.1 20200408 (release)
PX4GUID: 000900000000000000008292807c2929800e
MCU: i.MX RT1170 rB0, rev. 0
nsh>

Observation: Voltage sensor reading stops when armed and resumes when disarmed. Log file of hovering in the garden 1m above ground - https://logs.px4.io/plot_app?log=0d4d6820-9907-4bdc-8bdd-6d6feaabf5fc

dk7xe avatar Jun 01 '24 15:06 dk7xe

With having the ina226 power module on Power2 no issue - https://logs.px4.io/plot_app?log=435834b4-8d6a-43a9-85cc-c1c2d8cd99c1

dk7xe avatar Jun 01 '24 16:06 dk7xe

Hovering around in the garden without issues (ina226 on Power2 still), FETtec ESC60Apro controlled from droneCAN - https://logs.px4.io/plot_app?log=771291aa-d689-456a-8461-b38d405b0a68 Vibrations are higher than before due to an issue with one propeller.

dk7xe avatar Jun 01 '24 17:06 dk7xe

@davids5 should we wait with testing until https://github.com/PX4/PX4-Autopilot/pull/23210 is in?

dk7xe avatar Jun 03 '24 21:06 dk7xe

@davids5 should we wait with testing until #23210 is in?

No there is a critical fix in this PR that needs to be tested.

davids5 avatar Jun 04 '24 08:06 davids5

Tested on my Octo-X setup on the bench (https://github.com/PX4/PX4-Autopilot/pull/23110#issuecomment-2117909179). After ~1.5hr a hardfault occurred. fault_1970_01_01_01_50_32.log ID20_minicom.txt ID20_Octo-X_20240604_param.zip DroneCAN ESC control enabled besides Dshot600.

dk7xe avatar Jun 04 '24 11:06 dk7xe

Tested on my Octo-X setup on the bench (#23110 (comment)). After ~1.5hr a hardfault occurred. fault_1970_01_01_01_50_32.log ID20_minicom.txt ID20_Octo-X_20240604_param.zip DroneCAN ESC control enabled besides Dshot600.

The for the fault , looks odd

image

LR suggests this came from up_dshot_trigger

image

@PetervdPerk Can you see any memory overrun issues in that driver?

davids5 avatar Jun 04 '24 13:06 davids5

Thanx Daniel! .. btw we got a bit further with testing on the OctoX. The FMU has been running for 7hr now. DroneCAN is disabled at the moment. When 8hrs are reached DroneCAN will be enabled again (requires reboot).

FMURT7 with NXP baseboard (latest version), Pixhawk4 GPS connected to GPS1. 8x FETtec ESC PWM control port connected to PWM OUT 1-8 (set to Dshot600 in SW). FMU CAN1 daisy chained to FETtec ESC 1-8 CAN1 and terminated by CANterm. FMU CAN2 daisy chained to FETtec ESC 8-1 CAN2 and accidentally not terminated (just now connected to FMU CAN3 to have termination). CAN bus connection is just CAN H, CAN L and GND on all connections. FrSKY R-XSR SBUS RC Rx connected to RC IN. No other components. FMU was powered from a Dell laptop yesterday. Today it's powered from a USB powerbank.

dk7xe avatar Jun 05 '24 19:06 dk7xe