
[WIP] Events interface

Open bkueng opened this issue 4 years ago • 11 comments

This brings the events interface - I've been sidetracked for quite a while but finally found some time to continue.

For those not familiar: this will replace the STATUSTEXT mavlink messages. Full details: https://docs.google.com/document/d/18qdDgfML97lItom09MJhngYnFzAm1zFdmlCKG7TaBpg/edit

How do you use it? From anywhere in the code, you can do something like:

/* EVENT
 * @description
 * This is the description. It can be as long as you want, contain arguments, URLs or params.
 * - value of a: {1}
 * - value of b: {2}
 * - value of f: {3:.3}
 *
 * Link: <a href="https://docs.px4.io/master/en/config/firmware.html">documentation</a>
 * 
 * <profile name="dev">
 * This is only meant to be shown to developers.
 * It might contain a parameter, like <param>COM_ARM_MAG_STR</param>, to disable a check.
 * </profile>
 * @arg1: a
 * @arg2: b
 * @arg3: f
 */
events::send<uint8_t, uint32_t, float>(events::ID("test"), "Short message describing the event", events::Log::Warning, a, b, f);

Of that, only the event ID, the arguments and the log level get sent and logged at runtime. The metadata (message, description, argument names) is extracted during the build and gets to the GCS through the COMPONENT_INFORMATION API.
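To illustrate what "only ID + arguments + log level" can look like at runtime, here is a minimal sketch. It is not the code from this PR: the `sketch` namespace, `event_id` (an FNV-1a hash, chosen only for illustration), `make_event`, the `Event` layout, the log level values and the 25-byte argument buffer are all assumptions.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

namespace sketch {

// Hypothetical log levels; the numeric values are an assumption.
enum class Log : uint8_t { Error = 3, Warning = 4, Info = 6 };

// Hypothetical compile-time ID: FNV-1a hash of the event name.
// The real ID scheme may differ; this only shows that no string is needed at runtime.
constexpr uint32_t event_id(const char *name, uint32_t hash = 2166136261u)
{
	return *name ? event_id(name + 1, (hash ^ static_cast<uint8_t>(*name)) * 16777619u) : hash;
}

constexpr std::size_t kMaxArgBytes = 25; // assumed size of the argument buffer

// What would be sent and logged: no message text, no description.
struct Event {
	uint32_t id;
	Log log_level;
	uint8_t num_bytes; // number of used bytes in arguments
	std::array<uint8_t, kMaxArgBytes> arguments;
};

// Sum of sizeof() over all argument types, evaluated at compile time.
template <typename... Ts>
struct TotalSize { static constexpr std::size_t value = 0; };
template <typename T, typename... Rest>
struct TotalSize<T, Rest...> { static constexpr std::size_t value = sizeof(T) + TotalSize<Rest...>::value; };

// End of recursion: nothing left to pack.
inline void pack(Event &, std::size_t) {}

// Copy each argument byte-wise (native byte order) into the buffer, back to back.
template <typename T, typename... Rest>
void pack(Event &e, std::size_t offset, T value, Rest... rest)
{
	static_assert(std::is_arithmetic<T>::value, "only plain numeric arguments are supported");
	assert(offset + sizeof(T) <= kMaxArgBytes);
	std::memcpy(e.arguments.data() + offset, &value, sizeof(T));
	e.num_bytes = static_cast<uint8_t>(offset + sizeof(T));
	pack(e, offset + sizeof(T), rest...);
}

// The message is intentionally unused: it is metadata for the build-time extraction only.
template <typename... Args>
Event make_event(uint32_t id, const char * /*message*/, Log level, Args... args)
{
	static_assert(TotalSize<Args...>::value <= kMaxArgBytes, "arguments do not fit into the event buffer");
	Event e{id, level, 0, {}};
	pack(e, 0, args...);
	return e;
}

} // namespace sketch
```

With `uint8_t a`, `uint32_t b` and `float f`, calling `sketch::make_event(sketch::event_id("test"), "Short message", sketch::Log::Warning, a, b, f)` fills 9 argument bytes. The receiver needs the extracted metadata to know the argument types and to render `{1}`, `{2}` and `{3:.3}`.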

Some PX4 specifics:

  • There's a new submodule: https://github.com/bkueng/libevents (for common events, validation and code generation)
    • the code generation allows using events from the common namespace (e.g. for the calibration protocol)
  • Required memory usage (for event buffers): 1.7KB
  • Event extraction from source takes about 300ms
  • To test, I changed the accel calibration to use the events interface, see 34986c1a60c52b23d8a3116673fd27b504a1ecb2. I tested with an artificial event drop rate of 50%, and over telemetry, and it still worked.

Health and Arming checks

Health and arming check results are reported over the events API. Two summary messages contain overall information about subsystem warnings and errors, and the overall arming results per flight mode category plus the current flight mode. Flight mode categories include Mission, Autonomous and Position, which for example lets a GCS know whether flying a mission is possible, independent of the current mode. Each failure or warning contains the affected subsystem (e.g. IMU) and the mode category (allowing a GCS to show only the problems for the current mode, for example). This, together with continuously running the checks and reporting only on change, requires some changes to the existing code. The structure is similar, so the change is not too big. As an example I added the power + ekf2 checks: ab445394be99ee15efbcf879a6011af07ec395d7. Some requirements:

  • For predictable results there cannot be additional checks outside of the arming checks (similar for arming-source-dependent checks)
  • We cannot feed mode-dependent results into the checks
  • To cleanly solve cases like arm -> takeoff -> failsafe triggering, bigger changes are required. I could imagine a structure like this (without having gone through the details): [image: commander]
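
The per-mode-category results and the report-on-change behaviour described above can be sketched roughly as follows. This is only an illustration under assumptions, not the structure used in the PR: `HealthReport`, `Failure`, `Subsystem`, and the `Current`/`All` flags are hypothetical names. Only the Mission, Autonomous and Position categories come from the description.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

namespace sketch {

// Mode categories as bit flags, so one failure can affect several categories.
enum ModeCategory : uint8_t {
	Current    = 1 << 0, // hypothetical flag for the current flight mode
	Position   = 1 << 1,
	Autonomous = 1 << 2,
	Mission    = 1 << 3,
	All        = 0x0f
};

enum class Subsystem : uint8_t { IMU, Power, EKF2 }; // hypothetical subset

struct Failure {
	Subsystem subsystem;
	uint8_t affected_modes; // bitwise OR of ModeCategory flags
	bool is_error;          // false: warning only, does not prevent arming

	bool operator==(const Failure &o) const
	{
		return subsystem == o.subsystem && affected_modes == o.affected_modes && is_error == o.is_error;
	}
};

class HealthReport {
public:
	// Call at the start of every check run.
	void begin() { _current = Results{}; }

	// Called by an individual check when it fails.
	void failure(Subsystem subsystem, uint8_t affected_modes, bool is_error = true)
	{
		assert(affected_modes != 0); // a failure must affect at least one category
		_current.failures.push_back({subsystem, affected_modes, is_error});

		if (is_error) {
			_current.can_arm = static_cast<uint8_t>(_current.can_arm & ~affected_modes);
		}
	}

	// Call at the end of every check run. Returns true if the results differ from
	// the previous run (or on the very first run), i.e. when a report should be sent.
	bool finish()
	{
		const bool changed = _first_run || !(_current == _previous);
		_previous = _current;
		_first_run = false;
		return changed;
	}

	// Arming result of the last finished run for a given category.
	bool canArm(ModeCategory category) const { return (_previous.can_arm & category) == category; }

private:
	struct Results {
		uint8_t can_arm = All; // bit set: arming is possible in that category
		std::vector<Failure> failures;

		bool operator==(const Results &o) const { return can_arm == o.can_arm && failures == o.failures; }
	};

	Results _current;
	Results _previous;
	bool _first_run = true;
};

} // namespace sketch
```

Because the checks run continuously and every run rebuilds the full result set, a GCS that connects late can be given the last finished results instead of having to replay individual events.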

This PR is on top of https://github.com/PX4/PX4-Autopilot/pull/16039.

Main missing things:

  • COMPONENT_INFORMATION + translation workflow + push metadata to server
  • arming checks and sensor calibration

Open questions:

  • how to transition? What version compatibility do we want?

I'm sure I left out some things, so feel free to ask, provide any suggestions, ...

bkueng avatar Nov 27 '20 15:11 bkueng

Awesome! This is a true game changer for how we deal with state and status!

LorenzMeier avatar Nov 27 '20 15:11 LorenzMeier

how to transition? What version compatibility do we want?

@dagar what would be your thoughts on cutting a 1.12 now-ish to capture the multi-EKF as the 'big change', with probably some bugfixes going to that still, and then getting this in to soak a bit for 1.13?

jkflying avatar Nov 30 '20 12:11 jkflying

how to transition? What version compatibility do we want?

@dagar what would be your thoughts on cutting a 1.12 now-ish to capture the multi-EKF as the 'big change', with probably some bugfixes going to that still, and then getting this in to soak a bit for 1.13?

I'd need to settle a few things quickly, but that basically works from my perspective. The more annoying issue is the compatibility with QGC. With the way things are currently done, our least bad option might be to carry both STATUSTEXT and EVENT sensor calibration for a transition period (yes, this is horrible).

dagar avatar Nov 30 '20 16:11 dagar

Are the events supposed to replace the subsystem_info mechanism or to work together with it, and if so, in what capacity?

dusan19 avatar Mar 01 '22 12:03 dusan19

@dusan19 they will replace it, although it could still be published. Is there a particular reason you ask?

bkueng avatar Mar 02 '22 07:03 bkueng

We are using the subsystem info for all health reporting, and it got quite messy with a lot of changes compared to upstream master. Events would be a preferred solution to clean up that mess, and I'm happy to hear that the idea is to move away from the subsystem info mechanism completely.

dusan19 avatar Mar 04 '22 09:03 dusan19

Ok, good to know. Do you have any learnings or wishes that you would like us to consider while doing the change?

bkueng avatar Mar 04 '22 12:03 bkueng

@bkueng for us (and I think in general as well) it's important to have a way of reporting events but also the current status. What I mean by that is that we cannot rely on the base station to do the bookkeeping of the events. In case the drone connects to the base station after some events are published, or if the drone doesn't have a connection while some events happen, it is essential that we still know the current status of the drone upon connecting, especially the health of the different sensors or subcomponents. For example, if the GPS fails while we are disconnected, it's fine that we miss that event with the relevant print statements, but upon connection we still need to know that the GPS is in a failure state. Is that handled if subsystem info reports are removed? Sorry if this is a basic question, I didn't go through the code, I just briefly read that document.

dusan19 avatar Mar 15 '22 18:03 dusan19

@dusan19 yes, that is planned: a GCS gets the health state upon connecting and then on each state change.

bkueng avatar Mar 16 '22 08:03 bkueng

Is this PR still relevant?

junwoo091400 avatar Jul 27 '22 16:07 junwoo091400

I kept it open for reference, as not everything is completed. I'll close it later.

bkueng avatar Jul 28 '22 05:07 bkueng