ardupilot
ardupilot copied to clipboard
AP_ESC_Telem: support Extended DShot Telemetry v2
Description [Edited 2024-05-04 to reflect the new changes]
This PR adds support for Extended DShot Telemetry v2, the specification for which is available here. Currently, recent versions of Bluejay support this specification, AM32 may support it soon, and BlHeli32 possibly has some limited support.
Notably, EDTv2 introduces two new types of messages, which are useful to investigate the performance of ESCs:
- the "stress level" message that contains a number in the range [0..255], which is the greater the more effort an ESC had to make during commutation;
- the "status" message contains three bits for different warning levels:
- "alert": the lowest level, Bluejay relates it to demag events,
- "warning": the middle level, Bluejay relates it to desync events,
- "error": the highest level, Bluejay relates it to stall events, as well as the maximum stress level encountered, which is in the range [0..15].
The existing EDT parser handles these messages, but ignores them. This PR propagates them all the way through the AP_ESC_Telem machinery and logs the obtained information using a new log message, EDT2, which contains the following fields:
Instance: the ESC instance identifier;Stress: the latest stress measurement, in the range [0..255];MaxStress: the maximum stress level since latest arming, in the range [0..15];Status: a combination of the status bits from above and of the info on what has been received:- bit 0: true if the message contains up-to-date stress data;
- bit 1: true if the message contains up-to-date status data;
- bit 2: true if the last status had the "alert" bit (e.g. demag);
- bit 3: true if the last status had the "warning" bit (e.g. desync);
- bit 4: true if the last status had the "error" bit (e.g. stall).
Design decisions
- As the log message comes from two different messages, which come at different rates, the "up-to-date" bits were introduced to see how recently a certain value has been updated. An alternative would be some kind of stall data logic, but EDTv2 messages come really irregularly anyway, and there is no NaN marker for integral values.
- Two different stress components had to be introduced, because they are related in a vendor-dependent non-trivial way (it is not linear scaling), and certain events appear to be noticed in either of these measurements.
- If updates come more frequently than the logger writes them, the records are merged in a natural way, e.g. stress levels are maxed, status bits are OR-ed.
- The new macro,
EDTV2_SUPPORT, controls inclusion of all relevant code (apart from message declarations, which cannot be easily defined out). By default, the support is enabled when either the board itself, or the IOMCU, support bidirectional DShot, so that potential EDTv2 data can come in. One may forcibly set it to 0, so that the code does not get compiled. Note that in the case of boards with IOMCU, this also requires recompiling the IOMCU firmware, because the DShot telemetry structures are different (to save bandwidth if one does not want EDTv2).
Tests performed
This PR was tested on:
- a quadcopter using the FlywooF405S-AIO controller, running Bluejay on ESCs;
- a coax copter using the MatekH743-WLITE controller, running also Bluejay;
- a test stand using the Pixhawk6C and two Holybro ESCs, all donated by Holybro. One ESC was running BlHeli32, another one AM32, with one motor from the coax copter also connected to the Pixhawk.
Example dataflash logs:
- from FlywooF405S: 24-05-04_01-02-22_newedt.bin.zip
- from the coax: 2024-05-04-newedt.bin.zip
- multiple logs from Pixhawk6C: EDTv2.zip
For Pixhawk, tests have been performed with all combinations of:
- main firmware: Pixhawk6C, Pixhawk6C-bdshot
- IOMCU options:
BRD_IO_DSHOT=0,1 - motors and servos connected to: FMU, IOMCU
In all cases, motors and servos behaved as expected, and EDTv2 logs were written when appropriate. The AM32 ESC had problems starting at lower voltages and in the PWM mode, which I mainly attribute to some exceptionally high magnet cogging of the corresponding motor. The only meaningful logs were produced by the Bluejay-driven motor (instance 3 or 11), which is somewhat expected.
I have no idea why test ccache fails, what this means and whether it is my fault. Could someone maybe help me?
Looking good! In 4.5 the iomcu dshot firmware also now supports EDT so you will need to at a minimum recompile that for this PR and probably wire the new data through AP_IOMCU so that you can get EDT logging from there as well.
@andyp1per I seem to have done what you requested, with an additional change to avoid modifying AP_Logger/LogStructure.h.
I am not sure whether the IOMCU part works though, because I don't have any hardware with IOMCU, so I cannot test the changes. What is more, I see that while the F103 MCU firmware increased in size, the other one did not (it likely decreased because I used a reference instead of a dozen of indexed accesses in one place).
And I still don't understand whether the test ccache test failure is a bad thing - or 75% is something out of thin air, and it is just time to get the safety threshold down to some 70%.
@mbuzdalov thanks - I can test at some point, although it might be easier to try and get hardware - be happy to approve a hardware request for a Pixhawk 6C and Holybro might send you one anyway.
I've put this up for devcall because @IamPete1 and others might have an opinion on the logging
Do we need two log messages?
First, because these two stress values cannot be converted one into another in a vendor-independent way. Second, because stress messages come more frequently, and status messages come roughly at 1Hz, so if one hunts for a particular bad event, stress messages might be more helpful.
First, because these two stress values cannot be converted one into another in a vendor-independent way. Second, because stress messages come more frequently, and status messages come roughly at 1Hz, so if one hunts for a particular bad event, stress messages might be more helpful.
It does depend on the relative rate of the two messages, each message has quite a big overhead. The header, timestamp and instance give 12 bytes before we even get to the data.
In your example log ESC is at 20hz, EDTS around 8 (but quite inconsistent) and EDT2 hardly at all. Message rates from this WebTool. Note that this is the message rate in the log, so a quad logging at 5hz is 20hz message because of the four instances. So in your example it would be a smaller log to include the stress in ESC (1 * 20 vs 13 * 8). The cross over point would be 1/13, so for EDTS at 8hz ESC would have to be above 260Hz.
However I guess since we won't always have the new values using a new message means we only pay that extra cost if the data is there.
More data in the log is always good, so I certainly support this PR but thinking about how we can minimize the cost in log size means we can keep on adding more logging for more things in the future.
@andyp1per added for EU call
I didn't really think about it before, but I guess the "stress" is sort of a duty cycle? I ask because in DroneCAN we have a ESC telem field "power percentage". Annoyingly both definitions are a bit fuzzy and dependent on ESC manufacturer. But I have a commit witch adds logging for that (admittedly on a older branch) https://github.com/IamPete1/ardupilot/commit/88be9bad5efa065f42d97492d59a50c27dbe303a. I just added that to the existing ESC log. Could we share a field for both users?
I guess the "stress" is sort of a duty cycle?
Bluejay understands this as follows (copy from the EDT spec description):
Stress level: Indicates the effort the ESC has to make to keep commutation going. In Blheli_S/Bluejay for example it is related to desync events. When it reaches the maximum indicates that the ESC cannot make reliable commutation. Zero is the ideal value, when ESC commutation is perfect.
Putting aside that currently Bluejay's 8-bit stress value is always >= 120 (the 4-bit one can indeed be zero though), this is possibly similar in spirit to what @IamPete1 says about DroneCAN's "power percentage", but not completely identical.
It can be good to share this kind of data in a single field, I think, but my main concern is about the possible difference in message rates. In the case of EDT: ESC messages are logged at 100 Hz if not filtered (and possibly come more often), "stress" messages come at most at some 30 Hz, and not regularly at all (which is consistent with the spec, as these are logged if the ESC chip can do it without suffering the performance). What shall we log if "stress" is not updated? Putting a zero seems a poor choice, as the plot will be much saw-shaped without any underlying physical reason. Keeping the previous value seems not well-motivated as well. As this is not a floating-point value, we cannot log a NaN.
@andyp1per will organise some testing and discuss logging with @IamPete1
@mbuzdalov please put in a funding request here for a Pixhawk6c - https://discuss.ardupilot.org/c/proposals/125 - it will get approved and make it easier for you to test stuff.
please put in a funding request
@andyp1per here it is, hopefully I am not too far away from the guidelines...
@mbuzdalov how is this going?
Also I have fixed the 30s no notch filtering issue but cannot find your forum post to reply to
I have fixed the 30s no notch filtering issue
Thank you, I will check it today. (UPD: checked, works nicely as expected!)
how is this going?
The initial tests with Pixhawk 6C and EDTv2 over IOMCU went well.
On proceeding with implementing more stuff from this discussion, I diverged very much on how to actually do the thing right, and as a result, did not do much - and I have been pathologically busy recently to do the actual things. For example, if we choose to log the status bits as bits, shall I first contribute a general message to self-describe the bit fields (as the rest of the subsystem does), or not?
Maybe I shall just do the obvious things, with a hope that subsequent logging improvements will make it less ugly at some point.
@andyp1per @peterbarker @IamPete1 here is the new version of the EDTv2 contribution. I have updated the initial description, but TLDR the main differences are:
- one log message instead of two;
- a define that can be used to remove the support;
- bits get logged as a bitmask;
- IOMCU support implemented and tested.
(I think that the failing sitltest-copter-tests2b has nothing to do with the new code)
we need to split the commits
discussed on dev call, needs a different define name, maybe AP_EXTENDED_DSHOT_V2_ENABLED ?
@mbuzdalov I've fixed the commit msgs and the define to follow our code style
@mbuzdalov I've fixed the commit msgs and the define to follow our code style
@tridge Thank you! Maybe, if it's not too late, you fix commit messages once again, so that they don't designate subsystems multiple times?
(like "AP_HAL_ChibiOS: aP_ESC_Telem, AP_IOMCU: add support..." => "AP_HAL_ChibiOS: add support")
@mbuzdalov if you want you can redo the commit messages and re-push - it's not yet been merged
@mbuzdalov I changed the commit messages for you
@mbuzdalov I changed the commit messages for you
Thank you! I saw that strange stuff was happening with CI this morning, so I thought it would be wise to ask.
@andyp1per @tridge this one seems ready to merge finally.
@mbuzdalov great work, thanks!