msp-osd icon indicating copy to clipboard operation
msp-osd copied to clipboard

Known issue: iNav display artifacts when packets are lost

Open bri3d opened this issue 1 year ago • 8 comments

The issue with iNav is basically this: it sends delta updates instead of element updates or full-screen updates.

Betaflight and Ardupilot send a Clear command, a series of Draw commands (usually, but not always, one for each element), and then a Draw Screen command.

iNav never sends a Clear command - it basically sends scanlines line-by-line (twice per line actually, because of multi-page fonts) with only changed characters from the previous frame. If a character is deleted, it replaces it with 0x20 (space).

This seems smart on the surface, because of course most of the OSD is actually static and it massively reduces the number of bytes sent.

The problem is, it only ever sends "I-frames" so to speak (delta updates), never P frames (full screen frames). So if even a single intermediate frame is lost, that frame's updates break the OSD forever. That's what's happening here. I was going crazy because it's flawless in my bench testing since my implementation is "correct."

But, once you go out in the field, you lose a packet here and there. With Ardu and BF, this doesn't matter at all, because the next packet redraws the whole screen anyway. With iNav, this is fatal for the AHI, because now some 0x20s were missed and some characters from the AHI are stuck on the screen forever unless it is cleared again.

What I will have to do to fix this is have the DisplayPort mux on the AU side reconstruct the character buffer for the screen layout, and then either send full frames again or make my own delta update system which periodically sends full-frame updates and/or has checksums and a retransmit capability.

bri3d avatar Aug 29 '22 18:08 bri3d

Or switch to TCP? Have you tested the latency difference? Is it really that noticeable?

DimitarKrastev avatar Aug 29 '22 21:08 DimitarKrastev

@DimitarKrastev it wouldn't fundamentally solve the problem because TCP packets can end up being dropped and you'd still need a recovery mechanism.

j005u avatar Aug 29 '22 21:08 j005u

I'd have to test TCP more, but honestly since the shape of what we are trying to do is so well defined and latency sensitive, there's good optimization to be gained by just making our own delta update protocol over UDP (or even switching to fixed-refresh fullscreen updates, honestly).

I'd prefer to have a packet get dropped completely and just refresh the screen next packet than to spend a ton of time tuning TCP buffer and retransmit parameters only to have packets arrive late. If a packet is more stale than 100ms or so we really don't want it to begin with, so UDP is a better option overall.

I've also seen nothing but downright bizarre behavior with TCP on the goggles in my testing.

bri3d avatar Aug 29 '22 21:08 bri3d

@j005u incorrect. TCP offers automatic resend and order mechanism. If no packets could be delivered the whole connection will drop and a reconnect will be necessary which in theory is OK for us because we can detect that and erase the OSD contents.

@bri3d TCP has a few things that need to be kept in mind. In order to make any sort of sense for latency sensitive communication, TCP_NODELAY needs to be enabled. Maybe that's why you noticed the bizarre behavior?

DimitarKrastev avatar Aug 30 '22 07:08 DimitarKrastev

@DimitarKrastev I know what TCP is :).

The fundamental issue I was reffering to is that even if you detect packet loss, you'd still need the VTX to keep an internal up to date character buffer, to have anything to re-send. This might include many seconds of updates in an RF link lost situation, not just what the TCP susbsys can take care of. Currently the VTX only has the last delta frame at best.

The other concern is the TCP stack weirdness Brian mentioned.

j005u avatar Aug 30 '22 10:08 j005u

@j005u This would imply also that if you power cycle the goggles after they were connected to the AU, you will also have to power cycle the AU otherwise the OSD will be broken.

DimitarKrastev avatar Aug 30 '22 16:08 DimitarKrastev

@DimitarKrastev That's correct and is exactly how iNav in HDZero mode works today.

I currently instruct users to enter and exit the on-screen configuration menu before takeoff which "resets" the internal state of the iNav DisplayPort.

This is another reason I am just going to reconstruct the character map on the VTx side and send more "P-frame" / full OSD updates rather than delta only, it will let me mostly fix this issue too.

bri3d avatar Aug 30 '22 17:08 bri3d

Now @j005u 's point about TCP makes sense. It would kind of sort of fix the issue when there is a intermediate packet loss, but it just delays the inevitable. Once you loose connection and TCP link fails you end up with the same problem once the connection is restored.

This solution seems apropriate when you have local communication only. Sending just deltas would significantly reduce the comunication required to ship the OSD data, but for remote transmission it doesnt cut it.

So full buffer state management at the AU side seems necessary. This is pretty much your idea.

If you go that route you could implement a hybrid approach that could potentially be pretty cool.

How about this twist: From the MSP you could be getting the OSD configuration about which OSD element is configured to appear where on the screen. You could then prioritize certain character positions over others depending on which OSD element they represent. For instance. Battery voltage doesnt need to be updated 10 times a second, but artificial horizon does.

DimitarKrastev avatar Aug 30 '22 20:08 DimitarKrastev

Are the artifacts in the artificial horizon related to this bug?

PXL_20220922_160715344 TS_exported_25035_1663908308335 PXL_20220922_160715344 TS_exported_16035_1663908301089

noemu avatar Sep 23 '22 04:09 noemu

I have found changing this value in inav source code will make it better, The latency is also reduced src/main/io/displayport_hdzero_osd.c line 57 #define DRAW_FREQ_DENOM 4->8

shota3527 avatar Oct 01 '22 06:10 shota3527

This should be resolved by the new compress_osd option

bri3d avatar Oct 16 '22 18:10 bri3d