msp-osd
msp-osd copied to clipboard
Known issue: iNav display artifacts when packets are lost
The issue with iNav is basically this: it sends delta updates instead of element updates or full-screen updates.
Betaflight and Ardupilot send a Clear command, a series of Draw commands (usually, but not always, one for each element), and then a Draw Screen command.
iNav never sends a Clear command - it basically sends scanlines line-by-line (twice per line actually, because of multi-page fonts) with only changed characters from the previous frame. If a character is deleted, it replaces it with 0x20 (space).
This seems smart on the surface, because of course most of the OSD is actually static and it massively reduces the number of bytes sent.
The problem is, it only ever sends "I-frames" so to speak (delta updates), never P frames (full screen frames). So if even a single intermediate frame is lost, that frame's updates break the OSD forever. That's what's happening here. I was going crazy because it's flawless in my bench testing since my implementation is "correct."
But, once you go out in the field, you lose a packet here and there. With Ardu and BF, this doesn't matter at all, because the next packet redraws the whole screen anyway. With iNav, this is fatal for the AHI, because now some 0x20s were missed and some characters from the AHI are stuck on the screen forever unless it is cleared again.
What I will have to do to fix this is have the DisplayPort mux on the AU side reconstruct the character buffer for the screen layout, and then either send full frames again or make my own delta update system which periodically sends full-frame updates and/or has checksums and a retransmit capability.
Or switch to TCP? Have you tested the latency difference? Is it really that noticeable?
@DimitarKrastev it wouldn't fundamentally solve the problem because TCP packets can end up being dropped and you'd still need a recovery mechanism.
I'd have to test TCP more, but honestly since the shape of what we are trying to do is so well defined and latency sensitive, there's good optimization to be gained by just making our own delta update protocol over UDP (or even switching to fixed-refresh fullscreen updates, honestly).
I'd prefer to have a packet get dropped completely and just refresh the screen next packet than to spend a ton of time tuning TCP buffer and retransmit parameters only to have packets arrive late. If a packet is more stale than 100ms or so we really don't want it to begin with, so UDP is a better option overall.
I've also seen nothing but downright bizarre behavior with TCP on the goggles in my testing.
@j005u incorrect. TCP offers automatic resend and order mechanism. If no packets could be delivered the whole connection will drop and a reconnect will be necessary which in theory is OK for us because we can detect that and erase the OSD contents.
@bri3d TCP has a few things that need to be kept in mind. In order to make any sort of sense for latency sensitive communication, TCP_NODELAY needs to be enabled. Maybe that's why you noticed the bizarre behavior?
@DimitarKrastev I know what TCP is :).
The fundamental issue I was reffering to is that even if you detect packet loss, you'd still need the VTX to keep an internal up to date character buffer, to have anything to re-send. This might include many seconds of updates in an RF link lost situation, not just what the TCP susbsys can take care of. Currently the VTX only has the last delta frame at best.
The other concern is the TCP stack weirdness Brian mentioned.
@j005u This would imply also that if you power cycle the goggles after they were connected to the AU, you will also have to power cycle the AU otherwise the OSD will be broken.
@DimitarKrastev That's correct and is exactly how iNav in HDZero mode works today.
I currently instruct users to enter and exit the on-screen configuration menu before takeoff which "resets" the internal state of the iNav DisplayPort.
This is another reason I am just going to reconstruct the character map on the VTx side and send more "P-frame" / full OSD updates rather than delta only, it will let me mostly fix this issue too.
Now @j005u 's point about TCP makes sense. It would kind of sort of fix the issue when there is a intermediate packet loss, but it just delays the inevitable. Once you loose connection and TCP link fails you end up with the same problem once the connection is restored.
This solution seems apropriate when you have local communication only. Sending just deltas would significantly reduce the comunication required to ship the OSD data, but for remote transmission it doesnt cut it.
So full buffer state management at the AU side seems necessary. This is pretty much your idea.
If you go that route you could implement a hybrid approach that could potentially be pretty cool.
How about this twist: From the MSP you could be getting the OSD configuration about which OSD element is configured to appear where on the screen. You could then prioritize certain character positions over others depending on which OSD element they represent. For instance. Battery voltage doesnt need to be updated 10 times a second, but artificial horizon does.
Are the artifacts in the artificial horizon related to this bug?
I have found changing this value in inav source code will make it better, The latency is also reduced src/main/io/displayport_hdzero_osd.c line 57 #define DRAW_FREQ_DENOM 4->8
This should be resolved by the new compress_osd option