
UART plugin does not send correct DMX data when granularity check happens during high load

Open markus983 opened this issue 3 years ago • 5 comments

Hi,

I run OLA on a RPI 4 within a docker container and use UART to send out DMX data. Most of the time it works just fine; however, from time to time the DMX protocol appears to break completely. After some testing and investigating, I found that the granularity is set to BAD when the DMX protocol appears to break. Apparently this can happen quite easily when the RPI has a lot of other stuff to do during the granularity check. My guess would be that the OLA server doesn't get the CPU in time. The code can be found in plugins/uartdmx/UartDmxThread.cpp. As a result, the DMX break signal is skipped, which causes the DMX recipient to ignore the DMX data that follows since it cannot interpret it correctly. The only way I've found to resolve this problem is to restart the entire OLA server.

Recreating this issue should be fairly easy: force all your CPUs to 100% (stress test your hardware) and restart your OLA server with the UART DMX plugin enabled. You may have to restart the server a few times until it happens, since this issue is not deterministic. It occurred with versions 0.10.3 and 0.10.8. I haven't tested the others, but I don't expect different results.

I think it would be cleaner to fix this issue by fixing the granularity check, or by changing the impact the granularity has on the data sent. However, I'm not sure how to fix this since I have no experience in coding so close to the hardware. But for the time being, could you perhaps provide a config parameter that allows forcing the granularity to GOOD?

I know that this basically undermines its entire purpose, but I also know that the hardware can handle the timings, and it isn't an option for me to depend on luck for this plugin to send out DMX data correctly.

Thanks for your help!

markus983 avatar Nov 21 '22 09:11 markus983

Hi @markus983 ,

I run OLA on a RPI 4 within a docker container

To try and address the simple issues, have you tried running OLAd natively or using nice to give it a higher priority?

and use UART to send out DMX data. Most of the time it works just fine; however, from time to time the DMX protocol appears to break completely. After some testing and investigating, I found that the granularity is set to BAD when the DMX protocol appears to break. Apparently this can happen quite easily when the RPI has a lot of other stuff to do during the granularity check. My guess would be that the OLA server doesn't get the CPU in time. The code can be found in plugins/uartdmx/UartDmxThread.cpp. As a result, the DMX break signal is skipped, which causes the DMX recipient to ignore the DMX data that follows since it cannot interpret it correctly.

From a quick look at the code, it does indeed appear that the break and MAB get skipped if we're not in good granularity, but the frame data does still get output, which seems a slightly odd choice (this is also true with the FTDI plugin): https://github.com/OpenLightingProject/ola/blob/966c85b5e2177c381505a5f7904733a3b6fe9eeb/plugins/uartdmx/UartDmxThread.cpp#L96-L106
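The behaviour described above can be modelled in a few lines. This is only a sketch of what the thread describes, not a copy of UartDmxThread.cpp: `Granularity`, `FrameEvents` and the event strings are hypothetical names chosen for illustration, and the real plugin drives a serial port and uses sleeps rather than building a list.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical names throughout; this models the behaviour described in the
// thread and is not the plugin's actual code.
enum class Granularity { kUnknown, kGood, kBad };

// Returns the sequence of line events one pass of the send loop would emit.
std::vector<std::string> FrameEvents(Granularity granularity,
                                     const std::vector<uint8_t> &slots) {
  assert(slots.size() <= 512);  // a DMX universe carries at most 512 slots
  std::vector<std::string> events;
  if (granularity == Granularity::kGood) {
    // Timers are trusted, so the break and mark-after-break are generated.
    events.push_back("BREAK");
    events.push_back("MAB");
  }
  // The slot data is written regardless of the granularity state.
  events.push_back("DATA[" + std::to_string(slots.size()) + "]");
  return events;
}
```

With `Granularity::kGood` the model yields `BREAK`, `MAB`, `DATA[512]` for a full universe; with `Granularity::kBad` only `DATA[512]` remains, which is the case the receiver cannot frame.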

The only way I've found to resolve this problem is to restart the entire OLA server.

FWIW I suspect reloading the plugins would probably resolve it too.

But for the time being, could you perhaps provide a config parameter that allows forcing the granularity to GOOD?

That feels like a huge hack and will just generate a pile of new issues!

However there was a change made to the FTDI plugin some time ago, which allows granularity to be recovered if timing improves, would you like to try applying the same changes to the UART plugin and see if that fixes your issues? https://github.com/OpenLightingProject/ola/commit/01835aaecb90087a0d7b480eb2fde87a38938734
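One possible shape for "granularity can be recovered if timing improves" is sketched below. It is not taken from the linked commit: the class name, the injected `measure_us` callback, the limit and the re-check interval are all assumptions made for illustration, so refer to the commit itself for what the FTDI plugin actually does.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>

enum class Granularity { kUnknown, kGood, kBad };

// Hypothetical helper, not part of OLA. It re-runs the timing check every
// few frames while the state is BAD, so a single slow measurement at
// start-up is not permanent.
class GranularityTracker {
 public:
  // measure_us: performs a short nominal sleep and returns how long it
  // really took, in microseconds (injected so the logic can be tested).
  // limit_us: longest measured sleep still accepted as GOOD (assumed value).
  // recheck_every: number of frames between re-checks while BAD (assumed).
  GranularityTracker(std::function<int64_t()> measure_us, int64_t limit_us,
                     int recheck_every)
      : m_measure_us(std::move(measure_us)),
        m_limit_us(limit_us),
        m_recheck_every(recheck_every) {
    assert(recheck_every > 0);
  }

  // Runs the timing check once and updates the state.
  void Check() {
    m_granularity = (m_measure_us() <= m_limit_us) ? Granularity::kGood
                                                   : Granularity::kBad;
  }

  // Call once per transmitted frame.
  void OnFrameSent() {
    if (m_granularity == Granularity::kGood) {
      return;  // nothing to recover from
    }
    if (++m_frames_since_check >= m_recheck_every) {
      m_frames_since_check = 0;
      Check();
    }
  }

  Granularity granularity() const { return m_granularity; }

 private:
  std::function<int64_t()> m_measure_us;
  int64_t m_limit_us;
  int m_recheck_every;
  int m_frames_since_check = 0;
  Granularity m_granularity = Granularity::kUnknown;
};
```

In this sketch a thread would call `Check()` once at start-up and `OnFrameSent()` after each frame; once the measured sleep falls back under the limit, the state returns to GOOD without restarting the server.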

peternewman avatar Nov 21 '22 14:11 peternewman

Hi @peternewman ,

thanks for your help!

From a quick look at the code, it does indeed appear that the break and MAB get skipped if we're not in good granularity, but the frame data does still get output, which seems a slightly odd choice (this is also true with the FTDI plugin)

You're right about that, I was surprised as well since I expected the actual data to be the most important/sensitive part regarding the granularity.

However there was a change made to the FTDI plugin some time ago, which allows granularity to be recovered if timing improves, would you like to try applying the same changes to the UART plugin and see if that fixes your issues? https://github.com/OpenLightingProject/ola/commit/01835aaecb90087a0d7b480eb2fde87a38938734

Thanks for that info. I've changed the code of the plugin accordingly and this actually fixes the problems. Even high load scenarios are no issue with these changes. I opened a pull request (#1800) hoping others can benefit from these changes as well.

markus983 avatar Dec 02 '22 10:12 markus983

thanks for your help!

No worries, thanks for testing it out and opening a PR!

From a quick look at the code, it does indeed appear that the break and MAB get skipped if we're not in good granularity, but the frame data does still get output, which seems a slightly odd choice (this is also true with the FTDI plugin)

You're right about that, I was surprised as well since I expected the actual data to be the most important/sensitive part regarding the granularity.

Thinking about it a bit more, I guess the theory is as follows:

Break
MAB
First frame output (1,2,3...512)
Timing granularity goes
No Break
No MAB
Second frame output continues (1,2,3...512)

So what the fixture sees is: Break MAB 1,2,3...512,1,2,3...512

A properly behaved fixture should see that as one very big frame and just pick up the values towards the start of the frame that it needs. Without double-checking I'm not sure if that would count as a loss of DMX or not, but I guess hopefully it would stop it putting out garbage anyway. Or at least I assume that's the idea.
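To make the "one very big frame" idea concrete, here is a small receiver model. It is a sketch under stated assumptions: `LineEvent`, `Fixture` and `AppendFrame` are hypothetical names, the start code is left out (as in the 1,2,3...512 notation above), and a real fixture may well treat an over-long frame differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One thing seen on the wire: either a break (with its MAB) or one data slot.
struct LineEvent {
  bool is_break;
  uint8_t value;  // slot value, unused for breaks
};

// Hypothetical well-behaved fixture: it counts slots from the last break and
// only latches the slots inside its own footprint.
class Fixture {
 public:
  Fixture(size_t start_address, size_t footprint)
      : m_start(start_address), m_levels(footprint, 0) {
    assert(start_address >= 1);  // DMX slot numbering starts at 1
  }

  void Receive(const LineEvent &event) {
    if (event.is_break) {
      m_slot = 0;  // a break restarts slot counting
      return;
    }
    ++m_slot;
    if (m_slot >= m_start && m_slot < m_start + m_levels.size()) {
      m_levels[m_slot - m_start] = event.value;
    }
  }

  const std::vector<uint8_t> &levels() const { return m_levels; }

 private:
  size_t m_start;
  std::vector<uint8_t> m_levels;
  size_t m_slot = 0;
};

// Appends one 512-slot frame with every slot set to `level`; the break is
// only emitted when with_break is true.
void AppendFrame(std::vector<LineEvent> *stream, bool with_break,
                 uint8_t level) {
  if (with_break) {
    stream->push_back({true, 0});
  }
  for (int i = 0; i < 512; ++i) {
    stream->push_back({false, level});
  }
}
```

Feeding this fixture a frame at level 10 with a break, followed by a frame at level 200 without one, leaves it at level 10: the second frame lands in slots 513 to 1024 and is ignored. In this model the fixture does not output garbage, but it also never sees updates until a break arrives again.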

Thanks for that info. I've changed the code of the plugin accordingly and this actually fixes the problems. Even high load scenarios are no issue with these changes. I opened a pull request (#1800) hoping others can benefit from these changes as well.

Excellent thanks. So do you see the failed granularity message during high load and then it recovers afterwards?

peternewman avatar Dec 04 '22 10:12 peternewman

Excellent thanks. So do you see the failed granularity message during high load and then it recovers afterwards?

Yes, with these changes the plugin was always able to recover even during high load. Actually, the recovery happened almost immediately after the initial check.

markus983 avatar Dec 05 '22 09:12 markus983

Yes, with these changes the plugin was always able to recover even during high load. Actually, the recovery happened almost immediately after the initial check.

That's great news. I guess it probably just drops one or two frames in reality and that's it!

peternewman avatar Dec 05 '22 12:12 peternewman