machinekit-hal Failure state management for remote HAL components

Issue by f1oat Sat Feb 24 18:07:17 2018 Originally opened as https://github.com/machinekit/machinekit/issues/1357

Per discussion with Alexander, here is a suggestion for HAL enhancement.

When using HAL remote components, the current implementation leaves the outputs at their last updated values if communication fails. That could be dangerous for the machine, for example if a remote component is used to drive an actuator. This is typically the case for jog buttons running on a tablet.

On commercial remote I/O modules, you can define a failure state for each output. This state is applied automatically whenever a failure occurs (such as a Modbus link failure).

We can imagine the same feature for HAL components, with a new HAL command defining a failure state value for each output:

setf [false|true|0|1|none]

where 'none' means legacy behavior (the output keeps its last updated value).

The failure state should be applied when a HAL communication failure is detected (this may need polling?).
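To make the proposed semantics concrete, here is a minimal sketch in plain Python, not HAL code. The function name `apply_failure_states`, the pin names, and the dictionaries are all hypothetical and only illustrate the idea; `None` stands in for the 'none' setting.

```python
from typing import Dict, Optional


def apply_failure_states(
    outputs: Dict[str, bool],
    failure_states: Dict[str, Optional[bool]],
    comm_ok: bool,
) -> Dict[str, bool]:
    """Return the values that should be driven on each output.

    While communication is healthy, outputs pass through unchanged.
    On failure, each output with a configured failure state is forced
    to it; outputs set to None (or not configured) keep their last value.
    """
    if comm_ok:
        return dict(outputs)
    result = {}
    for name, last_value in outputs.items():
        forced = failure_states.get(name)
        result[name] = last_value if forced is None else forced
    return result


# Hypothetical pin names, for illustration only.
last = {"jog-plus": True, "lamp": True}
cfg = {"jog-plus": False, "lamp": None}  # None = legacy behavior
print(apply_failure_states(last, cfg, comm_ok=False))
```

In this sketch the jog output drops to false on a link failure, while the lamp keeps its last value.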

Also, it would be nice if every HAL remote component had, by default, a pin giving the current health of the communication. It could be used to define more complex fail-safe scenarios.

Aug 03 '18 15:08 ArcEye

Comment by luminize Sat Feb 24 19:30:58 2018

I think this behaviour sounds logical.

From my perspective it makes sense to put a solution for this in the Haltalk/Machinetalk/remote HAL component code, not in general HAL pin behaviour, because you can have HAL without remote components, and then a failure state would not make sense and would only add overhead.

How about exposing a pin on the remote component that reports a communication failure state, which you can then use for "AND"ing with the pins/signals that need this fail-safe state? That way we use HAL components which are already available and only need to detect and expose the communication failure.

Comment by luminize Sat Feb 24 19:34:48 2018

On a side note: you cannot and should not use this for safety. Safety-related functions need to be outside the CPU logic, for example an appropriate hardware emergency stop when required. However, I think this will definitely have its use in reducing the risk of damage.

Comment by f1oat Sat Feb 24 19:59:06 2018

Yes, exposing the communication failure state is what I meant at the end of my post. But it could be painful to "wire" if you have many outputs for which you want a simple generic behavior such as falling back to the 0 state. I discovered the equivalent of "setf" in Advantys industrial I/O modules and found it quite convenient. About safety, I agree, that should be done with an independent circuit.

Comment by luminize Sun Feb 25 10:23:52 2018

This needs some thinking: how does this work when you're running your machine from a "normal" setup, like on the host itself, and you also have a remote UI? I can imagine that your machine won't work when you're running both. Does one "override the communication failure" when working locally as well as remotely? That gets pretty obscure very quickly.

From my perspective, falling back to a safe state should principally be handled by the component design itself; for example, losing the enable bit as a result of an estop chain should take care of the output pins. If this "safe" state behaviour is made a property of the pin, then the water can become very murky.

So as far as "expose the communication failure state of the remote component" goes, I fully agree, but I don't think having this safe-state option on a per-pin basis is the correct solution.

Comment by ArcEye Sun Feb 25 10:39:20 2018

We certainly don't want to mess around with normal components.

If this is a problem with a comms failure when using remote components, why not have a watchdog component which triggers if comms cease? It could have multiple preset outputs, attached to whatever you require. The number of actual pins that need to be set to a default would probably not be high, just a few important ones.

Better still, it could act as a filter: the relevant outputs pass through it unchanged until a comms failure, when the output values are switched to a preset default. No need to alter the actual components, just change the output value reaching the linked signal.
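The filter idea can be sketched as follows. This is a plain-Python simulation under stated assumptions, not an existing HAL component: the class `WatchdogFilter`, its methods, the 0.5 s timeout, and the signal names are all hypothetical.

```python
from typing import Dict, Optional


class WatchdogFilter:
    """Pass values through until comms time out, then emit preset defaults."""

    def __init__(self, timeout_s: float, defaults: Dict[str, float]):
        self.timeout_s = timeout_s
        self.defaults = defaults
        self.last_rx: Optional[float] = None  # time of last message, if any

    def feed(self, now: float) -> None:
        """Call whenever a message arrives from the remote side."""
        self.last_rx = now

    def tripped(self, now: float) -> bool:
        """True if no message was ever seen or the last one is too old."""
        return self.last_rx is None or (now - self.last_rx) > self.timeout_s

    def filter(self, values: Dict[str, float], now: float) -> Dict[str, float]:
        if not self.tripped(now):
            return dict(values)
        # Only channels with a preset are overridden; others pass through.
        return {k: self.defaults.get(k, v) for k, v in values.items()}


# Hypothetical signal names and timeout, for illustration only.
wd = WatchdogFilter(timeout_s=0.5, defaults={"spindle-on": 0.0})
wd.feed(now=10.0)
print(wd.filter({"spindle-on": 1.0, "dro-x": 3.2}, now=10.2))  # within timeout
print(wd.filter({"spindle-on": 1.0, "dro-x": 3.2}, now=11.0))  # timed out
```

Only the few important signals get a preset; everything else reaches the linked signal unchanged, so the components themselves are not touched.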

You could tie it into the Estop chain too.

Comment by f1oat Sun Feb 25 11:15:32 2018

Yes, not so easy. I missed the fact we can have multiple remote connections to the same component!

I am probably mixing 3 very different concepts:
1/ "Advantys like" remote I/O module fail-safe state, which is directly linked to modbus sanity: in that case, communication failure is defined as polling timeout
2/ Specific issue I have with MK for jog buttons in Machineface: in case of sporadic LAN latency, when you release the jog button, the motion does not stop immediately
3/ Generic solution for HAL remote components communication failure

For 2/, the idea is to achieve reliability comparable to a hard-wired button. Maybe the best solution would be to implement a custom watchdog with fast polling between the BBB and the tablet. This watchdog would only really be used while the button is pressed. In case of multiple GUIs, the conflict may be solved with simple logic. One solution is to have a 200 ms monostable drive the jog signal, and have the GUI send a message every 100 ms while the jog button is pressed.
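The monostable idea can be illustrated with a small plain-Python simulation. The class `Monostable` and its methods are hypothetical names, not a HAL component; time is passed in explicitly so the behavior is easy to follow.

```python
class Monostable:
    """Retriggerable one-shot: output stays high for width_s after each trigger."""

    def __init__(self, width_s: float = 0.2):
        self.width_s = width_s
        self.expires_at = float("-inf")  # never triggered yet

    def trigger(self, now: float) -> None:
        """Call for each keep-alive message received from the GUI."""
        self.expires_at = now + self.width_s

    def output(self, now: float) -> bool:
        """Jog signal: high only while the last trigger is recent enough."""
        return now < self.expires_at


jog = Monostable(width_s=0.2)
# GUI sends a message every 100 ms while the button is held (t = 0.0 .. 0.3 s).
for t in (0.0, 0.1, 0.2, 0.3):
    jog.trigger(t)
print(jog.output(0.35))  # shortly after the last message: still jogging
print(jog.output(0.55))  # messages stopped more than 200 ms ago: jog drops
```

With a 100 ms message period and a 200 ms pulse width, one late or lost message is tolerated, and the jog signal drops at most 200 ms after the last message received.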

My feeling is that we are drifting into already solved industrial-grade Ethernet problems! I need to read more on that topic.

Comment by ArcEye Sun Feb 25 11:59:45 2018

I won't use a wireless pendant, because of the potential RF interference from VFDs or 'dirty' DC motors. Mine are hard-wired to pins; you can even get lag using a USB connection.

So you can guess that I don't use remote interfaces on my phone or any such stuff. :wink:

Comment by dkhughes Mon Feb 26 16:49:59 2018

I use the remote interfaces exclusively with the small ARM SoCs. I think the easiest way of fixing this is to make an independent HAL component that implements the watchdog. That way you can use it, if you need it, on the pins that are important.

For the jog-specific issue, I exposed an extra HAL pin that feeds a watchdog component. The GUI increments this tick value during a jog command, and if it times out the amplifier enables are disabled. That might be too strict for your application, but in mine we have wired Ethernet dedicated to communication between the UI and the control board over HAL remote components.
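A rough plain-Python sketch of such a tick-based watchdog follows. It is a simplification and not the actual component described above: the class `TickWatchdog`, the 0.25 s timeout, and the choice to monitor continuously (rather than only during a jog) are assumptions made for illustration.

```python
from typing import Optional


class TickWatchdog:
    """Keep the enable output true only while the tick value keeps changing."""

    def __init__(self, timeout_s: float):
        self.timeout_s = timeout_s
        self.last_tick: Optional[int] = None
        self.last_change: float = 0.0

    def update(self, tick: int, now: float) -> bool:
        """Sample the tick pin; return the amplifier-enable state."""
        if tick != self.last_tick:
            # The GUI incremented the tick: remember when it last changed.
            self.last_tick = tick
            self.last_change = now
        return (now - self.last_change) <= self.timeout_s


wd = TickWatchdog(timeout_s=0.25)
print(wd.update(tick=1, now=0.0))  # tick seen: enabled
print(wd.update(tick=2, now=0.1))  # tick advanced: enabled
print(wd.update(tick=2, now=0.2))  # unchanged, but still within the timeout
print(wd.update(tick=2, now=0.5))  # unchanged for too long: disabled
```

Watching for a change in the value, rather than for a particular level, means a GUI that freezes with the pin stuck high or low is detected in the same way as a lost connection.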

If there is a local UI and a remote UI, the component could be configured as a master/master implementation. Whichever interface asserts its watchdog pin becomes the active master, and it is monitored for failure during motion.

Just a thought.

Comment by machinekoder Fri Mar 2 09:48:04 2018

Exposing the number of remote connections is, in general, a good idea. The only downside is this will not work for pure multicast connections (1 to many, useful for a DRO for example).

What you can do in any case is to create your own HAL watchdog logic. For single-step jogs it's pretty simple: just wiggle the increment pin and add a single-shot component.

For continuous jog, you can either use counting or a simple toggle pin with a "watchdog" attached on the HAL side. Remember that the delay for a wireless connection can be up to a few hundred ms.

@ArcEye USB is, in general, not very robust when it comes to noise and rough environments. Wireless or Ethernet connections might give you even better results than USB because they are designed for bad conditions (error handling); WiFi especially works even under very noisy conditions (but the delay might be huge).
