State machine crashes on state transitions

Open Mr-Niels opened this issue 2 years ago • 3 comments

This seems to be caused by state reactors and client behaviours that are destructed while their callback functions are running.

Is there a way to prevent such scenarios? For custom objects it is possible to use mutexes to circumvent the issue, but for standard objects such as 'sr_all_events_go' this is less than ideal.

Mr-Niels avatar Mar 02 '22 10:03 Mr-Niels

We ran into a similar issue. We've had some success with a combination of two things: mutex locks in the CBs' onExit methods, which prevent callbacks from running during destruction, and SMACC's built-in asynchronous CBs, which work better with long-running onExit calls.
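A minimal sketch of the onExit locking idea, assuming a plain C++ class rather than a real SMACC client behaviour. `GuardedBehavior`, `onMessage` and the `exiting_` flag are hypothetical names for illustration only; the callback and onExit share one mutex, so onExit waits for a callback that is already running and any later callback is skipped.

```cpp
#include <mutex>

// Hypothetical stand-in for a client behaviour; this is NOT a SMACC class.
class GuardedBehavior {
public:
  // Called from a worker thread, e.g. a subscriber callback.
  // Returns true if the work ran, false if it was skipped.
  bool onMessage(int value) {
    std::lock_guard<std::mutex> lock(mutex_);
    if (exiting_) return false;  // the state is being left: skip the work
    last_value_ = value;
    ++processed_;
    return true;
  }

  // Called when the state is left. Taking the lock blocks until any
  // callback that is currently running has returned.
  void onExit() {
    std::lock_guard<std::mutex> lock(mutex_);
    exiting_ = true;
  }

  int processed() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return processed_;
  }

private:
  mutable std::mutex mutex_;
  bool exiting_ = false;
  int last_value_ = 0;
  int processed_ = 0;
};
```

Note that this only helps while the object itself still exists: a callback that fires after the destructor has run would still touch a destroyed mutex, so the guard narrows the window rather than removing the underlying lifetime problem.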

yassiezar avatar Apr 18 '22 20:04 yassiezar

In SMACC and SMACC2, object lifetime is something we tried to pay special attention to. In general, SMACC and SMACC2 should be robust to leaving any state. We have long-lived objects such as clients, orthogonals and components, and short-lived objects such as states, state reactors and client behaviors. When one of the short-lived objects dies, it should not produce any exception.

In general, short-lived objects reference and use long-lived objects.

For example, we use the concept of SmaccSignal to make sure the disconnection happens when the life of one of the objects ends. For instance, let's say a client behavior has registered a callback on a signal of a client or component. If the signal fires when the client behavior is already dead, then SMACC is (or should be) smart enough to know that the signal should be skipped.
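To illustrate the idea (not the actual SmaccSignal implementation, which is not shown in this thread), here is a minimal sketch using only the standard library. `TrackedSignal` and `Behavior` are hypothetical names; the assumption is that each slot remembers its owner through a `std::weak_ptr` and is skipped once the owner has been destroyed.

```cpp
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical minimal signal; an illustration, not SMACC's SmaccSignal.
class TrackedSignal {
public:
  // Connect a slot whose validity is tied to the lifetime of `owner`.
  template <typename Owner>
  void connect(const std::shared_ptr<Owner>& owner,
               std::function<void(int)> slot) {
    slots_.push_back({std::weak_ptr<void>(owner), std::move(slot)});
  }

  // Fire the signal. Returns the number of slots actually invoked.
  int emit(int value) {
    int called = 0;
    for (auto& s : slots_) {
      // lock() either keeps the owner alive for the duration of the call
      // or returns an empty pointer if the owner is already gone.
      if (std::shared_ptr<void> alive = s.owner.lock()) {
        s.fn(value);
        ++called;
      }
    }
    return called;
  }

private:
  struct Slot {
    std::weak_ptr<void> owner;
    std::function<void(int)> fn;
  };
  std::vector<Slot> slots_;
};

// Hypothetical short-lived object that receives the signal.
struct Behavior {
  int received = 0;
};
```

Usage under these assumptions: create the behavior with `std::make_shared<Behavior>()`, connect a lambda that writes to it, and call `emit`. After the last `shared_ptr` to the behavior is reset, `emit` invokes nothing, so the dead object is never touched.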

In general, mutexes are not the way to go. The only use of mutexes we recommend is to protect clients or components that may be used by several asynchronous client behaviors simultaneously in different threads. Nonetheless, it would be nice to see your case.
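As a rough sketch of that recommended use, assuming plain `std::thread` workers in place of real asynchronous client behaviors: `SharedClient` and `runConcurrentBehaviors` are hypothetical names, and the mutex lives inside the long-lived client so every caller is serialized without knowing about it.

```cpp
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical long-lived client shared by several asynchronous behaviors.
class SharedClient {
public:
  void addSample(int v) {
    std::lock_guard<std::mutex> lock(mutex_);  // serialize concurrent writers
    samples_.push_back(v);
  }

  std::size_t count() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return samples_.size();
  }

private:
  mutable std::mutex mutex_;
  std::vector<int> samples_;
};

// Simulates `n_behaviors` asynchronous behaviors, each calling the shared
// client `n_calls` times from its own thread. Returns the stored count.
std::size_t runConcurrentBehaviors(int n_behaviors, int n_calls) {
  SharedClient client;
  std::vector<std::thread> workers;
  for (int b = 0; b < n_behaviors; ++b) {
    workers.emplace_back([&client, n_calls] {
      for (int i = 0; i < n_calls; ++i) client.addSample(i);
    });
  }
  for (auto& w : workers) w.join();  // the client outlives every worker
  return client.count();
}
```

The point of the design is that the lock guards shared data inside a long-lived object, not the lifetime of a short-lived one; the client is only destroyed after all workers have been joined.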

I am aware that some progress was made in SMACC2 that fixed some issues of SMACC1. But the easiest way to start here is to share a specific code example that isolates and shows the problem. Then we can handle it.

Nonetheless, at some point we will try to find the problem and fix it, if it exists.

pabloinigoblasco avatar Apr 27 '22 17:04 pabloinigoblasco

@pabloinigoblasco that's a fair point. I tried to make a toy example to see if I could replicate the behaviour, but was unable to, since the signals you described seemed to be doing their job of preventing the short-lived objects from being accessed after destruction. I'll need to think more about the problem, but I'll try to make a proper sample to recreate the issue for you.

For some additional context on my thinking: in our case, the crashes (always segfaults referencing non-existent objects) were non-deterministic, but they always occurred on a user-triggered state transition and involved long-running CUDA image processing tasks. This is what made me suspect a race condition somewhere and try mutexes to prevent premature destruction of the associated CBs. This seems to have helped, but I agree it's not a very elegant solution and doesn't quite fit the SMACC style.

yassiezar avatar May 05 '22 09:05 yassiezar