SMACC icon indicating copy to clipboard operation
SMACC copied to clipboard

Segfault when transitioning from state with a high frequency callback client behaviour

Open yassiezar opened this issue 1 year ago • 8 comments

I've come across what might be a bug.

In my system, I have a CB subscribed to a high-frequency teleop topic with the standard SMACC signal system, which then publishes messages onto another topic via another SMACC client. However, I've experienced segfaults occurring during state transitions when publishing a message from within the subscriber CB callback. GDB confirmed my suspicion that I'm referencing a nullptr when trying to publish a message via the client pointer. I suspect the CB's onExit() has been called and all the class members have been deallocated before the client pointer is dereferenced and the message is published, triggering a segfault.

The event timeline looks something like this:

- topic data published  -> onMessageReceived() triggered -> dereference class pointer to client -> SEGFAULT
-        transition triggered -> call onExit() -> destroy CB

Does this make sense? I've added an example node here that hopefully make things clearer. The issue seems to be that there's no way to tell when a pointer to a client has been deallocated inside a CB callback, except for maybe doing a check for nullptr before attempting to deallocate e.g.

if(cl_pub_ == nullptr) return;
cl_pub_->publish(msg)

but this is effectively a race condition (the pointer can still be deallocated between these instructions). This issue only seems to become apparent with high frequency callback triggers - lower frequency CB's manage to destroy themselves more predictably

The obvious solution to me would be to replace the naked pointers with std::shared_ptr<TClient> instead and return std::weak_ptr<TClient> to the CBs via the requiresClient() calls. On other words, a typical CB would go from

class CbExample : public smacc::ISmaccClientBehavior {
  public:
    void onEntry() override {
      if(!this->requiresClient(cl_pub_)) {
        ROS_ERROR("Client doesn't exist");
        return;
      }
      std_msgs::Empty msg;
      cl_pub_->publish(msg);
    }
  private:
    ClPub* cl_pub_;
};

to

class CbExample : public smacc::ISmaccClientBehavior {
  public:
    void onEntry() override {
      cl_pub_ = this->requiresClient();
      if(auto lock = cl_pub_.lock()) {
        std_msgs::Empty msg;
        lock->publish(msg);
      }
      else {
        ROS_WARN("ClPub went out of scope");
      }
    }
  private:
    std::weak_ptr<ClPub> cl_pub_;
};

In my opinion, this has significant safety benefits and will prevent silly errors like the one I described above. I'm happy to look at this and open a PR. However, this refactor will need significant changes to SMACC's API. Thoughts?

┆Issue is synchronized with this Jira Task by Unito

yassiezar avatar Feb 14 '23 16:02 yassiezar