asyn icon indicating copy to clipboard operation
asyn copied to clipboard

Destructible ports

Open exzombie opened this issue 2 years ago • 37 comments
trafficstars

This PR extends asynManager to allow a port to be destructible, solving #172. It complements epics-base/epics-base#283.

Goals

  • Provide a function in asynManager and a corresponding iocsh command that shuts down a port.
  • Shut down ports on IOC exit.
  • The port cannot be re-enabled.
  • The low-level driver is given a chance to release resources, or outright destroy itself.
  • Access to the low-level driver is prevented.
  • The asynPortDriver base class uses the new functionality.
  • All of this is opt-in so that existing drivers continue to work as they always did.

Port shutdown

There is a new function, asynManager::shutdown(), which

  • disables the port, short-circuiting asynManager::queueRequest() and asynManager::lockPort(), thus preventing holders of asynUser dangling references from using them;
  • nullifies all driver pointers that are reachable from the port, none of which should be reachable from user code thanks to the previous point;
  • marks the port as defunct, which prevents re-enabling it;
  • triggers a new exception, asynExceptionShutdown, which is to be handled by a conforming driver.

There is a new iocshell command that shuts down a port, useful mostly for testing.

This functionality is opt-in. A conforming driver declares that it is destructible by passing a new attribute, ASYN_DESTRUCTIBLE. In such a case, asynManager will register an epicsAtExit hook to call shutdown() on IOC exit. A conforming driver has to handle asynExceptionShutdown to perform cleanup, and may invalidate itself completely by calling delete.

asynPortDriver and inheritance

asynPortDriver uses the new functionality by

  • passing through ASYN_DESTRUCTIBLE if it is given to it via the asynFlags parameter;
  • adding an exception handler that deletes the driver, relying on the destructor for cleanup;
  • adding a new virtual function, asynPortDriver::shutdown(), which allows things not possible in a destructor;
  • calling pasynManager::shutdown() from its destructor (while preventing recursion) so that calling delete on the driver also disables the port.

Because this functionality is opt-in, there's a question of backwards compatibility in an inheritance tree, such as those in areaDetector. This can be solved by prescribing that a driver may only receive the ASYN_DESTRUCTIBLE flag and act upon it, but may not force it. This flag should come in via the iocshell command that instantiates a driver. This way, the driver will be destructible only if the most derived class is instantiated with this flag.

Limitations

  • It is not possible to release all memory because asyn was not designed in a way to allow that. A port and all the structures describing it will continue to exist indefinitely, as will those asynUser instances that the users don't destroy. But, as stated in #172, memory is not the resource that anyone is concerned about on IOC shutdown.
  • Deleting a subclass of asynPortDriver directly, without calling asynManager::shutdown(), can work, but only if the most derived class calls asynPortDriver::shutdown() from its destructor. This is encouraged by the documentation but not enforced. It only affects unit tests that want to delete drivers, to my knowledge.

TODO

  • [x] documentation
  • [x] tests

exzombie avatar Mar 08 '23 18:03 exzombie

:x: Build asyn 1.0.161 failed (commit https://github.com/epics-modules/asyn/commit/a796d33d75 by @exzombie)

AppVeyorBot avatar Mar 09 '23 02:03 AppVeyorBot

I've simplified things a bit since yesterday, and did a smoke test with areaDetector. I noticed that its plugins already have working destructors, so I just forced the NDPosPlugin to declare itself as destructible. valgrind reported no new issues. The base class destructor in asynPortDriver ensures that the port is marked as defunct, so even if a driver is deleted directly, everything should be fine.

@MarkRivers, where do I update the documentation? I noticed that you just added a whole new subdirectory.

exzombie avatar Mar 09 '23 17:03 exzombie

:x: Build asyn 1.0.176 failed (commit https://github.com/epics-modules/asyn/commit/f56e71eb33 by @exzombie)

AppVeyorBot avatar Mar 09 '23 23:03 AppVeyorBot

:white_check_mark: Build asyn 1.0.177 completed (commit https://github.com/epics-modules/asyn/commit/35add117e6 by @exzombie)

AppVeyorBot avatar Mar 10 '23 04:03 AppVeyorBot

where do I update the documentation? I noticed that you just added a whole new subdirectory.

The documentation is now in docs/source/.rst. I have deleted all of the old documentation/.html files that have been converted to .rst. There are still a few small files left to do.

There is a Github Actions that publishes the docs to https://epics-modules.github.io/asyn

MarkRivers avatar Mar 10 '23 15:03 MarkRivers

:white_check_mark: Build asyn 1.0.196 completed (commit https://github.com/epics-modules/asyn/commit/921b11a85d by @exzombie)

AppVeyorBot avatar Mar 10 '23 20:03 AppVeyorBot

I've rebased, wrote the documentation, and extended the example driver. While writing the docs, I noticed that asynManager::lockPort doesn't go through the request queue as I had thought. Now, lockPort also returns an error.

I also reintroduced the asynPortDriver::shutdown() function. I had removed it because it seemed to me that things would be simpler without it, and I didn't see a good use case. However, I have taken a close look at some of the scars I got in the past when dealing with a design that relied on calling overridden virtual functions from a base class destructor. Not a good idea.

exzombie avatar Mar 15 '23 20:03 exzombie

:white_check_mark: Build asyn 1.0.214 completed (commit https://github.com/epics-modules/asyn/commit/33edcb38c1 by @exzombie)

AppVeyorBot avatar Mar 16 '23 07:03 AppVeyorBot

:white_check_mark: Build asyn 1.0.215 completed (commit https://github.com/epics-modules/asyn/commit/b66b9a3621 by @exzombie)

AppVeyorBot avatar Mar 16 '23 12:03 AppVeyorBot

Rather than having every driver create an exiting_ flag and set it in the destructor, would it be nicer to provide an asynManager method pasynManager->isExiting(), and/or an asynPortDriver method isExiting()?

Ah, you're referring to the changes I made to the example driver. Well, having an exit flag and having threads act on it is a standard pattern, so it might indeed make sense to provide this flag. Maybe. As I see it, there's two aspects to consider: how such a mechanism would affect the shutdown, and how to provide it in the first place.

Let's suppose there exists an isExiting() function, and that all subclasses use it. Meaning, the threads spawned by each subclass consult this function to decide when to return. And when I say "subclasses", I mean a linear inheritance chain starting with asynPortDriver. It is possible for each subclass to create as many background threads as it wants.

First, setting an "exit" flag may not be enough on its own. Often, the destructor (or shutdown()) has to wake sleeping threads. This is the case for, say, the example driver. And then, the destructor needs to wait for the thread to finish, preferably by joining it. In this context, having isExiting() provided by asynPortDriver base class instead of the subclass is only a minor improvement, I think.

Second, the flag used by isExiting() is set once, and seen by all the threads of all subclasses in parallel. I'm not sure whether this is fine or not. I can imagine a subclass wanting to do a thing or two in shutdown() while its base class is operating normally. This is not possible if isExiting() stops all threads at once.

Which brings us to the question: who sets the exiting flag? I had written a long-ish explanation of how things could work and why most of it is not a good idea, but I scrapped the text because, well, most if it is not a good idea. So let me just say that the only way I see this working well is if shutdown is only allowed to be triggered by asynManager, and a driver may never be deleted directly. Which would mean that all the tests need to be fixed. Luckily, it could be enforced, softly, by printing an error message in asynPortDriver destructor if it executed while the port was still alive.

Is it worth having all threads halt at the same time, and not being able to directly delete drivers, just to avoid having one boolean per subclass?

exzombie avatar Mar 17 '23 13:03 exzombie

For this PR I think it would be good to create an Issue that contains the following:

  • The "problems" that this PR is intended to address. For example:
    • IOCs crash when exiting?
    • IOCs hang when exiting?
    • IOCs leak resources when exiting?
    • Inability to delete a driver and leave the IOC running?
    • Others?
  • The goals of this PR in addressing these problems
  • Limitations
    • For example asynManager cannot delete ports so some resource leak is unavoidable

MarkRivers avatar Mar 17 '23 16:03 MarkRivers

Sure, I can add some background. But why open an issue? This stuff belongs to the cover letter of the PR, no? Especially the goals and limitations.

exzombie avatar Mar 17 '23 16:03 exzombie

General advice on issues vs. PRs:

  • issues: describe some situation that needs attention, comments supply supporting details
  • PR: a resolution to the issue and discussion about the implementation

The Why? question is the most expensive question that we encounter. An issue provides a direct explanation why a specific software change was initiated.

prjemian avatar Mar 17 '23 17:03 prjemian

Issues describe why, PRs describe how.

prjemian avatar Mar 17 '23 17:03 prjemian

In other words, the "problems" go into the issue, and the "goals" and "limitations" go into the description above. Clear enough. The reason I asked is that, personally, I have no problem if the problem itself is described in the PR, with no corresponding issue ticket. And I was just wondering if you want a separate issue for some functional reasons, or if it was just aesthetics. In any case, I'll open the issue with the "problem", no problem :)

exzombie avatar Mar 17 '23 17:03 exzombie

And I was just wondering if you want a separate issue for some functional reasons, or if it was just aesthetics.

Aesthetics, along the lines of what @prjemian said.

MarkRivers avatar Mar 17 '23 17:03 MarkRivers

@ericonr Thank you very much for the review!

exzombie avatar Apr 18 '23 11:04 exzombie

:white_check_mark: Build asyn 1.0.222 completed (commit https://github.com/epics-modules/asyn/commit/39e3cc2b48 by @exzombie)

AppVeyorBot avatar Apr 18 '23 16:04 AppVeyorBot

@anjohnson, @mdavidsaver, and @ralphlange could you please look at this? I would like to merge into the next asyn release if you are good with it. Thanks.

MarkRivers avatar May 24 '23 17:05 MarkRivers

Looks good and appropriate to me.

ralphlange avatar May 25 '23 16:05 ralphlange

What is the story for concurrent access to the new defunct and shutdownNeeded flags?

I have almost convinced myself that all changes are made under the port lock, but I think some reads are not guarded.

eg. In lockPort(), the defunct flag is tested before any locking.

Maybe it should be tested after locking synchronousLock?

eg. The added doc. comment for asynPortDriver shows a sub-class destructor calling needsShutdown() which does no locking. Though it isn't clear to me if this test is actually needed vs. unconditionally calling asynPortDriver::shutdown().

mdavidsaver avatar Jun 08 '23 22:06 mdavidsaver

eg. In lockPort(), the defunct flag is tested before any locking. Maybe it should be tested after locking synchronousLock?

You're right, will fix.

eg. The added doc. comment for asynPortDriver shows a sub-class destructor calling needsShutdown() which does no locking. Though it isn't clear to me if this test is actually needed vs. unconditionally calling asynPortDriver::shutdown().

Excellent point. The answer is "yes, it's needed". However, the fact that it wasn't clear to you, and that I myself got a bit confused when trying to explain how shutdown() can be called from different places, means that subclass authors will get this wrong. Thanks for pointing this out. At the very least, I need to update the documentation.

I was hesitant to allow asynPortDriver to be deleted manually (like, in tests) exactly because it causes complications. I will think about this aspect again. However, having three mutexes involved gives me a headache, so it might take a while.

exzombie avatar Jun 14 '23 09:06 exzombie

:x: Build asyn 1.0.237 failed (commit https://github.com/epics-modules/asyn/commit/3d818ba8f2 by @exzombie)

AppVeyorBot avatar Jun 14 '23 16:06 AppVeyorBot

Some commit messages have typos:

Prettify the definition if iocsh usage messages Change comment style in asynManger.c for consistency

I will try and test this soon, though I don't have as much of a need for port destruction anymore.

ericonr avatar Jun 14 '23 17:06 ericonr

... However, having three mutexes involved gives me a headache, so it might take a while.

Not a request, only a thought. I've taken to using comments within struct/class definitions to document member variable locking/access rules. eg. // the following guarded by 'mymutex' or // const after constructor or // only access from worker thread. This helps me to remember my intent. And encourages me to have an intent! Whether this intent gets correctly expressed in code is another question.

eg. https://github.com/mdavidsaver/pvxs/blob/8ea613cb078579076244c654a1677eff6e143c30/src/clientmon.cpp#L64-L74

mdavidsaver avatar Jun 14 '23 19:06 mdavidsaver

:white_check_mark: Build asyn 1.0.237 completed (commit https://github.com/epics-modules/asyn/commit/b106f3e813 by @exzombie)

AppVeyorBot avatar Jun 14 '23 21:06 AppVeyorBot

I've thought about how to make things more straightforward for users, and decided that it is simpler to just forbid deleting a destructible driver without calling shutdown() first. If the user does that, they will get an error message; the port will still be marked as defunct so most problems are avoided. I think that whoever marks a port as destructible is perfectly capable of calling shutdown() in their test code, while on the other hand, ensuring all the classes in a hierarchy call shutdown() properly when destroyed is less straightforward.

I also added a note explaining that the shutdown() function is called with the driver unlocked. This function has to do it's own locking and unlocking because that's the only way to ensure that threads can make progress and eventually terminate. The shutdownNeeded flag is now atomic, so that it doesn't matter whether the user remembers to lock the driver before calling needsShutdown().

The "template" given in the doxygen comment contained a hint that the ASYN_DESTRUCTIBLE flag should be forwarded in the constructors. I removed that, it's a transitional thing. The intent was to provide a guideline for gradually porting a class hierarchy such as the one in areaDetector, but I think it's better if we keep the asyn docs clean and find agreement on how to do that separately.

This stuff is in 366f154ee74f94bfbdc99fd84d9c2a52db45d8be.

exzombie avatar Jun 21 '23 13:06 exzombie

:x: Build asyn 1.0.239 failed (commit https://github.com/epics-modules/asyn/commit/925271f8ee by @exzombie)

AppVeyorBot avatar Jun 21 '23 13:06 AppVeyorBot

I've thought about how to make things more straightforward for users, and decided that it is simpler to just forbid deleting a destructible driver without calling shutdown() first. If the user does that, they will get an error message; the port will still be marked as defunct so most problems are avoided. I think that whoever marks a port as destructible is perfectly capable of calling shutdown() in their test code, while on the other hand, ensuring all the classes in a hierarchy call shutdown() properly when destroyed is less straightforward.

I don't know if I agree with this, because it makes it harder to define a constructor for a class which inherits from asynPortDriver and that can have an exception in the middle of it and simply allow that exception to propagate so the port is automatically destroyed during construction (basically, I'd like to have fallible constructors).

That said, the current code doesn't seem to work for this purpose yet. I've tested a few scenarios in my constructor, which calls asynPortDriver::asynPortDriver() with ASYN_DESTRUCTIBLE.

  • call this->shutdown() in constructor and return. Trying to load a record which uses that port results in a segfault
drvUserCreate (drvPvt=0x0, pasynUser=0xbdded8, drvInfo=0xb1fc20 "NAME", pptypeName=0x0, psize=0x0) at ../../asyn/asynPortDriver/asynPortDriver.cpp:3392        
3392        pPvt->lock();
  • call this->shutdown() and then throw an exception. Trying to load the record still results in a segfault.
  • simply throw the exception. asynPortDriver::~asynPortDriver runs and prints its warning message, and we still get a segfault when trying to load the record.

The issue seems to be that the mechanism was implemented considering first destroying ports after iocInit, I'd assume?

ericonr avatar Jul 19 '23 19:07 ericonr

Ah, you are quite correct, I had not considered this use case, my priority is IOC shutdown. That said, I can definitely see the appeal of delegating handling a dysfunctional driver to asyn, instead of having to carefully return errors from a partially-initialized driver. Thanks for pointing it out; it's in general inevitable that someone finds a use case that wasn't considered, and I'm glad you did it before the change was merged.

A user of the port needs to take several steps in order to obtain a useful handle, and I'll need to think about when it's best to return an error. For example, it may well be that connectDevice() is too early as it would neuter a number of iocsh commands. I'll take a closer look at this stuff.

As for separating shutdown() from the destructor: the alternative is to require that all derived classes call shutdown() in their destructors. The reason is that it has to be called by the most derived class. I see no way to detect missing calls. But I also don't see how this affects your case of incomplete construction. Remember that, when you throw an exception from the constructor, the destructor of the class under construction will not be called, only the destructors of the base classes. So you need to manually ensure that you have cleaned up before letting the exception propagate. Specifically, you need to decide whether the class has been constructed far enough so that it's safe to call its own shutdown(), or if it's better to call the shutdown() of the base class. Having shutdown() called from destructors wouldn't change that.

exzombie avatar Jul 21 '23 09:07 exzombie