asyn
asyn copied to clipboard
Destructible ports
This PR extends asynManager to allow a port to be destructible, solving #172. It complements epics-base/epics-base#283.
Goals
- Provide a function in
asynManagerand a corresponding iocsh command that shuts down a port. - Shut down ports on IOC exit.
- The port cannot be re-enabled.
- The low-level driver is given a chance to release resources, or outright destroy itself.
- Access to the low-level driver is prevented.
- The
asynPortDriverbase class uses the new functionality. - All of this is opt-in so that existing drivers continue to work as they always did.
Port shutdown
There is a new function, asynManager::shutdown(), which
- disables the port, short-circuiting
asynManager::queueRequest()andasynManager::lockPort(), thus preventing holders ofasynUserdangling references from using them; - nullifies all driver pointers that are reachable from the port, none of which should be reachable from user code thanks to the previous point;
- marks the port as defunct, which prevents re-enabling it;
- triggers a new exception,
asynExceptionShutdown, which is to be handled by a conforming driver.
There is a new iocshell command that shuts down a port, useful mostly for testing.
This functionality is opt-in. A conforming driver declares that it is destructible by passing a new attribute, ASYN_DESTRUCTIBLE. In such a case, asynManager will register an epicsAtExit hook to call shutdown() on IOC exit. A conforming driver has to handle asynExceptionShutdown to perform cleanup, and may invalidate itself completely by calling delete.
asynPortDriver and inheritance
asynPortDriver uses the new functionality by
- passing through
ASYN_DESTRUCTIBLEif it is given to it via theasynFlagsparameter; - adding an exception handler that deletes the driver, relying on the destructor for cleanup;
- adding a new virtual function,
asynPortDriver::shutdown(), which allows things not possible in a destructor; - calling
pasynManager::shutdown()from its destructor (while preventing recursion) so that callingdeleteon the driver also disables the port.
Because this functionality is opt-in, there's a question of backwards compatibility in an inheritance tree, such as those in areaDetector. This can be solved by prescribing that a driver may only receive the ASYN_DESTRUCTIBLE flag and act upon it, but may not force it. This flag should come in via the iocshell command that instantiates a driver. This way, the driver will be destructible only if the most derived class is instantiated with this flag.
Limitations
- It is not possible to release all memory because asyn was not designed in a way to allow that. A port and all the structures describing it will continue to exist indefinitely, as will those
asynUserinstances that the users don't destroy. But, as stated in #172, memory is not the resource that anyone is concerned about on IOC shutdown. - Deleting a subclass of
asynPortDriverdirectly, without callingasynManager::shutdown(), can work, but only if the most derived class callsasynPortDriver::shutdown()from its destructor. This is encouraged by the documentation but not enforced. It only affects unit tests that want to delete drivers, to my knowledge.
TODO
- [x] documentation
- [x] tests
:x: Build asyn 1.0.161 failed (commit https://github.com/epics-modules/asyn/commit/a796d33d75 by @exzombie)
I've simplified things a bit since yesterday, and did a smoke test with areaDetector. I noticed that its plugins already have working destructors, so I just forced the NDPosPlugin to declare itself as destructible. valgrind reported no new issues. The base class destructor in asynPortDriver ensures that the port is marked as defunct, so even if a driver is deleted directly, everything should be fine.
@MarkRivers, where do I update the documentation? I noticed that you just added a whole new subdirectory.
:x: Build asyn 1.0.176 failed (commit https://github.com/epics-modules/asyn/commit/f56e71eb33 by @exzombie)
:white_check_mark: Build asyn 1.0.177 completed (commit https://github.com/epics-modules/asyn/commit/35add117e6 by @exzombie)
where do I update the documentation? I noticed that you just added a whole new subdirectory.
The documentation is now in docs/source/.rst. I have deleted all of the old documentation/.html files that have been converted to .rst. There are still a few small files left to do.
There is a Github Actions that publishes the docs to https://epics-modules.github.io/asyn
:white_check_mark: Build asyn 1.0.196 completed (commit https://github.com/epics-modules/asyn/commit/921b11a85d by @exzombie)
I've rebased, wrote the documentation, and extended the example driver. While writing the docs, I noticed that asynManager::lockPort doesn't go through the request queue as I had thought. Now, lockPort also returns an error.
I also reintroduced the asynPortDriver::shutdown() function. I had removed it because it seemed to me that things would be simpler without it, and I didn't see a good use case. However, I have taken a close look at some of the scars I got in the past when dealing with a design that relied on calling overridden virtual functions from a base class destructor. Not a good idea.
:white_check_mark: Build asyn 1.0.214 completed (commit https://github.com/epics-modules/asyn/commit/33edcb38c1 by @exzombie)
:white_check_mark: Build asyn 1.0.215 completed (commit https://github.com/epics-modules/asyn/commit/b66b9a3621 by @exzombie)
Rather than having every driver create an exiting_ flag and set it in the destructor, would it be nicer to provide an asynManager method pasynManager->isExiting(), and/or an asynPortDriver method isExiting()?
Ah, you're referring to the changes I made to the example driver. Well, having an exit flag and having threads act on it is a standard pattern, so it might indeed make sense to provide this flag. Maybe. As I see it, there's two aspects to consider: how such a mechanism would affect the shutdown, and how to provide it in the first place.
Let's suppose there exists an isExiting() function, and that all subclasses
use it. Meaning, the threads spawned by each subclass consult this function to
decide when to return. And when I say "subclasses", I mean a linear inheritance
chain starting with asynPortDriver. It is possible for each subclass to create
as many background threads as it wants.
First, setting an "exit" flag may not be enough on its own. Often, the
destructor (or shutdown()) has to wake sleeping threads. This is the case for,
say, the example driver. And then, the destructor needs to wait for the thread
to finish, preferably by joining it. In this context, having isExiting()
provided by asynPortDriver base class instead of the subclass is only a minor
improvement, I think.
Second, the flag used by isExiting() is set once, and seen by all the threads
of all subclasses in parallel. I'm not sure whether this is fine or not. I can
imagine a subclass wanting to do a thing or two in shutdown() while its base
class is operating normally. This is not possible if isExiting() stops all
threads at once.
Which brings us to the question: who sets the exiting flag? I had written a
long-ish explanation of how things could work and why most of it is not a good
idea, but I scrapped the text because, well, most if it is not a good idea. So
let me just say that the only way I see this working well is if shutdown is
only allowed to be triggered by asynManager, and a driver may never be
deleted directly. Which would mean that all the tests need to be fixed. Luckily,
it could be enforced, softly, by printing an error message in asynPortDriver
destructor if it executed while the port was still alive.
Is it worth having all threads halt at the same time, and not being able to directly delete drivers, just to avoid having one boolean per subclass?
For this PR I think it would be good to create an Issue that contains the following:
- The "problems" that this PR is intended to address. For example:
- IOCs crash when exiting?
- IOCs hang when exiting?
- IOCs leak resources when exiting?
- Inability to delete a driver and leave the IOC running?
- Others?
- The goals of this PR in addressing these problems
- Limitations
- For example asynManager cannot delete ports so some resource leak is unavoidable
Sure, I can add some background. But why open an issue? This stuff belongs to the cover letter of the PR, no? Especially the goals and limitations.
General advice on issues vs. PRs:
- issues: describe some situation that needs attention, comments supply supporting details
- PR: a resolution to the issue and discussion about the implementation
The Why? question is the most expensive question that we encounter. An issue provides a direct explanation why a specific software change was initiated.
Issues describe why, PRs describe how.
In other words, the "problems" go into the issue, and the "goals" and "limitations" go into the description above. Clear enough. The reason I asked is that, personally, I have no problem if the problem itself is described in the PR, with no corresponding issue ticket. And I was just wondering if you want a separate issue for some functional reasons, or if it was just aesthetics. In any case, I'll open the issue with the "problem", no problem :)
And I was just wondering if you want a separate issue for some functional reasons, or if it was just aesthetics.
Aesthetics, along the lines of what @prjemian said.
@ericonr Thank you very much for the review!
:white_check_mark: Build asyn 1.0.222 completed (commit https://github.com/epics-modules/asyn/commit/39e3cc2b48 by @exzombie)
@anjohnson, @mdavidsaver, and @ralphlange could you please look at this? I would like to merge into the next asyn release if you are good with it. Thanks.
Looks good and appropriate to me.
What is the story for concurrent access to the new defunct and shutdownNeeded flags?
I have almost convinced myself that all changes are made under the port lock, but I think some reads are not guarded.
eg. In lockPort(), the defunct flag is tested before any locking.
Maybe it should be tested after locking synchronousLock?
eg. The added doc. comment for asynPortDriver shows a sub-class destructor calling needsShutdown() which does no locking. Though it isn't clear to me if this test is actually needed vs. unconditionally calling asynPortDriver::shutdown().
eg. In lockPort(), the defunct flag is tested before any locking. Maybe it should be tested after locking synchronousLock?
You're right, will fix.
eg. The added doc. comment for asynPortDriver shows a sub-class destructor calling needsShutdown() which does no locking. Though it isn't clear to me if this test is actually needed vs. unconditionally calling asynPortDriver::shutdown().
Excellent point. The answer is "yes, it's needed". However, the fact that it wasn't clear to you, and that I myself got a bit confused when trying to explain how shutdown() can be called from different places, means that subclass authors will get this wrong. Thanks for pointing this out. At the very least, I need to update the documentation.
I was hesitant to allow asynPortDriver to be deleted manually (like, in tests) exactly because it causes complications. I will think about this aspect again. However, having three mutexes involved gives me a headache, so it might take a while.
:x: Build asyn 1.0.237 failed (commit https://github.com/epics-modules/asyn/commit/3d818ba8f2 by @exzombie)
Some commit messages have typos:
Prettify the definition if iocsh usage messages Change comment style in asynManger.c for consistency
I will try and test this soon, though I don't have as much of a need for port destruction anymore.
... However, having three mutexes involved gives me a headache, so it might take a while.
Not a request, only a thought. I've taken to using comments within struct/class definitions to document member variable locking/access rules. eg. // the following guarded by 'mymutex' or // const after constructor or // only access from worker thread. This helps me to remember my intent. And encourages me to have an intent! Whether this intent gets correctly expressed in code is another question.
eg. https://github.com/mdavidsaver/pvxs/blob/8ea613cb078579076244c654a1677eff6e143c30/src/clientmon.cpp#L64-L74
:white_check_mark: Build asyn 1.0.237 completed (commit https://github.com/epics-modules/asyn/commit/b106f3e813 by @exzombie)
I've thought about how to make things more straightforward for users, and decided that it is simpler to just forbid deleting a destructible driver without calling shutdown() first. If the user does that, they will get an error message; the port will still be marked as defunct so most problems are avoided. I think that whoever marks a port as destructible is perfectly capable of calling shutdown() in their test code, while on the other hand, ensuring all the classes in a hierarchy call shutdown() properly when destroyed is less straightforward.
I also added a note explaining that the shutdown() function is called with the driver unlocked. This function has to do it's own locking and unlocking because that's the only way to ensure that threads can make progress and eventually terminate. The shutdownNeeded flag is now atomic, so that it doesn't matter whether the user remembers to lock the driver before calling needsShutdown().
The "template" given in the doxygen comment contained a hint that the ASYN_DESTRUCTIBLE flag should be forwarded in the constructors. I removed that, it's a transitional thing. The intent was to provide a guideline for gradually porting a class hierarchy such as the one in areaDetector, but I think it's better if we keep the asyn docs clean and find agreement on how to do that separately.
This stuff is in 366f154ee74f94bfbdc99fd84d9c2a52db45d8be.
:x: Build asyn 1.0.239 failed (commit https://github.com/epics-modules/asyn/commit/925271f8ee by @exzombie)
I've thought about how to make things more straightforward for users, and decided that it is simpler to just forbid deleting a destructible driver without calling shutdown() first. If the user does that, they will get an error message; the port will still be marked as defunct so most problems are avoided. I think that whoever marks a port as destructible is perfectly capable of calling shutdown() in their test code, while on the other hand, ensuring all the classes in a hierarchy call shutdown() properly when destroyed is less straightforward.
I don't know if I agree with this, because it makes it harder to define a constructor for a class which inherits from asynPortDriver and that can have an exception in the middle of it and simply allow that exception to propagate so the port is automatically destroyed during construction (basically, I'd like to have fallible constructors).
That said, the current code doesn't seem to work for this purpose yet. I've tested a few scenarios in my constructor, which calls asynPortDriver::asynPortDriver() with ASYN_DESTRUCTIBLE.
- call
this->shutdown()in constructor and return. Trying to load a record which uses that port results in a segfault
drvUserCreate (drvPvt=0x0, pasynUser=0xbdded8, drvInfo=0xb1fc20 "NAME", pptypeName=0x0, psize=0x0) at ../../asyn/asynPortDriver/asynPortDriver.cpp:3392
3392 pPvt->lock();
- call
this->shutdown()and then throw an exception. Trying to load the record still results in a segfault. - simply throw the exception.
asynPortDriver::~asynPortDriverruns and prints its warning message, and we still get a segfault when trying to load the record.
The issue seems to be that the mechanism was implemented considering first destroying ports after iocInit, I'd assume?
Ah, you are quite correct, I had not considered this use case, my priority is IOC shutdown. That said, I can definitely see the appeal of delegating handling a dysfunctional driver to asyn, instead of having to carefully return errors from a partially-initialized driver. Thanks for pointing it out; it's in general inevitable that someone finds a use case that wasn't considered, and I'm glad you did it before the change was merged.
A user of the port needs to take several steps in order to obtain a useful handle, and I'll need to think about when it's best to return an error. For example, it may well be that connectDevice() is too early as it would neuter a number of iocsh commands. I'll take a closer look at this stuff.
As for separating shutdown() from the destructor: the alternative is to require that all derived classes call shutdown() in their destructors. The reason is that it has to be called by the most derived class. I see no way to detect missing calls. But I also don't see how this affects your case of incomplete construction. Remember that, when you throw an exception from the constructor, the destructor of the class under construction will not be called, only the destructors of the base classes. So you need to manually ensure that you have cleaned up before letting the exception propagate. Specifically, you need to decide whether the class has been constructed far enough so that it's safe to call its own shutdown(), or if it's better to call the shutdown() of the base class. Having shutdown() called from destructors wouldn't change that.