ovis
ldmsd should stop and term plugins at exit time
ldmsd should stop plugins and call term() on them at exit time. Right now (at least in v3), the daemon exits without any plugin cleanup unless the user explicitly issues stop and term commands.
I don't believe this has been implemented.
@morrone It has not. The current framework does not retain all the information needed to do this reliably, and I'm pretty sure that, given the way the APIs are designed and the way remote references are managed, it would be very difficult to prove what a correct shutdown sequence is.
With that caveat, I've actually implemented (externally) a shutdown-sequence heuristic that has not failed me yet for file-based configurations. It lives in ldms-static-test.sh and essentially generates the inverse of the setup input script. It has proved extremely useful for finding 'real' leaks when testing with valgrind.
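The "inverse of the setup script" idea can be sketched in C as follows. This is not the logic in ldms-static-test.sh; it is a minimal illustration assuming the setup script uses commands of the form `verb name=... key=value ...`. The verb pairs in `inverse_table` follow common ldmsd configuration commands but are an assumption here, and `inverse_verb`/`invert_script` are hypothetical helpers. The script is walked last line first, each undoable verb is replaced by its inverse, and only the `name=` argument is kept; `config` lines have nothing to undo and are skipped.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical setup/teardown verb pairs (assumed, not taken from the test script). */
static const char *const inverse_table[][2] = {
	{"load", "term"},
	{"start", "stop"},
	{"prdcr_add", "prdcr_del"},
	{"prdcr_start", "prdcr_stop"},
	{"updtr_add", "updtr_del"},
	{"updtr_start", "updtr_stop"},
	{"strgp_add", "strgp_del"},
	{"strgp_start", "strgp_stop"},
};

/* Return the teardown verb for a setup verb, or NULL if nothing needs undoing. */
static const char *inverse_verb(const char *verb)
{
	size_t i;

	for (i = 0; i < sizeof(inverse_table) / sizeof(inverse_table[0]); i++)
		if (strcmp(verb, inverse_table[i][0]) == 0)
			return inverse_table[i][1];
	return NULL; /* e.g. "config" */
}

/* Write the teardown for n setup lines into out[], last setup line first.
 * Only the inverse verb and the " name=..." argument are kept.
 * Returns the number of teardown lines written (at most max_out). */
static size_t invert_script(const char *const *setup, size_t n,
			    char out[][128], size_t max_out)
{
	size_t k = 0;

	while (n-- > 0 && k < max_out) {
		char verb[64];
		const char *inv, *name, *end;

		if (sscanf(setup[n], "%63s", verb) != 1)
			continue; /* blank line */
		inv = inverse_verb(verb);
		if (!inv)
			continue; /* no inverse for this verb */
		name = strstr(setup[n], " name=");
		if (!name)
			continue; /* cannot tell what to tear down */
		name++; /* skip the leading space */
		end = name + strcspn(name, " \t\n");
		snprintf(out[k++], 128, "%s %.*s", inv, (int)(end - name), name);
	}
	return k;
}
```

Reversing the whole script, rather than sorting by plugin type, is what preserves dependencies such as "the jobid sampler was started before everything that reads it".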
Well, how about we constrain the ticket to sampler plugins? Things are pretty straightforward there, correct? We just need to call the term() function for each sampler.
Even with sampler plugins, we need the reverse-of-startup sequence to minimize goofy errors/warnings. It's fairly typical to start something like a job_id sampler first (with a slightly negative offset) and then all the other plugins that depend on the job ID being present, to avoid log whinging. AFAIK, plugins do not save a 'time-started' field by which we could sort them for shutdown.
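If the daemon did record when each sampler was started, the reverse-of-startup order would reduce to a sort. The sketch below assumes a hypothetical `start_seq` field, taken from a counter incremented on every `start`; ldmsd's plugin table has no such field, as noted above, so `struct sampler_rec` and `order_for_shutdown` are illustrative only.

```c
#include <stdlib.h>

/* Hypothetical per-sampler record; start_seq does not exist in ldmsd today. */
struct sampler_rec {
	const char *name;
	unsigned long start_seq; /* assigned from a counter at each "start" */
};

/* qsort comparator: most recently started sampler first. */
static int by_start_desc(const void *a, const void *b)
{
	const struct sampler_rec *x = a, *y = b;

	if (x->start_seq < y->start_seq)
		return 1;
	if (x->start_seq > y->start_seq)
		return -1;
	return 0;
}

/* Reorder recs so that walking it front to back yields reverse-of-startup order. */
static void order_for_shutdown(struct sampler_rec *recs, size_t n)
{
	qsort(recs, n, sizeof(*recs), by_start_desc);
}
```

With this ordering, a jobid sampler that was started first is stopped and terminated last, so the samplers that depend on it are already gone when it disappears.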
I think there was at a point some notion of being able to ask a daemon to dump its configuration history, but that's an even harder problem than "show me your current configuration". I don't know where either is in the current work plans.
Where I've found the 'tear-down by reversing config history' most useful is in fact on the storage daemons.
I don't have any interest in a daemon dumping its configuration history. A daemon has a table somewhere of every sampler plugin and that plugin's associated function pointers. At exit time, the daemon should walk that table and call term() on each sampler. That is all that I'm asking for. It is a really basic thing that most daemons do.
We (volunteers?) could iterate over ldmsd_config.c:plugin_list, checking for entries of SAMPLER type and calling stop on each, then repeat the iteration, calling term on each. This process must be fault tolerant (term can reply 'busy'). More significantly, sampler plugins can be publishers and subscribers, so it's conceivable that stopping/terminating them in an arbitrary order could hang on event-handler conflicts.
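The two-pass idea above (stop every sampler, then term every sampler, tolerating 'busy') might look roughly like the sketch below. The types are hypothetical stand-ins, not the real `plugin_list` entries from ldmsd_config.c, and the assumption that a busy plugin signals this by returning `EBUSY` from `term` is ours. `max_tries` bounds the retries so shutdown cannot loop forever; it does not help if a single `stop`/`term` call itself blocks on an event-handler conflict.

```c
#include <errno.h>
#include <stddef.h>

enum plugin_type { PT_SAMPLER, PT_STORE }; /* hypothetical stand-ins */

struct plugin {
	const char *name;
	enum plugin_type type;
	int (*stop)(struct plugin *p); /* 0 on success, else an errno value */
	int (*term)(struct plugin *p); /* assumed to return EBUSY if it cannot finish yet */
	int terminated;                /* set once term() has been dealt with */
	struct plugin *next;
};

/* Stop every sampler, then terminate every sampler, retrying EBUSY
 * replies for up to max_tries passes (max_tries must be >= 1).
 * Returns the number of samplers still busy after the last pass. */
static int shutdown_samplers(struct plugin *head, int max_tries)
{
	struct plugin *p;
	int pass, pending = 0;

	/* Pass 1: stop all sampling before anything is torn down. */
	for (p = head; p; p = p->next)
		if (p->type == PT_SAMPLER && p->stop)
			(void)p->stop(p); /* a real daemon would log failures */

	/* Later passes: call term(), retrying only the plugins that were busy. */
	for (pass = 0; pass < max_tries; pass++) {
		pending = 0;
		for (p = head; p; p = p->next) {
			int rc;

			if (p->type != PT_SAMPLER || p->terminated)
				continue;
			rc = p->term ? p->term(p) : 0;
			if (rc == EBUSY)
				pending++;         /* try again on the next pass */
			else
				p->terminated = 1; /* success or hard error: do not retry */
		}
		if (!pending)
			break;
	}
	return pending;
}
```

Separating the stop pass from the term pass means no sampler is still collecting while its neighbours are being freed; a production version would also pause between retry passes and combine this with a reverse-of-startup ordering.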