opennms
opennms copied to clipboard
Add startup/shutdown progress bar and improve communication of startup errors to users
What it looks like on success
$ bin/opennms restart
Stopping OpenNMS: 100% [=============================] 26/26 (0:00:07 / 0:00:00) All Passes Complete
Starting OpenNMS: 100% [=============================] 52/52 (0:00:32 / 0:00:00) All Passes Complete
$
What it looks like on failure
(I added extra bogus text to the class name for eventd in etc/service-configuration.xml
)
$ bin/opennms restart
Stopping OpenNMS: Trying to stop OpenNMS but it's already stopped.
Starting OpenNMS: 3% [=> ] 2/52 (0:00:01 / 0:00:29) Pass 0: TestLoadLibraries
An error occurred while attempting to start the "OpenNMS:Name=Eventd" service (class org.opennms.netmgt.eventd.jmx.Eventdbogus). Shutting down and exiting.
java.lang.ClassNotFoundException: org.opennms.netmgt.eventd.jmx.Eventdbogus
at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:476)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:589)
at java.base/java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:904)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
at java.base/java.lang.Class.forName0(Native Method)
at java.base/java.lang.Class.forName(Class.java:315)
at org.opennms.netmgt.vmmgr.Invoker.instantiateClasses(Invoker.java:159)
at org.opennms.netmgt.vmmgr.Starter.start(Starter.java:156)
at org.opennms.netmgt.vmmgr.Starter.startDaemon(Starter.java:96)
at org.opennms.netmgt.vmmgr.Controller.start(Controller.java:173)
at org.opennms.netmgt.vmmgr.Controller.main(Controller.java:150)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at org.opennms.bootstrap.Bootstrap$4.run(Bootstrap.java:531)
at java.base/java.lang.Thread.run(Thread.java:829)
Stopping OpenNMS: 100% [=========================================] 26/26 (0:00:00 / 0:00:07) All Passes Complete
Shutdown completed after receiving on error on startup. See error details above.
Started OpenNMS, but it stopped running: see details above or /Users/dgregor/git/opennms/target/opennms-31.0.0-SNAPSHOT/logs/output.log
$
systemd
Unfortunately, we won't see the updates directly out of sudo systemctl restart opennms
. To see status during OpenNMS restarts use journalctl -f -u opennms
to monitor separately (this is what is used for the excerpts below) and/or something like watch -n 1 systemctl status opennms
(I personally like seeing the systemd view with the latter).
Oct 23 00:42:20 dgregor-vm-perf systemd[1]: Starting OpenNMS server...
Oct 23 00:42:22 dgregor-vm-perf opennms[33047]: Starting OpenNMS: running ulimit -a
Oct 23 00:42:22 dgregor-vm-perf opennms[33047]: core file size (blocks, -c) 0
...
Oct 23 00:42:22 dgregor-vm-perf opennms[33047]: file locks (-x) unlimited
Oct 23 00:42:22 dgregor-vm-perf systemd[1]: opennms.service: Can't open PID file /run/opennms/opennms.pid (yet?) after start: Operation not permitted
Oct 23 00:42:22 dgregor-vm-perf systemd[1]: opennms.service: Supervising process 33601 which is not our child. We'll most likely not notice when it exits.
Oct 23 00:42:22 dgregor-vm-perf systemd[1]: Started OpenNMS server.
Oct 23 00:42:23 dgregor-vm-perf opennms[33047]: (not waiting for startup)
Oct 23 00:42:24 dgregor-vm-perf opennms[33600]: Starting OpenNMS: 0% [ ] 0/52 (0:00:42 / 0:00:42)
Oct 23 00:42:24 dgregor-vm-perf opennms[33600]: Starting OpenNMS: 1% [ ] 1/52 (0:00:42 / 0:00:42) Pass 0: TestLoadLibraries
...
Oct 23 00:43:02 dgregor-vm-perf opennms[33600]: Starting OpenNMS: 100% [=============================] 52/52 (0:01:21 / 0:00:03) All Passes Complete
On failure:
...
Oct 22 02:05:26 dgregor-vm-perf opennms[127533]: Starting OpenNMS: 3% [ ] 2/52 (0:00:42 / 0:00:42) Pass 0: TestLoadLibraries
Oct 22 02:05:26 dgregor-vm-perf opennms[127533]: An error occurred while attempting to start the "OpenNMS:Name=Eventd" service (class org.opennms.netmgt.eventd.jmx
.Eventdblarg). Shutting down and exiting.
Oct 22 02:05:26 dgregor-vm-perf opennms[127533]: java.lang.ClassNotFoundException: org.opennms.netmgt.eventd.jmx.Eventdblarg
Oct 22 02:05:26 dgregor-vm-perf opennms[127533]: at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:476)
Oct 22 02:05:26 dgregor-vm-perf opennms[127533]: at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:589)
Oct 22 02:05:26 dgregor-vm-perf opennms[127533]: at java.base/java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:904)
Oct 22 02:05:26 dgregor-vm-perf opennms[127533]: at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
Oct 22 02:05:26 dgregor-vm-perf opennms[127533]: at java.base/java.lang.Class.forName0(Native Method)
Oct 22 02:05:26 dgregor-vm-perf opennms[127533]: at java.base/java.lang.Class.forName(Class.java:315)
Oct 22 02:05:26 dgregor-vm-perf opennms[127533]: at org.opennms.netmgt.vmmgr.Invoker.instantiateClasses(Invoker.java:160)
Oct 22 02:05:26 dgregor-vm-perf opennms[127533]: at org.opennms.netmgt.vmmgr.Starter.start(Starter.java:156)
Oct 22 02:05:26 dgregor-vm-perf opennms[127533]: at org.opennms.netmgt.vmmgr.Starter.startDaemon(Starter.java:96)
Oct 22 02:05:26 dgregor-vm-perf opennms[127533]: at org.opennms.netmgt.vmmgr.Controller.start(Controller.java:173)
Oct 22 02:05:26 dgregor-vm-perf opennms[127533]: at org.opennms.netmgt.vmmgr.Controller.main(Controller.java:150)
Oct 22 02:05:26 dgregor-vm-perf opennms[127533]: at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
Oct 22 02:05:26 dgregor-vm-perf opennms[127533]: at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
Oct 22 02:05:26 dgregor-vm-perf opennms[127533]: at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43
)
Oct 22 02:05:26 dgregor-vm-perf opennms[127533]: at java.base/java.lang.reflect.Method.invoke(Method.java:566)
Oct 22 02:05:26 dgregor-vm-perf opennms[127533]: at org.opennms.bootstrap.Bootstrap$4.run(Bootstrap.java:531)
Oct 22 02:05:26 dgregor-vm-perf opennms[127533]: at java.base/java.lang.Thread.run(Thread.java:829)
Oct 22 02:05:26 dgregor-vm-perf opennms[127533]: Stopping OpenNMS: 0% [ ] 0/26 (0:01:09 / 0:01:09)
Oct 22 02:05:26 dgregor-vm-perf opennms[127533]: Stopping OpenNMS: 3% [= ] 1/26 (0:01:09 / 0:01:09) Pass 0: Trapd
...
Oct 22 02:05:26 dgregor-vm-perf opennms[127533]: Stopping OpenNMS: 100% [=============================] 26/26 (0:01:09 / 0:01:09) All Passes Complete
Oct 22 02:05:26 dgregor-vm-perf opennms[127533]: Shutdown completed after receiving on error on startup. See error details above.
Oct 22 02:05:28 dgregor-vm-perf opennms[127711]: Stopping OpenNMS:
Oct 22 02:05:28 dgregor-vm-perf opennms[127967]: Trying to stop OpenNMS but it's already stopped.
Oct 22 02:05:28 dgregor-vm-perf opennms[127711]: stopped
Oct 22 02:05:28 dgregor-vm-perf systemd[1]: opennms.service: Killing process 127534 (n/a) with signal SIGKILL.
Oct 22 02:05:28 dgregor-vm-perf systemd[1]: opennms.service: Killing process 127534 (n/a) with signal SIGKILL.
Oct 22 02:05:28 dgregor-vm-perf systemd[1]: opennms.service: Succeeded.
New log files
- manager.json contains elapsed times from previous stop/start runs; used for the ETA on the progress bar. Blame @mhuot for suggesting adding this (it was actually quite easy).
- progressbar.log contains the progress bar output. This is just
tail
ed in the background in theopennms
script.
$ tail -4 logs/manager.json
{"time":"2022-09-24T18:48:21.181Z","elapsed":"PT7.502S","type":"STOP"}
{"time":"2022-09-24T18:48:58.978Z","elapsed":"PT32.134S","type":"START"}
{"time":"2022-09-24T18:49:26.387Z","elapsed":"PT7.53S","type":"STOP"}
{"time":"2022-09-24T18:50:03.164Z","elapsed":"PT31.122S","type":"START"}
$ tail logs/progressbar.log
Start 100% [=======================] 52/52 (0:00:31 / 0:00:00) PerspectivePoller
Stop 100% [==================================] 26/26 (0:00:07 / 0:00:00) Manager
Start 100% [=======================] 52/52 (0:00:32 / 0:00:00) PerspectivePoller
Stop 100% [==================================] 26/26 (0:00:07 / 0:00:00) Manager
Start 100% [=======================] 52/52 (0:00:30 / 0:00:01) PerspectivePoller
$
Files changed in etc
- log4j2.xml -- two things:
- Change
Invoker
andManager
log level toDEBUG
but set the log level for the console appender toINFO
. Functionally, this keeps the same level of information on the console as before, but logs these classes atDEBUG
level to the log files. This way, messages for theInvoker
andManager
get logged to the log file at the same level as all of our other logs (there are occasionally some useful bits in there). - Add new logging routes for the manager JSON file (which records startup/shutdown times) and the progress bar log file. The progress bar support will not work without these changes.
- Change
- opennms.service -- change to
ExecStop
configuration option to add-s
option when callingopennms stop
. This is a nice-to-have to not have duplicate logs in systemd on shutdown, but is not needed, and it is backward-compatible.
Because of this, it would be preferable (but I don't know how preferable) to get this change in at the initial release on a foundation branch.
New opennms.conf option
PROGRESS_BAR
defaults to 1 to enable this feature. Add PROGRESS_BAR=0
to opennms.conf to disable and get the old functionality.
New libraries
- me.ctongfei:progressbar -- https://github.com/ctongfei/progressbar/
- org.jline:jline -- https://github.com/jline/jline3 (transitive dependency of progressbar)
New system properties
-
opennms.name
-- the value that gets substituted in at build time from${install.package.description}
. Defaults toOpenNMS
. -
terminal.columns
-- the terminal width at start time (if it could be determined). Defaults to 100. Only set iftput
determines that there is a tty with a TERM set. The existence of this property is also used to decide if the progress bar will be in "interactive" mode, which case the output is tweaked a bit (mostly to support systemd, which will be non-interactive).
Hack (no longer needed; removed)
progressbar 0.9.5 has been released the our code updated, so this hack is gone.
~~The existing ETA calculation in the progress bar library doesn't work well for our non-linear daemon startup times. The library very recently added a way to for callers to provide a function to calculate ETA through ProgressBarBuilder.setETAFunction
, however the method signature references a package private class, so it can't be externally called. I hacked around it by adding a small helper class in the me.tongfei.progressbar
package. Once https://github.com/ctongfei/progressbar/tree/0.9.5 is merged (https://github.com/ctongfei/progressbar/pull/146 has already been merged to that branch, but the approach might be tweaked a bit, which is fine), this hack can go away.~~
To-Do
- [x] Is this a good way to do this? Any suggestions on approach?
- [x] Probably some code cleanup; I kinda just go this working and got the high-level structure decent-ish.
- [x] Cleanup commit messages, maybe
- [x] There is one open XXX item in the
opennms
startup script aroundsystemd
that I could use someone more familiar withsystemd
to take a look at.
All Contributors
- [x] Have you read our Contribution Guidelines?
- [x] Have you (electronically) signed the OpenNMS Contributor Agreement?
Contribution Checklist
- Please make an issue in the OpenNMS issue tracker if there isn't one already.
Once there is an issue, please:- update the title of this PR to be in the format:
${JIRA-ISSUE-NUMBER}: subject of pull request
- update the JIRA link at the bottom of this comment to refer to the real issue number
- prefix your commit messages with the issue number, if possible
- once you've created this PR, please link to it in a comment in the JIRA issue Don't worry if this sounds like a lot, we can help you get things set up properly.
- update the title of this PR to be in the format:
-
If this code is likely to affect the UI, did you name your branch with
-smoke
in it to trigger smoke tests? - If this is a new or updated feature, is there documentation for the new behavior?
- If this is new code, are there unit and/or integration tests?
- If this PR targets a
foundation-*
branch, does it try to avoid changing files in$OPENNMS_HOME/etc/
?
What's Next?
A PR should be assigned at least 2 reviewers. If you know that someone would be a good person to review your code, feel free to add them.
If you need help making additions or changes to the documentation related to your changes, please let us know.
In any case, if anything is unclear or you want help getting your PR ready for merge, please don't hesitate to say something in the comments here, or in the #opennms-development chat channel.
Once reviewer(s) accept the PR and the branch passes continuous integration, the PR is eligible for merge.
At that time, if you have commit access (are an OpenNMS Group employee or a member of the OGP) you are welcome to merge the PR when you're ready. Otherwise, a reviewer can merge it for you.
Thanks for taking time to contribute!
External References
- JIRA (Issue Tracker): https://opennms.atlassian.net/browse/NMS-15116
This generally looks OK to me; what does it look like in systemctl
? Do we need to make init calls to startup disable the progress bar?
This generally looks OK to me; what does it look like in
systemctl
? Do we need to make init calls to startup disable the progress bar?
This. Generally, users will never run bin/opennms start
by hand, they systemctl start opennms
.
This generally looks OK to me; what does it look like in
systemctl
? Do we need to make init calls to startup disable the progress bar?
That was my last question. :) I don't have any Linux systems, but I will spinup a test on an ubuntu container when I have enough spare cycles and see how things look and tweak as needed. I was almost there last week but ran out of time while chasing down dependencies for download-artifacts.pl
.
This. Generally, users will never run
bin/opennms start
by hand, theysystemctl start opennms
.
@dino2gnt I did some testing on Ubuntu with systemd and made some improvements. I added a systemd section to the description of this PR with some samples. It doesn't make any difference if the user just runs sudo systemctl restart opennms
, but it works out well if they run journalctl -f -u opennms
to tail the logs or something like watch -n 1 systemctl status opennms
.
And I learned about systemd in the process. :)
@RangerRick I have pointed this progress bar PR back at develop, all checks passed, and it's ready to review. Do you want to add anyone else that should take a look? Also, since this is pointing at develop, there is no rush--it can wait a bit while release-31.x work is happening.
@dino2gnt If you want to review how this shows up in systemd, there is a systemd section in the description of the PR.