How to use nut-monitor on primary without actually shutting down the system?

Open ne20002 opened this issue 1 year ago • 9 comments

I have a setup consisting of the following:

  1. a raspi connected to the UPS via USB, which shall run as primary
  2. three devices powered by the UPS, which are configured as secondary
  • the devices under 2. are not all always on; usually one or two are on, in varying combinations, so none of them is a suitable primary.
  • the raspi's power is not connected to the UPS but the USB cable is.
  • upsmon and upssched on the raspi are used to monitor the UPS and also to do some other stuff, e.g. sending notifications with ntfy and mqtt.
  • as the raspi shall not be powered down, I replaced the SHUTDOWNCMD on the raspi with a script sending a ntfy message.
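
For illustration, a SHUTDOWNCMD replacement of the kind described in the last bullet might look roughly like the sketch below. The script path, the topic URL and the message text are assumptions and do not come from the original setup; the only ntfy feature relied on is publishing by sending the message as an HTTP POST body to the topic URL.

```shell
#!/bin/sh
# Hypothetical SHUTDOWNCMD replacement: notify via ntfy instead of halting.
# Assumed upsmon.conf line (path and argument are illustrative):
#   SHUTDOWNCMD "/usr/local/bin/nut-notify.sh shutdown"

# Hypothetical ntfy topic URL; replace with your own server and topic.
NTFY_URL="${NTFY_URL:-https://ntfy.sh/my-ups-topic}"

build_message() {
    # $1 = event label, $2 = host name
    printf 'UPS event "%s" on %s: shutdown requested, host stays up' "$1" "$2"
}

notify() {
    # Publish the message by POSTing the body to the topic URL.
    curl -fsS -H "Title: NUT" \
         -d "$(build_message "$1" "$(uname -n)")" \
         "$NTFY_URL"
}

# Only act when called with an event label, as in the SHUTDOWNCMD line above.
if [ "$#" -gt 0 ]; then
    notify "$1"
fi
```

A script like this only replaces what upsmon runs at shutdown time; it does not change how upsmon itself behaves after it has called SHUTDOWNCMD.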

The setup looks straightforward, and I configured it with the raspi as primary and the other devices as secondary. It works well, except for:

When the shutdown on the raspi is started, the shutdown on the secondaries is performed and then I get the expected notification. But from then on the nut-monitor.service is stopped and disabled, so I lose the monitoring.

Is there any way to use nut-monitor as primary, including managing the secondaries, but then simply not shut down the primary and not stop functioning when shutdown is triggered?

ne20002 avatar Jun 20 '24 17:06 ne20002

Set MINSUPPLIES on it to 0 (I think it should be possible), and also use 0 as the number of the RasPi's PSUs powered by this particular UPS on the MONITOR line.

jimklimov avatar Jun 20 '24 18:06 jimklimov

Thank you very much @jimklimov It's sometimes so easy. It worked.

ne20002 avatar Jun 21 '24 07:06 ne20002

Hi @jimklimov I have to reopen this issue. With the suggested settings the primary stays online and the monitoring continues. Unfortunately the handling of the secondaries is no longer working properly.

Previously the primary contacted the secondary(ies) and they did an FSD, with the primary waiting for the secondaries and then shutting down its service (which was my problem in the first place) and the UPS. This is no longer the case.

Now I get notifications for the 'on battery' and 'low battery' events. Also, the secondary starts its own shutdown (caused by the low battery event). But the primary keeps running and the UPS is not shut off.

Any idea?

ne20002 avatar Jul 05 '24 11:07 ne20002

Well, it is not a very common setup, so some creativity may be needed.

One first quick sanity check: while that MONITORed UPS is said to feed zero supplies of RasPi, is it monitored as a master/primary? I think that upsmon should issue FSD for critical state of devices it is primarily responsible for regardless of PSU count; if it does not - it may be a recent regression or an oversight from older times. Which NUT version do you run there? Packaged or custom-built?

Notably, a server can have many power supplies fed by different UPSes, so there would be many MONITOR lines with non-zero fed-PSU counts and independent FSD's issued for them all (if it is a primary, for the benefit of other clients). Each system with its own upsmon would (should!) only call SHUTDOWNCMD when the count of known healthy (non-critical) devices goes under that system's MINSUPPLIES setting.

Does the RasPi also feed from some other UPS which it should monitor as an actual power source for its own shutdowns? If yes, you might be better off running two upsmon's, one for personal consumption of this machine and another to manage the UPS for others. Probably would have to tweak the init-script (or drop-in files for a systemd service instance, or a copy of the service unit definition file just tweaked locally) to use custom environment variables (NUT_CONFPATH, NUT_PIDPATH, NUT_ALTPIDPATH, and maybe NUT_STATEPATH), to avoid conflicts with the other one.

Either way, the upsmon which manages the UPS for others would consider this UPS as feeding its own (only) power source, so the power events would cause an FSD and shutdown (after other clients log off or a timeout passes). However, the SHUTDOWNCMD in this instance would be rigged to call the locally deployed copy of https://github.com/networkupstools/nut/blob/master/scripts/systemd/nutshutdown.in (may be packaged into /usr/lib/systemd/system-shutdown/nutshutdown) and command that UPS to turn off while not impacting this (RasPi) system.

Hope this helps, Jim Klimov

jimklimov avatar Jul 05 '24 16:07 jimklimov

Also, just in case you haven't: peruse https://networkupstools.org/docs/FAQ.html and https://github.com/networkupstools/ConfigExamples/releases/latest/download/ConfigExamples.pdf - maybe they would inspire some more solutions.

jimklimov avatar Jul 05 '24 16:07 jimklimov

This is a simple visualization.

--- power -+- UPS 5V ----------- Raspi
            |                       |
            |      +====== USB =====+
            |      |
            |      |
            +- Eaton UPS  --+--- Server A
                            |
                            |
                            +--- Server B
                            |
                            |
                            +--- Server C

The Raspi has its own 5V UPS which is not equipped with a USB interface and is therefore not NUT-compatible. But it's a long-lasting UPS and is required as the Raspi is also running a few other things.

The Servers A to C shall be protected by the Eaton UPS. None of the servers is running all the time; usually only one is online, so none of them is suitable to run as NUT primary.

The Raspi now should monitor the UPS (Grafana), and is doing some messaging (that was easy to include into upssched).

I'm not sure if it is a NUT primary by definition; I call it the monitoring/control node. But as the UPS is monitored by the Raspi, it should be the one switching off the Eaton UPS in case of line power loss, after all the currently used servers have been shut down. And it shall keep functioning afterwards.

I believe this setup is not exotic (I am also thinking of adding a NUT-capable Raspi UPS as an extra UPS for the Raspi).

Is there a way to fool NUT with a dummy UPS as a second UPS for the Raspi, so that it shuts down the servers on low battery of the Eaton UPS and then switches the UPS off, while staying online itself because the dummy UPS is still available for the Raspi?

But maybe the simple solution is to still have the mgmt Raspi monitor the UPS as primary and just replace the SHUTDOWNCMD so that it does not shut down the Raspi (and its services) but the UPS.

ne20002 avatar Jul 05 '24 18:07 ne20002

Just so you know: there were a number of GPIO and I2C drivers added to NUT lately, some specifically for Raspberry-related dongles and HATs. Maybe one of those (or a similar development) can help monitor that 5V UPS? USB is just one of a dozen media types that NUT can support already (protocols are the tricky part).

jimklimov avatar Jul 05 '24 20:07 jimklimov

A dummy-ups that would always say it is healthy sounds like an option. Then MINSUPPLIES 1 would be always satisfied, even if you set the "real" UPS to be MONITOR'ed as a primary which does feed 1 PSU of the RasPi (and dummy says it feeds a second PSU).

Hopefully that arrangement would tell upsmon to raise FSD status for other clients when the real UPS is in a critical situation. It would not, however, call SHUTDOWNCMD for the RasPi itself, as its power is said to be provided by dummy. If you want to explore this path, you would need a NOTIFYCMD calling upssched, which in turn calls your custom script to launch essentially upsdrvctl shutdown Eaton if the envvar clues from upssched say that this UPS has had a situation.
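
A rough sketch of such a custom script follows. The UPS id, the ups.conf section name and the event name ups-critical are hypothetical, and the sketch assumes the UPSNAME variable that upsmon exports for NOTIFYCMD is still visible to the script started by upssched. Whether upsdrvctl shutdown succeeds while the regular driver is running, and under the account that runs the script, may depend on the NUT version and driver.

```shell
#!/bin/sh
# Hypothetical upssched CMDSCRIPT; upssched passes the event/timer name as $1.
# Assumption: UPSNAME is inherited from the environment upsmon sets for NOTIFYCMD.

TARGET_UPS="${TARGET_UPS:-eaton@localhost}"   # hypothetical id, as on the MONITOR line
DRIVER_SECTION="${DRIVER_SECTION:-eaton}"     # hypothetical section name in ups.conf

should_power_off() {
    # $1 = event name from upssched, $2 = UPS that raised the event
    [ "$1" = "ups-critical" ] && [ "$2" = "$TARGET_UPS" ]
}

handle_event() {
    if should_power_off "$1" "${UPSNAME:-}"; then
        logger -t upssched-cmd "powering off $DRIVER_SECTION after event $1"
        # Ask the driver to command the UPS to cut its load.
        upsdrvctl shutdown "$DRIVER_SECTION"
    fi
}

# Do nothing unless upssched supplied an event name.
if [ "$#" -gt 0 ]; then
    handle_event "$1"
fi
```

Events for any other UPS, such as the always-healthy dummy, fall through without action, so only the real UPS is ever commanded off.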

jimklimov avatar Jul 05 '24 20:07 jimklimov

As for primary/secondary upsmon - please do check NUT docs :)

jimklimov avatar Jul 05 '24 20:07 jimklimov

I'll do some more tests. The 5V UPS for the Raspi is a 72 Wh unit providing 12V for the router and 5V for the raspi. It is capable of running both devices for more than 6 hours, so internet is still available in case of a power outage. But it does not have any management interface.

ne20002 avatar Jul 06 '24 07:07 ne20002

Ok, no luck so far.

I added a dummy ups to the control node and set the Eaton as primary and the dummy as secondary. This prevents the control node from being shut down; it continues to work. But the secondary on the server does not get an FSD. The events go:

  1. on battery (control node, server)
  2. low battery (control node), shutdown (server) -> ups is not switched off

I then added a '/usr/sbin/upsmon -c fsd' in the upssched event for low battery

  1. on battery (control node, server)
  2. low battery (control node)
  3. shutdown (server)
  4. forced shutdown (server)
  5. shutdown (control node)
  6. SHUTDOWNCMD (control node)

This causes nut-monitor to be stopped. Also, it does not seem to wait for the server, as the ups is switched off before the server is shut down. The primary did not wait for the HOSTSYNC. On the control node the /etc/killpower file has been written, causing the server to be shut down immediately after power was restored. The control node had to be restarted to clean this up.

I will do one more test with the /etc/killpower file disabled. If this is not working, I will connect the UPS to the currently running server and will need to buy two more UPSes, one for each server.

It's Debian bookworm with nut 2.8.0.

ne20002 avatar Jul 06 '24 11:07 ne20002

I will continue with the setup and go without the fsd mechanism. I will shut down the servers on the low battery event (this works out of the box) and will start a timer on the control node (Raspi) with upssched to shut down the ups after giving the servers enough time. Not as good as being able to wait for the server shutdowns before switching off the ups, but it should work.
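
The timer approach could be sketched as below. This is an assumption-laden illustration, not the configuration actually used: the UPS id, section name, timer name and the 180 second delay are made up, and the check of ups.status before cutting power is an extra safety step added for this sketch.

```shell
#!/bin/sh
# Hypothetical upssched CMDSCRIPT for a timer-based UPS power-off.
# Assumed upssched.conf lines (PIPEFN and LOCKFN must be set there as well):
#   CMDSCRIPT /usr/local/bin/upssched-cmd
#   AT LOWBATT eaton@localhost START-TIMER ups-poweroff 180
#   AT ONLINE  eaton@localhost CANCEL-TIMER ups-poweroff

UPS_ID="${UPS_ID:-eaton@localhost}"          # hypothetical UPS id
DRIVER_SECTION="${DRIVER_SECTION:-eaton}"    # hypothetical section name in ups.conf

is_on_battery() {
    # ups.status is a space-separated list of flags such as "OL", "OB", "LB".
    case " $1 " in
        *" OB "*) return 0 ;;
        *)        return 1 ;;
    esac
}

on_timer() {
    # Ignore anything that is not our power-off timer.
    [ "$1" = "ups-poweroff" ] || return 0
    status="$(upsc "$UPS_ID" ups.status 2>/dev/null)"
    # Only cut power if the UPS still reports running on battery.
    if is_on_battery "$status"; then
        upsdrvctl shutdown "$DRIVER_SECTION"
    fi
}

# upssched passes the timer name as the first argument.
if [ "$#" -gt 0 ]; then
    on_timer "$1"
fi
```

The delay has to be long enough for the slowest server to finish its shutdown, since nothing in this scheme confirms that the servers are actually down.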

ne20002 avatar Jul 06 '24 12:07 ne20002