Poll UPS failed - Data stale
Hello, I have a BlueWalker PowerWalker VI 2200 LCD UPS. Unfortunately, after some time the following error messages always appear:
usbhid-ups[2387]: libusb_get_interrupt: could not claim interface 0: Device or resource busy
usbhid-ups[2387]: libusb_get_report: error sending control message: No such device
usbhid-ups[2387]: libusb_get_interrupt: could not claim interface 0: Device or resource busy
usbhid-ups[2387]: libusb_get_report: error sending control message: No such device
usbhid-ups[2387]: libusb_get_interrupt: error submitting URB: No such device
usbhid-ups[2387]: libusb_get_interrupt: could not claim interface 0: Device or resource busy
usbhid-ups[2387]: libusb_get_interrupt: error submitting URB: No such device
usbhid-ups[2387]: libusb_get_interrupt: could not claim interface 0: Device or resource busy
usbhid-ups[2387]: libusb_get_report: error sending control message: No such device
usbhid-ups[2387]: libusb_get_interrupt: error submitting URB: No such device
systemd[1]: /lib/systemd/system/nut-monitor.service:6: PIDFile= references path below legacy directory /var/run/, updating /var/run/nut/upsmon.pid → /run/nut/upsmon.pid; pleas
systemd[1]: nut-monitor.service: Supervising process 2399 which is not our child. We'll most likely not notice when it exits.
systemd[1]: /lib/systemd/system/nut-monitor.service:6: PIDFile= references path below legacy directory /var/run/, updating /var/run/nut/upsmon.pid → /run/nut/upsmon.pid; pleas
systemd[1]: nut-monitor.service: Supervising process 2399 which is not our child. We'll most likely not notice when it exits.
usbhid-ups[2387]: libusb_get_interrupt: error sending control message: Value too large for defined data type
usbhid-ups[2387]: libusb_get_report: error sending control message: No such device
usbhid-ups[2387]: libusb_get_interrupt: error submitting URB: No such device
usbhid-ups[2387]: libusb_get_interrupt: error submitting URB: No such device
usbhid-ups[2387]: libusb_get_interrupt: error submitting URB: No such device
upsd[2391]: Data for UPS [myups] is stale - check driver
upsmon[2399]: Poll UPS [myups@localhost] failed - Data stale
upsmon[2399]: Communications with UPS myups@localhost lost
upsmon[2399]: Poll UPS [myups@localhost] failed - Data stale
In this case the server had been running for 55 days, so these error messages only appeared after 55 days of operation. What could be the reason for this?
lsusb:
Bus 001 Device 050: ID 06da:ffff Phoenixtec Power Co., Ltd.
cat ups.conf:
[myups]
driver = usbhid-ups
port = auto
vendorid = 06da
desc = "myups"
pollinterval = 15
upsd -v:
Network UPS Tools upsd 2.7.4
For the most part the messages indicate that the OS lost the USB device, not much that NUT can do to fix that part. Might be some power-saving disabling the chip, or something else frozen on the hub and a USB reset restarting every connection (either intentionally, or because some vendors cut corners and implement a port reset same as hub reset).
Guessing by your device name containing "vm", I might add delays or flaws in the virtualization/passthrough layer as a possible problem source.
On a busy machine (hypervisor with many guests, or other compute/IO/context-switch heavy systems) or on an under-powered machine (preempted VM just not on CPU) there might also be some exceeded hardcoded timeouts in any layer of the OS, drivers and software down to libusb used by NUT driver, perceiving a loss of device because last data exchange was a bit too long ago.
In that log I see the lost device reported all over April, but the driver only aborted once? I believe that means the retries succeeded in finding (and maybe reinitializing) the USB device, except once.
For context: which OS and version of NUT (packaged or custom built) is that happening with? If you do/can run a custom build, are you in position to check if native libusb-1.0 support from some of the competing branches in issue #300 would behave better?
For the most part the messages indicate that the OS lost the USB device, not much that NUT can do to fix that part. Might be some power-saving disabling the chip, or something else frozen on the hub and a USB reset restarting every connection (either intentionally, or because some vendors cut corners and implement a port reset same as hub reset).
This is Debian 10 running on an HP ProLiant DL360. I mention this because I can't imagine that such a server has energy-saving mechanisms for the USB subsystem. How can I find out whether something like this happened at that moment?
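One way to check (a sketch, assuming the standard Linux sysfs layout) is to list the kernel's per-device USB power-management setting: `on` means autosuspend is disabled, `auto` means the kernel is allowed to suspend the device. The function takes the sysfs root as a parameter only so it can be exercised against a fake tree:

```shell
# List the kernel's USB power-management setting for each USB device.
# "auto" means the device may be autosuspended; "on" keeps it awake.
usb_power_settings() {
    sysroot="${1:-/sys}"
    for ctrl in "$sysroot"/bus/usb/devices/*/power/control; do
        [ -e "$ctrl" ] || continue          # glob did not match anything
        dev=${ctrl%/power/control}          # strip the trailing path parts
        printf '%s: %s\n' "${dev##*/}" "$(cat "$ctrl")"
    done
}
```

If a device shows `auto`, autosuspend can be disabled for it with `echo on > /sys/bus/usb/devices/<dev>/power/control` (as root); whether that is the culprit here is only a hypothesis.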
Guessing by your device name containing "vm", I might add delays or flaws in the virtualization/passthrough layer as a possible problem source.
It is not a virtual machine but the hypervisor itself. As the platform I use Debian 10 with Proxmox VE.
On a busy machine (hypervisor with many guests, or other compute/IO/context-switch heavy systems) or on an under-powered machine (preempted VM just not on CPU) there might also be some exceeded hardcoded timeouts in any layer of the OS, drivers and software down to libusb used by NUT driver, perceiving a loss of device because last data exchange was a bit too long ago.
As already described, the system is indeed a hypervisor, and it is also relatively busy. But somehow I can't imagine that this alone is enough to cause such a hang.
In that log I see the lost device reported all over April, but the driver only aborted once? I believe that means the retries succeeded in finding (and maybe reinitializing) the USB device, except once.
It seems so, yes: over the last few months the connection was repeatedly interrupted; unfortunately, today it did not manage to restore the connection.
For context: which OS and version of NUT (packaged or custom built) is that happening with? If you do/can run a custom build, are you in position to check if native libusb-1.0 support from some of the competing branches in issue #300 would behave better?
I just tried a few times to compile the branches "libusb-1.0+0.1", "libusb-1.0" and "libusb-compat-1.0", but without full success: I could compile the branches "libusb-1.0+0.1" and "libusb-1.0" but was unable to use them. I always get the following error message:
upsdrvctl[44892]: Failed to read pid from /var/state/ups/usbhid-ups1-myups.pid
upsdrvctl[44892]: Network UPS Tools - Generic HID driver 0.42 (2.7.4-420-g2999c95f)
upsdrvctl[44892]: USB communication driver (libusb 1.0) 0.02
upsdrvctl[44892]: No matching HID UPS found
upsdrvctl[44892]: Driver failed to start (exit status=1)
upsmon[18049]: Poll UPS [myups@localhost] failed - Driver not connected
That failure to start probably means that it probed the available USB devices and did not find a supported vendor/product ID. It might be that as of those branches the IDs for your UPS were not known yet (which is unlikely, at least if you were running the last release before that), or that the device remains unseen on the OS side.
Try starting the driver with the -DDDDD argument for higher-verbosity debugging; in particular, it would mention the IDs it saw.
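For example (a sketch; the driver path and the "myups" section name follow the config shown earlier, so adjust them for your install):

```shell
# Stop the supervised driver first so it releases the USB device, then run
# it in the foreground with maximum debug verbosity and watch the output
# for the vendor/product IDs it enumerates while matching devices.
upsdrvctl stop
/lib/nut/usbhid-ups -DDDDD -a myups -u root
```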
After I ran configure with Debian-conformant options, the library could be compiled and used. I will now observe the whole thing for a quarter. If it doesn't hang anymore, the ticket can be closed.
I will inform you in this issue.
thanks!
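The "Debian-conformant" configure call mentioned above might look roughly like the following; the exact options are assumptions approximating Debian's NUT packaging paths (check the nut source package's debian/rules before relying on them):

```shell
# Assumed approximation of Debian's paths for NUT 2.7.x; verify against the
# actual Debian packaging before use.
./configure --with-user=nut --with-group=nut \
    --sysconfdir=/etc/nut \
    --with-statepath=/var/run/nut \
    --with-altpidpath=/var/run/nut \
    --with-drvpath=/lib/nut
make && sudo make install
```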
OK, I hit this issue; a reboot "fixed" it, but I gathered some info before the reboot that might help somebody, especially if the real fix is to raise a kernel defect. (BTW, this happened about 10 days after a real power cut... that would have been annoying if it had failed.)
The messages:
Feb 25 14:06:57 ybox usbhid-ups[727]: libusb_get_interrupt: could not claim interface 0: Device or resource busy
Feb 25 14:07:07 ybox usbhid-ups[727]: libusb_get_interrupt: error sending control message: Protocol error
Feb 25 14:07:17 ybox usbhid-ups[727]: libusb_get_interrupt: error sending control message: Protocol error
... 33K of similar messages
Stop and restart nut-server (debian bullseye)
root@ybox:/var/log# systemctl start nut-server
root@ybox:/var/log# systemctl status nut-server
● nut-server.service - Network UPS Tools - power devices information server
Loaded: loaded (/lib/systemd/system/nut-server.service; enabled; vendor preset: enabled)
Active: active (running) since Thu 2024-02-29 10:50:45 GMT; 38s ago
Process: 1066411 ExecStart=/sbin/upsd (code=exited, status=0/SUCCESS)
Main PID: 1066412 (upsd)
Tasks: 1 (limit: 4476)
Memory: 772.0K
CPU: 33ms
CGroup: /system.slice/nut-server.service
└─1066412 /lib/nut/upsd
Feb 29 10:50:45 ybox upsd[1066411]: listening on 127.0.0.1 port 3493
Feb 29 10:50:45 ybox upsd[1066411]: listening on 10.117.0.152 port 3493
Feb 29 10:50:45 ybox upsd[1066411]: listening on 127.0.0.1 port 3493
Feb 29 10:50:45 ybox upsd[1066411]: Can't connect to UPS [qnapups] (usbhid-ups-qnapups): No such file or directory
Feb 29 10:50:45 ybox upsd[1066411]: Can't connect to UPS [qnapups] (usbhid-ups-qnapups): No such file or directory
Feb 29 10:50:45 ybox upsd[1066411]: /etc/nut/upsd.users is world readable
Feb 29 10:50:45 ybox upsd[1066411]: /etc/nut/upsd.users is world readable
Feb 29 10:50:45 ybox upsd[1066412]: Startup successful
Feb 29 10:50:45 ybox systemd[1]: Started Network UPS Tools - power devices information server.
Feb 29 10:50:50 ybox upsd[1066412]: User [email protected] logged into UPS [qnapups]
The driver failed to start:
root@ybox:/var/log# /lib/nut/usbhid-ups -DDD -a qnapups -u nut
....
0.032911 Checking device (051D/0002) (001/002)
0.034470 - VendorID: 051d
0.034502 - ProductID: 0002
0.034516 - Manufacturer: unknown
0.034529 - Product: unknown
0.034542 - Serial Number: unknown
0.034554 - Bus: 001
0.034566 - Device release number: 0106
0.034586 Trying to match device
0.034605 device->Product is NULL so it is not possible to determine whether to activate max_report_size workaround
0.034641 Device matches
0.034703 nut_usb_set_altinterface: skipped usb_set_altinterface(udev, 0)
0.035360 Unable to get HID descriptor (error sending control message: Protocol error)
0.035391 HID descriptor length (method 1) -1
0.035419 HID descriptor, method 2: (9 bytes) => 09 21 10 01 21 01 22 05 04
0.035440 HID descriptor length (method 2) 1029
0.035502 HID descriptor length 1029
0.039396 Unable to get Report descriptor: Protocol error
...
So it thinks Product is NULL ...
root@ybox:/var/log# lsusb -d 051d:
Bus 001 Device 002: ID 051d:0002 American Power Conversion Uninterruptible Power Supply
Bus 001 Device 002: ID 051d:0002 American Power Conversion Uninterruptible Power Supply
Device Descriptor:
bLength 18
bDescriptorType 1
bcdUSB 1.10
bDeviceClass 0
bDeviceSubClass 0
bDeviceProtocol 0
bMaxPacketSize0 8
idVendor 0x051d American Power Conversion
idProduct 0x0002 Uninterruptible Power Supply
bcdDevice 1.06
iManufacturer 3 American Power Conversion
iProduct 1 Back-UPS XS 700U FW:924.Z5 .I USB FW:Z5
iSerial 2 3B2041X34361
bNumConfigurations 1
Configuration Descriptor:
bLength 9
bDescriptorType 2
wTotalLength 0x0022
bNumInterfaces 1
bConfigurationValue 1
iConfiguration 0
bmAttributes 0xe0
Self Powered
Remote Wakeup
MaxPower 24mA
Interface Descriptor:
bLength 9
bDescriptorType 4
bInterfaceNumber 0
bAlternateSetting 0
bNumEndpoints 1
bInterfaceClass 3 Human Interface Device
bInterfaceSubClass 0
bInterfaceProtocol 0
iInterface 0
HID Device Descriptor:
bLength 9
bDescriptorType 33
bcdHID 1.10
bCountryCode 33 US
bNumDescriptors 1
bDescriptorType 34 Report
wDescriptorLength 1029
Report Descriptor: (length is -1)
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x81 EP 1 IN
bmAttributes 3
Transfer Type Interrupt
Synch Type None
Usage Type Data
wMaxPacketSize 0x0006 1x 6 bytes
bInterval 10
can't get debug descriptor: Resource temporarily unavailable
cannot read device status, Resource temporarily unavailable (11)
(BTW that last line may be key)
Looking at:
root@ybox:/var/log# cat /sys/bus/usb/devices/1-1/product
Back-UPS XS 700U FW:924.Z5 .I USB FW:Z5
It appears to be there.
(I also did this as the nut user; also OK.)
Will reboot ..
Following the reboot, everything seemed to come up OK (the UPS was NOT power cycled).
I noted the following diffs in the output of lsusb:
Before Reboot:
Report Descriptor: (length is -1)
...
can't get debug descriptor: Resource temporarily unavailable
cannot read device status, Resource temporarily unavailable (11)
After Reboot:
Report Descriptors:
UNAVAILABLE
can't get debug descriptor: Resource temporarily unavailable
**Device Status: 0x0000**
(Bus Powered)
Same problem: it works initially and stops reading the UPS after a while. I tried every possible parameter change and it still happens. Is there a way to tell NUT to restart after 15 minutes of activity as a workaround?
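As a crude stopgap on a plain systemd host (probably not applicable inside an appliance/add-on), systemd itself can recycle the driver on a fixed schedule. This is only a sketch; the unit name `nut-driver.service` varies between distros and NUT versions, so treat it as an assumption:

```ini
# Hypothetical drop-in created via: systemctl edit nut-driver.service
# Kills and restarts the driver every 15 minutes regardless of its state.
[Service]
Restart=always
RuntimeMaxSec=15min
```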
OK, the problem appears fixed for me at least:
Following my post I added a cronjob for root:
08 09 * * * chronic /usr/local/bin/check_ups
Where
$ cat /usr/local/bin/check_ups
#!/bin/bash
# Who Date What
# GPV 09Oct24 Initial version, greps lsusb(1) and nut-scanner(1) output
# See: https://unix.stackexchange.com/questions/340718/how-do-i-bring-heredoc-text-into-a-shell-script-variable
eat_here() {
res=$(cat)
}
# Make sure these have blank lines at start and end
eat_here <<'HEREDOC'
root@zbox:/home/graeme# lsusb
Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 001 Device 005: ID 2148:7022 USB2.0 HUB
Bus 001 Device 004: ID 8087:07dc Intel Corp. Bluetooth wireless interface
Bus 001 Device 003: ID 0bda:0153 Realtek Semiconductor Corp. 3-in-1 (SD/SDHC/SDXC) Card Reader
Bus 001 Device 002: ID 1a40:0101 Terminus Technology Inc. Hub
Bus 001 Device 047: ID 051d:0002 American Power Conversion Uninterruptible Power Supply <----- here
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
HEREDOC
expected_lsusb="${res}"
eat_here <<'HEREDOC'
graeme@zbox:~$ sudo nut-scanner -U
[sudo] password for graeme:
Scanning USB bus.
[nutdev1]
driver = "usbhid-ups"
port = "auto"
vendorid = "051D"
productid = "0002"
product = "Back-UPS XS 700U FW:924.Z5 .I USB FW:Z5"
serial = "3B2041X34361"
vendor = "American Power Conversion"
bus = "001"
HEREDOC
expected_nut_scanner="${res}"
actual_lsusb="$(lsusb)"
actual_nut_scanner="$(nut-scanner -U)"
if [[ -z "${actual_lsusb}" ]] ; then
actual_lsusb="<NOTHING>"
fi
if [[ -z "${actual_nut_scanner}" ]] ; then
actual_nut_scanner="<NOTHING>"
fi
declare -i OK
declare -i RC
OK=1
if ! [[ "${actual_lsusb}" =~ "051d:0002" ]] ; then
OK=0
fi
if ! [[ "${actual_nut_scanner}" =~ "3B2041X34361" ]] ; then
OK=0
fi
if [[ ${OK} -eq 1 ]]; then
echo "check_ups: all is OK"
RC=0
else
msg="Error WRT UPS: lsusb reports\n ${actual_lsusb}\n we expected\n ${expected_lsusb}\n\n and nut-scanner reports \n ${actual_nut_scanner} \n and we expected \n ${expected_nut_scanner}"
# Use '%b' so the \n escapes expand and any % in the captured output is not
# misread as a printf format directive.
printf '%b\n' "$msg"
printf '%b\n' "$msg" | mail -s "check_ups: Lost contact with UPS $(hostname)" graeme@home
RC=1
fi
exit ${RC}
After a few weeks, the email triggered: the device had just disappeared. I was on a nearby computer at the time; nothing unusual had happened, no power blip, no noise from the UPS.
I did some investigation and it seemed ALL the devices on the USB hub where the UPS was attached had gone (IIRC, so had the hub itself). It was an external USB 2.0 hub. Unplugging and re-plugging "fixed" the problem... but that is of no long-term use.
So I bought a UGREEN USB 3.0 hub (I am generally pleased with their kit) and the problem has not reoccurred in several months.
@john8329 : which NUT version is it in your case? Also which UPS/driver?
There were some changes around auto-restart of connections with usbhid-ups in 2.8.0 (IIRC) and maybe more in later releases. Various appliances and distros often ship much older versions (including the decade-old NUT v2.7.4) - that we can not do much about.
I would suggest https://github.com/networkupstools/nut/wiki/Building-NUT-for-in%E2%80%90place-upgrades-or-non%E2%80%90disruptive-tests to check if the newer code base has the problem already figured out and fixed. Even if not - it would give us a starting point to iterate on more investigation and possibly a fix.
@jimklimov Using the "NUT" add-on for Home Assistant, v0.14.1, UPS driver controller 2.8.0 (I'm reading that from the logs). Building the new package on the Raspberry Pi it's running on is a little complicated (it communicates over USB, so I can't forward that to a dev machine), so for now it's not worth the time for me (I already spent at least 8 hours trying to make it work; I'm done with this). If you need detailed logs or any debug info, feel free to ask though.
FWIW, you can check the wiki; there were community-contributed notes on custom-building the HA plugin...