users: UnicodeDecodeError: 'utf-8' codec can't decode byte 0x81 in position 25: invalid start byte
This issue respects the following points:
- [X] This is a bug, not a question or a setup/configuration issue.
- [X] This issue is not already reported on Github (I've searched it).
- [X] I use the latest release of the Monitoring Plugins (https://github.com/Linuxfabrik/monitoring-plugins/releases).
- [X] I agree to follow Monitoring Plugins's Code of Conduct.
Which variant of the Monitoring Plugins do you use?
- [ ] .rpm/.deb package from repo.linuxfabrik.ch
- [ ] Compiled for Linux (.tar/.zip from download.linuxfabrik.ch)
- [X] Compiled for Windows (from download.linuxfabrik.ch)
- [ ] Source Code from GitHub
Bug description
Sometimes, the `users` plugin throws a traceback when nobody is logged in.
Traceback (most recent call last):
File "C:\PROGRA~3\icinga2\usr\lib64\nagios\plugins\users\users.py", line 172, in <module>
File "C:\PROGRA~3\icinga2\usr\lib64\nagios\plugins\users\users.py", line 129, in main
File "C:\PROGRA~3\icinga2\usr\lib64\nagios\plugins\users\lib\base3.py", line 954, in shell_exec
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x81 in position 25: invalid start byte
Steps to reproduce - Plugin call
'C:\ProgramData\icinga2\usr\lib64\nagios\plugins\users\users.exe' '--critical' 'None, None, None' '--warning' '1, 20, 1'
Steps to reproduce - Data
No response
Environment
092 win11
Plugin Version
users: v2022021603 by Linuxfabrik GmbH, Zurich/Switzerland
Python version
No response
List of Python modules
No response
Additional Information
No response
Also happens with users: v2023051201 by Linuxfabrik GmbH, Zurich/Switzerland:
Traceback (most recent call last):
File "C:\PROGRA~3\icinga2\usr\lib64\nagios\plugins\users\users.py", line 172, in <module>
File "C:\PROGRA~3\icinga2\usr\lib64\nagios\plugins\users\users.py", line 129, in main
File "C:\PROGRA~3\icinga2\usr\lib64\nagios\plugins\users\lib\base3.py", line 954, in shell_exec
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x81 in position 25: invalid start byte
I also see a similar error with our own logtime check, but have had no time to debug it yet.
We are lucky: I was able to capture a file that currently provokes the error. Because I used the Linuxfabrik libs and the general structure of the plugins in this repository, it could help with fixing the problem.
I don't understand what is happening on Windows, so I can't make it work. My setup and what I do:
- Windows Server 2019
- cmd.exe terminal
- `chcp`: results in `Active code page: 437`
- Add user `müller`. `query user` returns `... >müller ...`
- Running our `users.exe`, I get `... >m�ller ...`
- Debugging our `lib.shell.shell_exec()`, the call to `query user` returns a bytes object (which is fine), and we do a proper `to_text()` decoding as `utf-8` (which should also be fine). Python 3.6+ should handle the encoding/decoding correctly. I tested it with Python 3.12 on Windows, no luck. Every time I try to decode/encode by hand, I get the aforementioned "can't decode" errors.
What do we need to do to make this work better on Windows? Any feedback would be appreciated.
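The steps above can be reproduced without a Windows machine. This is only a minimal sketch: the byte string is a hypothetical stand-in for the `query user` output, assuming cmd.exe emitted it under code page 437 as reported by `chcp`.

```python
# Hypothetical sample: '>müller' as a console tool would emit it under
# code page 437 (assumption based on the `chcp` output above).
raw = '>müller'.encode('cp437')  # b'>m\x81ller'

# Decoding as UTF-8 fails, because 0x81 cannot start a UTF-8 sequence.
try:
    raw.decode('utf-8')
    utf8_error = None
except UnicodeDecodeError as exc:
    utf8_error = exc  # "... can't decode byte 0x81 ...: invalid start byte"

# Decoding with the code page the bytes were written in works.
as_cp437 = raw.decode('cp437')  # '>müller'

# Decoding as UTF-8 with errors='replace' never raises, but loses the
# umlaut; this would match the 'm�ller' shown above.
as_replaced = raw.decode('utf-8', errors='replace')  # '>m\ufffdller'
```

If this matches what happens in `shell_exec()`, the bytes themselves are fine and only the codec used for decoding is wrong.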
Well, I guess the problem is that 0xfc isn't UTF-8 but most likely Windows-1252, and as such it is not a legal code at this position: 0x00fc would be legal UTF-16, and 0xC3 0xBC would be a proper UTF-8 ü. What does https://pypi.org/project/chardet/ detect?
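To make the byte values easier to compare, here is a small sketch that encodes ü in a few candidate encodings and checks which of them turn the byte 0x81 from the traceback back into ü. The list of candidates is my own assumption, not something taken from the plugin.

```python
char = 'ü'  # U+00FC

# Candidate encodings to compare (an assumed, non-exhaustive list).
encodings = ['utf-8', 'utf-16-le', 'cp1252', 'cp437', 'cp850']

# How ü looks on the wire in each encoding.
encoded = {name: char.encode(name) for name in encodings}
# utf-8     -> b'\xc3\xbc'
# utf-16-le -> b'\xfc\x00'
# cp1252    -> b'\xfc'
# cp437     -> b'\x81'
# cp850     -> b'\x81'

# Which candidates decode the byte from the traceback (0x81) to ü?
matches = [
    name for name in encodings
    if b'\x81'.decode(name, errors='replace') == char
]
# -> ['cp437', 'cp850']
```

So the 0x81 in the error message looks more like an OEM code page (437 or 850) than Windows-1252, where ü would be 0xfc.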
There are two different codec problems in this issue:
- Anything related to stdin/stdout and talking to the Windows console (e.g. happening in the users plugin).
- Encoding issues when reading files on Windows (which require a slightly different handling).
Regarding chardet, I would like to follow standards and respect explicit character encoding information instead of guessing it. My problem is rather how to do the encoding/decoding in a robust way.
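For the file-reading case, a minimal sketch of what "explicit instead of guessing" could look like. `read_text()` is a hypothetical helper, not a function from the Linuxfabrik libs. The point is that `open()` without `encoding=` falls back to the locale encoding, which on Windows is the ANSI code page, so the encoding is passed explicitly.

```python
def read_text(path, encoding='utf-8', errors='replace'):
    """Read a text file with an explicit encoding (hypothetical helper).

    With errors='replace', undecodable bytes become U+FFFD instead of
    raising UnicodeDecodeError.
    """
    with open(path, 'r', encoding=encoding, errors=errors) as handle:
        return handle.read()
```

Usage would be `read_text(path)` for UTF-8 files and `read_text(path, encoding='cp1252')` when the file is known to be in another encoding. Whether `errors='replace'` or a hard failure is the better default for a monitoring plugin is a design choice I would leave open.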
Agreed, I suggested chardet for debugging and guessing, not for incorporating.
I think Windows uses UTF-16 internally, and maybe requesting specific encodings is the way to go.
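One way to request a specific encoding for command output could look like the sketch below. It assumes that console tools write in the console output code page and that the OEM code page is a reasonable fallback when no console is attached. `console_encoding()` and `decode_console_bytes()` are hypothetical names, not part of the Linuxfabrik libs; `GetConsoleOutputCP` and `GetOEMCP` are Win32 functions from kernel32.

```python
import locale
import sys


def console_encoding():
    """Best-effort name of the encoding a console program writes in."""
    if sys.platform == 'win32':
        import ctypes
        kernel32 = ctypes.windll.kernel32
        # GetConsoleOutputCP() returns 0 when no console is attached
        # (for example when started by a service), so fall back to the
        # OEM code page in that case.
        codepage = kernel32.GetConsoleOutputCP() or kernel32.GetOEMCP()
        return 'cp{}'.format(codepage)
    # Elsewhere, use the locale's preferred encoding.
    return locale.getpreferredencoding(False)


def decode_console_bytes(data, encoding=None):
    """Decode bytes from a child process; never raise on bad bytes."""
    return data.decode(encoding or console_encoding(), errors='replace')
```

With the setup described above (code page 437), `decode_console_bytes(b'>m\x81ller', 'cp437')` gives `'>müller'`. I have not verified that `query user` honours the console code page when the plugin is started by the Icinga agent, so treat that part as an assumption.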
Oh, the mess:
- https://devblogs.microsoft.com/commandline/windows-command-line-unicode-and-utf-8-output-text-buffer/
- https://peps.python.org/pep-0528/
- https://docs.python.org/3/library/sys.html#sys.stdout
On Windows, UTF-8 is used for the console device. Non-character devices such as disk files and pipes use the system locale encoding (i.e. the ANSI codepage). Non-console character devices such as NUL (i.e. where isatty() returns True) use the value of the console input and output codepages at startup, respectively for stdin and stdout/stderr. This defaults to the system locale encoding if the process is not initially attached to a console.
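To see which of these cases applies to a given plugin run, one could inspect the stream at startup. A small sketch, with `describe_stream()` and `force_utf8()` as hypothetical helper names; `reconfigure()` is available on text streams since Python 3.7.

```python
import sys


def describe_stream(stream):
    """Collect the encoding-related facts about a text stream."""
    return {
        'encoding': getattr(stream, 'encoding', None),
        'errors': getattr(stream, 'errors', None),
        'isatty': stream.isatty(),
    }


def force_utf8(stream):
    """Switch a text stream to UTF-8 where the stream supports it."""
    if hasattr(stream, 'reconfigure'):
        # errors='replace' avoids UnicodeEncodeError on odd characters.
        stream.reconfigure(encoding='utf-8', errors='replace')
    return stream


if __name__ == '__main__':
    # Differs between a console, a pipe and a redirected file.
    print(describe_stream(sys.stdout))
```

Note that this only covers what Python writes to its own stdout. The bytes coming back from a child process such as `query user` are a separate matter and still need to be decoded with the code page that process used.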