monitoring-plugins users: UnicodeDecodeError: 'utf-8' codec can't decode byte 0x81 in position 25: invalid start byte

This issue respects the following points:

[X] This is a bug, not a question or a setup/configuration issue.
[X] This issue is not already reported on Github (I've searched it).
[X] I use the latest release of the Monitoring Plugins (https://github.com/Linuxfabrik/monitoring-plugins/releases).
[X] I agree to follow Monitoring Plugins's Code of Conduct.

Which variant of the Monitoring Plugins do you use?

[ ] .rpm/.deb package from repo.linuxfabrik.ch
[ ] Compiled for Linux (.tar/.zip from download.linuxfabrik.ch)
[X] Compiled for Windows (from download.linuxfabrik.ch)
[ ] Source Code from GitHub

Bug description

Sometimes, the users plugin throws a traceback if nobody is logged in.

Traceback (most recent call last):
  File "C:\PROGRA~3\icinga2\usr\lib64\nagios\plugins\users\users.py", line 172, in <module>
  File "C:\PROGRA~3\icinga2\usr\lib64\nagios\plugins\users\users.py", line 129, in main
  File "C:\PROGRA~3\icinga2\usr\lib64\nagios\plugins\users\lib\base3.py", line 954, in shell_exec
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x81 in position 25: invalid start byte

Steps to reproduce - Plugin call

'C:\ProgramData\icinga2\usr\lib64\nagios\plugins\users\users.exe' '--critical' 'None, None, None' '--warning' '1, 20, 1'

Steps to reproduce - Data

No response

Environment

092 win11

Plugin Version

users: v2022021603 by Linuxfabrik GmbH, Zurich/Switzerland

Python version

No response

List of Python modules

No response

Additional Information

No response

Jun 05 '23 15:06 markuslf

Also happens with users: v2023051201 by Linuxfabrik GmbH, Zurich/Switzerland:

Traceback (most recent call last):
  File "C:\PROGRA~3\icinga2\usr\lib64\nagios\plugins\users\users.py", line 172, in <module>
  File "C:\PROGRA~3\icinga2\usr\lib64\nagios\plugins\users\users.py", line 129, in main
  File "C:\PROGRA~3\icinga2\usr\lib64\nagios\plugins\users\lib\base3.py", line 954, in shell_exec
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x81 in position 25: invalid start byte

Jun 05 '23 15:06 markuslf

I also see a similar error with our own logtime check but have not had time to debug it yet. Luckily, I was able to capture a file that currently provokes the error. Because I used the Linuxfabrik libs and the general structure of the plugins in this repository, it could help with fixing the problem.

Jun 06 '23 06:06 slalomsk8er

I don't understand the behavior on Windows, so I can't make it work yet. What I run and what I do:

Windows Server 2019
cmd.exe terminal
chcp results in: Active code page: 437
Add user müller.
query user returns ... >müller ...
Running our users.exe, I get ... >m�ller ...
Debugging our lib.shell.shell_exec(), the call to query user returns a bytes object (which is fine), and we then decode it as UTF-8 with to_text() (which should also be fine). Python 3.6+ should handle the encoding/decoding correctly. I tested it with Python 3.12 on Windows, with no luck. Every time I try to decode/encode by hand, I get the aforementioned "can't decode" errors.

What do we need to do to make this work better on Windows? Any feedback would be appreciated.
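The symptom can be reproduced without a Windows console. This is a minimal sketch, assuming that query user emitted the name in the OEM code page 437 reported by chcp; the byte string is hand-made for illustration and is not captured plugin output.

```python
# Hypothetical sample: "müller" as code page 437 would encode it.
# In CP437 the letter "ü" is the single byte 0x81.
raw = "müller".encode("cp437")

# Decoding with the code page that produced the bytes round-trips cleanly.
decoded = raw.decode("cp437")

# Strict UTF-8 decoding fails: 0x81 can only be a continuation byte in
# UTF-8, never the first byte of a character.
error_text = None
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    error_text = str(exc)

# Lenient decoding hides the error but replaces the byte with U+FFFD,
# which is the "m?ller"-style output seen in the plugin.
lossy = raw.decode("utf-8", errors="replace")

# ascii() keeps the demo printable even on a console that cannot encode "ü".
print(ascii(raw), ascii(decoded), ascii(lossy))
print(error_text)
```

If this assumption holds, the bytes are valid and only the codec chosen for decoding is wrong.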

Nov 27 '23 13:11 markuslf

Well, I guess the problem is that 0xfc isn't UTF-8 but most likely Windows-1252, and as such it is not a legal code at this position: 0x00fc would be legal UTF-16, and 0xC3 0xBC would be the proper UTF-8 encoding of ü. What does https://pypi.org/project/chardet/ detect?
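For reference, the byte values under discussion can be checked directly. This is a small sketch that encodes ü with the codecs named in this thread, plus CP437, because chcp reported code page 437 earlier; the codec list is chosen for illustration only.

```python
# "ü" is U+00FC. Encode it with each codec and show the resulting bytes.
codecs_to_compare = ("utf-8", "cp1252", "cp437", "utf-16-le")
encoded = {name: "\u00fc".encode(name) for name in codecs_to_compare}

for name, data in encoded.items():
    print("{:10s} {}".format(name, data.hex()))
```

The byte 0x81 from the traceback is what CP437 produces for ü, whereas Windows-1252 would produce 0xfc.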

Nov 27 '23 17:11 slalomsk8er

> Well, I guess the problem is that 0xfc isn't UTF-8 but most likely Windows-1252, and as such it is not a legal code at this position: 0x00fc would be legal UTF-16, and 0xC3 0xBC would be the proper UTF-8 encoding of ü. What does https://pypi.org/project/chardet/ detect?

There are two different codec problems in this issue:

Anything related to stdin/stdout and talking to the Windows console (e.g. happening in the users plugin).
Encoding issues when reading files on Windows (which require a slightly different handling).

Regarding chardet, I would like to follow standards and respect explicit character encoding information instead of guessing it. I have more of a problem on how to do the encoding/decoding in a robust way.
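One way to avoid guessing for the first case is to ask the operating system which code page applies and pass it to the decoder explicitly. The following is only a sketch under assumptions: the helper names console_encoding() and to_text_explicit() are hypothetical and not part of the Linuxfabrik libs, and it assumes that tools such as query user write OEM code page bytes when their output is captured through a pipe.

```python
import locale
import sys


def console_encoding():
    """Return an explicit codec name for bytes captured from child processes.

    Assumption: on Windows, console tools writing to a pipe use the OEM
    code page, so GetOEMCP() from kernel32 is queried. Elsewhere the
    locale's preferred encoding is used.
    """
    if sys.platform == "win32":
        import ctypes  # only needed for the Windows API call

        return "cp{}".format(ctypes.windll.kernel32.GetOEMCP())
    return locale.getpreferredencoding(False)


def to_text_explicit(raw, encoding=None, errors="strict"):
    """Decode bytes with a named encoding; pass str through unchanged."""
    if isinstance(raw, str):
        return raw
    return raw.decode(encoding or console_encoding(), errors)


# Hand-made sample bytes, decoded with the code page reported by chcp.
print(ascii(to_text_explicit(b"m\x81ller", encoding="cp437")))
```

Keeping errors="strict" as the default makes a wrong code page visible immediately. A caller that prefers degraded output over a traceback could pass errors="replace" instead.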

Nov 27 '23 21:11 markuslf

Agreed, I suggested chardet for debugging and guessing, not for incorporating.

I think Windows uses UTF-16 internally, and maybe requesting specific encodings is the way to go.
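For the file-reading case, requesting a specific encoding could look like the sketch below. The read_text() helper is hypothetical, and the default of "utf-8-sig" is an assumption made for illustration: it decodes plain UTF-8 and also strips a leading byte order mark, which some Windows tools write.

```python
import os
import tempfile


def read_text(path, encoding="utf-8-sig", errors="strict"):
    """Read a text file with an explicitly requested encoding.

    Passing the encoding avoids falling back to the platform default,
    which on Windows is usually the ANSI code page rather than UTF-8.
    """
    with open(path, "r", encoding=encoding, errors=errors) as handle:
        return handle.read()


# Demo with a hand-made file: UTF-8 BOM followed by "müller".
fd, demo_path = tempfile.mkstemp(suffix=".log")
with os.fdopen(fd, "wb") as handle:
    handle.write(b"\xef\xbb\xbfm\xc3\xbcller\n")

demo_text = read_text(demo_path)
os.remove(demo_path)
print(ascii(demo_text))
```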

Oh, the mess:

https://devblogs.microsoft.com/commandline/windows-command-line-unicode-and-utf-8-output-text-buffer/
https://peps.python.org/pep-0528/
https://docs.python.org/3/library/sys.html#sys.stdout

On Windows, UTF-8 is used for the console device. Non-character devices such as disk files and pipes use the system locale encoding (i.e. the ANSI codepage). Non-console character devices such as NUL (i.e. where isatty() returns True) use the value of the console input and output codepages at startup, respectively for stdin and stdout/stderr. This defaults to the system locale encoding if the process is not initially attached to a console.
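To see which of these cases applies to a given process, the stream attributes can be inspected at runtime. This is a small sketch; describe_stream() and force_utf8_stdout() are hypothetical helper names, and reconfigure() is only available on text streams from Python 3.7 onwards.

```python
import io
import sys


def describe_stream(stream):
    """Collect the encoding-related attributes of a text stream."""
    return {
        "encoding": getattr(stream, "encoding", None),
        "errors": getattr(stream, "errors", None),
        "isatty": stream.isatty() if hasattr(stream, "isatty") else False,
    }


def force_utf8_stdout():
    """Switch stdout to UTF-8 if the stream supports it; report success."""
    if hasattr(sys.stdout, "reconfigure"):
        sys.stdout.reconfigure(encoding="utf-8", errors="backslashreplace")
        return True
    return False


# Demo on an in-memory stream that stands in for a pipe using CP437.
demo_stream = io.TextIOWrapper(io.BytesIO(), encoding="cp437")
demo_info = describe_stream(demo_stream)
print(demo_info)
print(describe_stream(sys.stdout))
```

Running this once from an interactive console and once with redirected output shows whether the encoding differs between the two situations.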

Nov 27 '23 22:11 slalomsk8er