wcwidth
wcwidth copied to clipboard
wc_rjust() doesn't work for non-printables
Firstly, this isn't a problem for me, I just wanted to let you know about it.
Using wc_rjust() from the readme, if text contains any non-printable characters, the result is longer than length, which should never happen.
For example, '\n' is non-printable:
>>> wc_rjust('\n', 2, '.')
'...\n'
For reference, the width-naive version:
>>> '\n'.rjust(2, '.')
'.\n'
The problem is because of the math here:
max(0, (length - wcswidth(text)))
If wcswidth(text) is negative, the max is length + 1.
The simple solution is to just add a note in the readme warning about this situation, but if you wanted, you could expand the function to raise an error:
>>> def wc_rjust(text, length, padding=' '):
... from wcwidth import wcswidth
... width = wcswidth(text)
... if width < 0:
... raise ValueError('text contains non-printable characters')
... return padding * max(0, (length - width)) + text
...
>>> wc_rjust('\n', 2, '.')
Traceback (most recent call last):
...
ValueError: text contains non-printable characters
Ultimately, it seems like the problem is using -1 as a sentinel return value instead of raising an error, but it looks like that's inherited from the C function, so fixing that would be a lot of work.
I was thinking of a new function wcwidth.length() in next release that just "does its best", but never returns -1. In this case, string containing \n would measure as 0, not -1, and not raise any exception. What do you think of that?
Issue #79 proposes new width() function
See also #93 to just offer a proper rjust/etc. as part of the API
I just want to also add that this cannot be fixed in the wcwidth() and wcswidth() functions, as they intend to exactly match function signature and behavior of the POSIX functions.
The reason that C0 and C1 control characters return -1, is that the intended application, a terminal emulator especially, should handle these characters in a stream and remove them from the string before passing on to wcswidth. Especially items like \n, \b, and \t. They become complicated, it depends on the current position of the cursor, and also terminal settings, for example \b can wrap to previous row if it is located at column 0, and the number of spaces incurred by '\t' are dependent on the tab stop setting and the current cursor position. C1 characters like '\x1b' may begin a terminal escape sequence, and that too should be processed before sending to wcswidth, etc.
So I won't really consider this a bug, but a feature request for an alternate function that simply ignores these things. Although these characters should not be sent to wcswidth, developers are free to have them silently ignored. I would suggest all control characters have width of 0 for a new function, wcwidth.width