wcwidth
Python library that measures the width of unicode strings rendered to a terminal
|pypi_downloads| |codecov| |license|
============
Introduction
============
This library is mainly for CLI programs that carefully produce output for
terminals, or that act as a terminal emulator themselves.

Problem Statement: The printable length of most strings is equal to the
number of cells they occupy on the screen, 1 character : 1 cell. However,
there are categories of characters that occupy 2 cells (full-width), and
others that occupy 0 cells (zero-width).
Solution: POSIX.1-2001 and POSIX.1-2008 conforming systems provide the
`wcwidth(3)`_ and `wcswidth(3)`_ C functions, which this python module's
functions precisely copy. These functions return the number of cells a
unicode string is expected to occupy.
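
To make the three width classes from the problem statement concrete, here is a rough sketch that uses only the standard library. The helper name ``approx_cell_width`` is hypothetical, and this is an approximation for illustration, not how wcwidth is implemented: it ignores control characters and other special cases that the library's tables handle.

```python
import unicodedata


def approx_cell_width(char):
    """Rough, stdlib-only estimate of the cells one character occupies.

    Hypothetical helper for illustration only; it is not part of wcwidth.
    """
    if unicodedata.combining(char):
        # Combining marks are drawn over the previous character.
        return 0
    if unicodedata.east_asian_width(char) in ('W', 'F'):
        # East Asian Wide (W) or Fullwidth (F) characters.
        return 2
    return 1


for char in ('a', 'コ', '\u0301'):
    print(repr(char), approx_cell_width(char))
```

The three sample characters report 1, 2 and 0 cells respectively, matching the narrow, wide and zero-width categories described above.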
Installation
The stable version of this package is maintained on PyPI. Install it using pip::

    pip install wcwidth
Example
Problem: given the following phrase (Japanese),

    >>> text = u'コンニチハ'

Python incorrectly uses the string length of 5 codepoints rather than the
printable length of 10 cells, so that when using the rjust function, the
output length is wrong::

    >>> print(len('コンニチハ'))
    5

    >>> print('コンニチハ'.rjust(20, '_'))
    _______________コンニチハ
By defining our own "rjust" function that uses wcwidth, we can correct this::

    >>> def wc_rjust(text, length, padding=' '):
    ...    from wcwidth import wcswidth
    ...    return padding * max(0, (length - wcswidth(text))) + text
    ...

Our Solution uses wcswidth to determine the string length correctly::

    >>> from wcwidth import wcswidth
    >>> print(wcswidth('コンニチハ'))
    10

    >>> print(wc_rjust('コンニチハ', 20, '_'))
    __________コンニチハ
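
The same idea extends to the other justification methods. The following is a minimal sketch of left-justify and center helpers; the names ``wc_ljust`` and ``wc_center`` are hypothetical and not part of this library. It assumes the padding is a single narrow (1-cell) character and treats unprintable input, for which wcswidth returns -1, as having width 0.

```python
from wcwidth import wcswidth


def wc_ljust(text, length, padding=' '):
    """Left-justify text in a field of `length` terminal cells."""
    # wcswidth() returns -1 for unprintable input; treat that as width 0.
    width = max(0, wcswidth(text))
    return text + padding * max(0, length - width)


def wc_center(text, length, padding=' '):
    """Center text in a field of `length` terminal cells."""
    width = max(0, wcswidth(text))
    total = max(0, length - width)
    left = total // 2
    # Any odd leftover cell goes to the right-hand side.
    return padding * left + text + padding * (total - left)


print(wc_ljust('コンニチハ', 20, '_'))
print(wc_center('コンニチハ', 20, '_'))
```

With the 10-cell phrase and a field of 20 cells, the first call appends 10 padding characters and the second places 5 on each side.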
Choosing a Version
Export an environment variable, UNICODE_VERSION. This should be done by
terminal emulators, or by developers experimenting with writing one of
their own. From the shell::

    $ export UNICODE_VERSION=13.0

If unspecified, the latest version is used. If your terminal emulator does not
export this variable, you can use the `jquast/ucs-detect`_ utility to
automatically detect and export it to your shell.
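
For a terminal emulator written in Python, exporting the variable amounts to placing it in the environment of the child program it launches. The sketch below shows one way to do that with the standard library only; ``run_with_unicode_version`` is a hypothetical helper name, and the child command is a stand-in that simply echoes the variable back.

```python
import os
import subprocess
import sys


def run_with_unicode_version(argv, version):
    """Run a child program with UNICODE_VERSION set in its environment.

    Hypothetical helper: a real emulator would attach the child to a pty
    rather than capture its output.
    """
    env = dict(os.environ, UNICODE_VERSION=version)
    return subprocess.run(
        argv, env=env, capture_output=True, text=True, check=True)


# Stand-in child process that prints the variable it inherited.
child = [sys.executable, '-c',
         "import os; print(os.environ['UNICODE_VERSION'])"]
result = run_with_unicode_version(child, '13.0')
print(result.stdout.strip())
```

Copying ``os.environ`` rather than mutating it keeps the setting scoped to the child, so the parent process is unaffected.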
wcwidth, wcswidth
Use function wcwidth() to determine the printable width of a single unicode
character, and wcswidth() to determine the printable width of a string
of unicode characters.

Briefly, return values of function wcwidth() are:

-1
  Indeterminate (not printable).

0
  Does not advance the cursor, such as NULL or Combining.

2
  Characters of category East Asian Wide (W) or East Asian
  Full-width (F) which are displayed using two terminal cells.

1
  All others.

Function wcswidth() simply returns the sum of the widths of each character
in a string, or -1 if any character in the string has a width of -1.
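
The relationship between the two functions, as described above, can be sketched in a few lines. The helper ``sum_of_widths`` is a hypothetical name used only to illustrate the summing rule; in practice, call wcswidth() directly.

```python
from wcwidth import wcwidth


def sum_of_widths(text):
    """Sum per-character widths, or return -1 if any character is unprintable.

    Hypothetical helper that restates the rule described for wcswidth().
    """
    total = 0
    for char in text:
        width = wcwidth(char)
        if width < 0:
            return -1
        total += width
    return total


print(wcwidth('a'))               # a narrow character
print(wcwidth('コ'))              # an East Asian Wide character
print(wcwidth('\u0301'))          # a combining acute accent
print(wcwidth('\x07'))            # BEL, a control character
print(sum_of_widths('コンニチハ'))
print(sum_of_widths('ab\x07'))
```

The four single-character calls cover each return value in the list above: 1, 2, 0 and -1. The last call returns -1 because one unprintable character makes the whole string indeterminate.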
Full API Documentation at http://wcwidth.readthedocs.org
==========
Developing
==========
Install wcwidth in editable mode::

    pip install -e .

Execute unit tests using tox_::

    tox

Regenerate python code tables from latest Unicode Specification data files::

    tox -eupdate
Supplementary tools for browsing and testing terminals for wide unicode
characters are found in the `bin/`_ folder of this project's source code. Be
sure to first run pip install -r requirements-develop.txt from this project's
main folder. For example, an interactive browser for testing::

    ./bin/wcwidth-browser.py
Uses
This library is used in:

- `jquast/blessed`_: a thin, practical wrapper around terminal capabilities in
  Python.
- `jonathanslenders/python-prompt-toolkit`_: a Library for building powerful
  interactive command lines in Python.
- `dbcli/pgcli`_: Postgres CLI with autocompletion and syntax highlighting.
- `thomasballinger/curtsies`_: a Curses-like terminal wrapper with a display
  based on compositing 2d arrays of text.
- `selectel/pyte`_: Simple VTXXX-compatible linux terminal emulator.
- `astanin/python-tabulate`_: Pretty-print tabular data in Python, a library
  and a command-line utility.
- `LuminosoInsight/python-ftfy`_: Fixes mojibake and other glitches in Unicode
  text.
- `nbedos/termtosvg`_: Terminal recorder that renders sessions as SVG
  animations.
- `peterbrittain/asciimatics`_: Package to help people create full-screen text
  UIs.
Other Languages

- `timoxley/wcwidth`_: JavaScript
- `janlelis/unicode-display_width`_: Ruby
- `alecrabbit/php-wcwidth`_: PHP
- `Text::CharWidth`_: Perl
- `bluebear94/Terminal-WCWidth`_: Perl 6
- `mattn/go-runewidth`_: Go
- `emugel/wcwidth`_: Haxe
- `aperezdc/lua-wcwidth`_: Lua
- joachimschmidt557/zig-wcwidth: Zig
- `fumiyas/wcwidth-cjk`_: LD_PRELOAD override
- joshuarubin/wcwidth9: Unicode version 9 in C
History
0.2.0 2020-06-01
  - Enhancement: Unicode version may be selected by exporting the
    environment variable UNICODE_VERSION, such as 13.0, or 6.3.0.
    See the `jquast/ucs-detect`_ CLI utility for automatic detection.
  - Enhancement: API Documentation is published to readthedocs.org.
  - Updated tables for all Unicode Specifications with files published in a
    programmatically consumable format, versions 4.1.0 through 13.0.

0.1.9 2020-03-22
  - Performance optimization by `Avram Lubkin`_, `PR #35`_.
  - Updated tables to Unicode Specification 13.0.0.

0.1.8 2020-01-01
  - Updated tables to Unicode Specification 12.0.0 (`PR #30`_).

0.1.7 2016-07-01
  - Updated tables to Unicode Specification 9.0.0 (`PR #18`_).

0.1.6 2016-01-08 Production/Stable
  - LICENSE file now included with distribution.

0.1.5 2015-09-13 Alpha
  - Bugfix: Resolution of "combining_ character width" issue, most especially
    those that previously returned -1 now often (correctly) return 0.
    Resolved by `Philip Craig`_ via `PR #11`_.
  - Deprecated: The module path wcwidth.table_comb is no longer available,
    it has been superseded by module path wcwidth.table_zero.

0.1.4 2014-11-20 Pre-Alpha
  - Feature: wcswidth() now determines printable length for (most)
    combining_ characters. The developer's tool `bin/wcwidth-browser.py`_ is
    improved to display combining_ characters when provided the --combining
    option (`Thomas Ballinger`_ and `Leta Montopoli`_, `PR #5`_).
  - Feature: added static analysis (prospector_) to testing framework.

0.1.3 2014-10-29 Pre-Alpha
  - Bugfix: 2nd parameter of wcswidth was not honored
    (`Thomas Ballinger`_, `PR #4`_).

0.1.2 2014-10-28 Pre-Alpha
  - Updated tables to Unicode Specification 7.0.0
    (`Thomas Ballinger`_, `PR #3`_).

0.1.1 2014-05-14 Pre-Alpha
  - Initial release to PyPI, based on Unicode Specification 6.3.0.

This code was originally derived directly from C code of the same name,
whose latest version is available at
http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c::

 * Markus Kuhn -- 2007-05-26 (Unicode 5.0)
 *
 * Permission to use, copy, modify, and distribute this software
 * for any purpose and without fee is hereby granted. The author
 * disclaims all warranties with regard to this software.

.. _tox: https://testrun.org/tox/latest/install.html
.. _prospector: https://github.com/landscapeio/prospector
.. _combining: https://en.wikipedia.org/wiki/Combining_character
.. _bin/: https://github.com/jquast/wcwidth/tree/master/bin
.. _bin/wcwidth-browser.py: https://github.com/jquast/wcwidth/tree/master/bin/wcwidth-browser.py
.. _Thomas Ballinger: https://github.com/thomasballinger
.. _Leta Montopoli: https://github.com/lmontopo
.. _Philip Craig: https://github.com/philipc
.. _PR #3: https://github.com/jquast/wcwidth/pull/3
.. _PR #4: https://github.com/jquast/wcwidth/pull/4
.. _PR #5: https://github.com/jquast/wcwidth/pull/5
.. _PR #11: https://github.com/jquast/wcwidth/pull/11
.. _PR #18: https://github.com/jquast/wcwidth/pull/18
.. _PR #30: https://github.com/jquast/wcwidth/pull/30
.. _PR #35: https://github.com/jquast/wcwidth/pull/35
.. _jquast/blessed: https://github.com/jquast/blessed
.. _selectel/pyte: https://github.com/selectel/pyte
.. _thomasballinger/curtsies: https://github.com/thomasballinger/curtsies
.. _dbcli/pgcli: https://github.com/dbcli/pgcli
.. _jonathanslenders/python-prompt-toolkit: https://github.com/jonathanslenders/python-prompt-toolkit
.. _timoxley/wcwidth: https://github.com/timoxley/wcwidth
.. _wcwidth(3): http://man7.org/linux/man-pages/man3/wcwidth.3.html
.. _wcswidth(3): http://man7.org/linux/man-pages/man3/wcswidth.3.html
.. _astanin/python-tabulate: https://github.com/astanin/python-tabulate
.. _janlelis/unicode-display_width: https://github.com/janlelis/unicode-display_width
.. _LuminosoInsight/python-ftfy: https://github.com/LuminosoInsight/python-ftfy
.. _alecrabbit/php-wcwidth: https://github.com/alecrabbit/php-wcwidth
.. _`Text::CharWidth`: https://metacpan.org/pod/Text::CharWidth
.. _bluebear94/Terminal-WCWidth: https://github.com/bluebear94/Terminal-WCWidth
.. _mattn/go-runewidth: https://github.com/mattn/go-runewidth
.. _emugel/wcwidth: https://github.com/emugel/wcwidth
.. _jquast/ucs-detect: https://github.com/jquast/ucs-detect
.. _Avram Lubkin: https://github.com/avylove
.. _nbedos/termtosvg: https://github.com/nbedos/termtosvg
.. _peterbrittain/asciimatics: https://github.com/peterbrittain/asciimatics
.. _aperezdc/lua-wcwidth: https://github.com/aperezdc/lua-wcwidth
.. _fumiyas/wcwidth-cjk: https://github.com/fumiyas/wcwidth-cjk
.. |pypi_downloads| image:: https://img.shields.io/pypi/dm/wcwidth.svg?logo=pypi
    :alt: Downloads
    :target: https://pypi.org/project/wcwidth/
.. |codecov| image:: https://codecov.io/gh/jquast/wcwidth/branch/master/graph/badge.svg
    :alt: codecov.io Code Coverage
    :target: https://codecov.io/gh/jquast/wcwidth/
.. |license| image:: https://img.shields.io/github/license/jquast/wcwidth.svg
    :target: https://pypi.python.org/pypi/wcwidth/
    :alt: MIT License