Python library that measures the width of unicode strings rendered to a terminal

|pypi_downloads| |codecov| |license|

============
Introduction
============

This library is mainly for CLI programs that carefully produce output for Terminals, or that pretend to be a terminal emulator.

Problem Statement: The printable length of most strings is equal to the number of cells they occupy on the screen, 1 character : 1 cell. However, there are categories of characters that occupy 2 cells (full-width), and others that occupy 0 cells (zero-width).

Solution: POSIX.1-2001 and POSIX.1-2008 conforming systems provide the wcwidth(3)_ and wcswidth(3)_ C functions, which this Python module's functions precisely copy. These functions return the number of cells a unicode string is expected to occupy.

Installation
------------

The stable version of this package is maintained on PyPI; install it using pip::

    pip install wcwidth

Example
-------

Problem: given the following phrase (Japanese)::

    >>> text = u'コンニチハ'

Python incorrectly uses the string length of 5 codepoints rather than the printable length of 10 cells, so when using the rjust() method, the output is padded to the wrong width::

    >>> print(len('コンニチハ'))
    5

    >>> print('コンニチハ'.rjust(20, '_'))
    _______________コンニチハ

By defining our own "rjust" function that uses wcwidth, we can correct this::

    >>> def wc_rjust(text, length, padding=' '):
    ...     from wcwidth import wcswidth
    ...     return padding * max(0, (length - wcswidth(text))) + text
    ...

Our Solution uses wcswidth to determine the string length correctly::

    >>> from wcwidth import wcswidth
    >>> print(wcswidth('コンニチハ'))
    10

    >>> print(wc_rjust('コンニチハ', 20, '_'))
    __________コンニチハ
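The same approach extends to left-justification and centering. The following is a sketch, not part of the wcwidth API: wc_ljust() and wc_center() are hypothetical helpers that, unlike wc_rjust() above (which adds length + 1 padding characters when wcswidth() returns -1), return non-printable text unpadded. If wcwidth is not installed, the sketch falls back to a rough standard-library estimate, so results without the package are only approximate.

```python
import unicodedata

try:
    from wcwidth import wcswidth
except ImportError:
    # Hypothetical fallback, used only when wcwidth is not installed:
    # control characters make the string non-printable (-1), combining
    # marks count 0 cells, East Asian Wide/Full-width count 2, others 1.
    def wcswidth(text):
        total = 0
        for char in text:
            category = unicodedata.category(char)
            if category == 'Cc':
                return -1
            if category in ('Mn', 'Me'):
                continue
            total += 2 if unicodedata.east_asian_width(char) in ('W', 'F') else 1
        return total


def wc_ljust(text, length, padding=' '):
    """Pad text on the right so it fills `length` terminal cells."""
    width = wcswidth(text)
    if width < 0:
        return text  # width is unknown, so leave the text unpadded
    return text + padding * max(0, length - width)


def wc_center(text, length, padding=' '):
    """Pad text on both sides; an odd leftover cell goes on the right."""
    width = wcswidth(text)
    if width < 0:
        return text
    total = max(0, length - width)
    left = total // 2
    return padding * left + text + padding * (total - left)


print(wc_ljust('コンニチハ', 20, '_') + '|')
print(wc_center('コンニチハ', 20, '_') + '|')
```

Leaving non-printable text unpadded is only one possible policy; a program could instead strip or escape control characters before measuring.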

Choosing a Version
------------------

Export the environment variable UNICODE_VERSION from the shell. This should be done by terminal emulators, or by developers experimenting with authoring one of their own::

    $ export UNICODE_VERSION=13.0

If unspecified, the latest version is used. If your Terminal Emulator does not export this variable, you can use the jquast/ucs-detect_ utility to automatically detect and export it to your shell.
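A terminal emulator, or a wrapper script, can make the variable visible to the programs it launches by adding it to the child process environment. This sketch uses only Python's standard subprocess module; run_with_unicode_version() is a hypothetical helper, the child program simply echoes UNICODE_VERSION back as a stand-in for a program that uses wcwidth, and "13.0" is just an example value.

```python
import os
import subprocess
import sys


def run_with_unicode_version(args, version):
    """Run a child process with UNICODE_VERSION exported to it."""
    env = dict(os.environ)  # copy, so the parent's environment is unchanged
    env['UNICODE_VERSION'] = version  # e.g. "13.0" or "6.3.0"
    return subprocess.run(args, env=env, capture_output=True, text=True, check=True)


# Stand-in child program: report the version it was given.
child = [sys.executable, '-c', 'import os; print(os.environ["UNICODE_VERSION"])']
result = run_with_unicode_version(child, '13.0')
print(result.stdout.strip())  # 13.0
```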

wcwidth, wcswidth
-----------------

Use the function wcwidth() to determine the printable width of a single unicode character, and wcswidth() to determine the width of a string of unicode characters.

Briefly, the return values of the function wcwidth() are:

``-1``
  Indeterminate (not printable).

``0``
  Does not advance the cursor, such as NULL or combining characters.

``2``
  Characters of category East Asian Wide (W) or East Asian Full-width (F), which are displayed using two terminal cells.

``1``
  All others.

The function wcswidth() returns the sum of the wcwidth() values of every character in a string, or -1 if any character in the string has a width of -1.
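To make these rules concrete, here is a rough approximation built only on the standard library. It is not how the wcwidth package works: the package uses its own versioned tables generated from the Unicode data files and handles many special cases, while the hypothetical approx_wcwidth() and approx_wcswidth() below rely on whatever Unicode version the running Python's unicodedata module ships.

```python
import unicodedata


def approx_wcwidth(char):
    """Approximate the wcwidth() return value for one character."""
    cp = ord(char)
    if cp == 0:
        return 0   # NULL does not advance the cursor
    if cp < 0x20 or 0x7F <= cp < 0xA0:
        return -1  # C0/C1 control characters: indeterminate
    if unicodedata.category(char) in ('Mn', 'Me'):
        return 0   # nonspacing and enclosing combining marks
    if unicodedata.east_asian_width(char) in ('W', 'F'):
        return 2   # East Asian Wide (W) or Full-width (F)
    return 1       # all others


def approx_wcswidth(text):
    """Sum of approx_wcwidth() values, or -1 if any character is -1."""
    total = 0
    for char in text:
        width = approx_wcwidth(char)
        if width < 0:
            return -1
        total += width
    return total


print(approx_wcswidth('コンニチハ'))  # 10
print(approx_wcswidth('cafe\u0301'))  # 4: U+0301 is a combining mark
print(approx_wcswidth('tab\there'))   # -1: TAB is a control character
```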

Full API documentation is available at http://wcwidth.readthedocs.org

==========
Developing
==========

Install wcwidth in editable mode::

    pip install -e .

Execute unit tests using tox_::

    tox

Regenerate python code tables from latest Unicode Specification data files::

    tox -eupdate
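The generated code tables can be pictured, in the style of Markus Kuhn's original C code, as sorted lists of inclusive (first, last) codepoint ranges, where a lookup is a binary search. The sketch below assumes that layout; TOY_WIDE_TABLE is a tiny hypothetical excerpt rather than the generated data, and in_table() is an illustrative helper built on the standard bisect module, not a function of this package.

```python
import bisect

# Hypothetical excerpt of a wide-character table: sorted,
# non-overlapping, inclusive (first, last) codepoint ranges.
TOY_WIDE_TABLE = (
    (0x1100, 0x115F),  # Hangul Jamo initial consonants
    (0x3041, 0x3096),  # Hiragana letters
    (0x30A1, 0x30FA),  # Katakana letters
    (0xFF01, 0xFF60),  # Fullwidth forms
)

# Range start points, computed once so each lookup is O(log n).
_STARTS = [first for first, _ in TOY_WIDE_TABLE]


def in_table(codepoint):
    """Return True if codepoint falls inside one of the table's ranges."""
    # Index of the last range whose start is <= codepoint (-1 if none).
    index = bisect.bisect_right(_STARTS, codepoint) - 1
    return index >= 0 and codepoint <= TOY_WIDE_TABLE[index][1]


print(in_table(ord('コ')))  # True
print(in_table(ord('a')))   # False
```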

Supplementary tools for browsing and testing terminals for wide unicode characters are found in the bin/_ folder of this project's source code. First run pip install -r requirements-develop.txt from this project's main folder. For example, an interactive browser for testing::

    ./bin/wcwidth-browser.py

====
Uses
====

This library is used in:

  • jquast/blessed_: a thin, practical wrapper around terminal capabilities in Python.

  • jonathanslenders/python-prompt-toolkit_: a library for building powerful interactive command lines in Python.

  • dbcli/pgcli_: Postgres CLI with autocompletion and syntax highlighting.

  • thomasballinger/curtsies_: a Curses-like terminal wrapper with a display based on compositing 2d arrays of text.

  • selectel/pyte_: Simple VTXXX-compatible linux terminal emulator.

  • astanin/python-tabulate_: Pretty-print tabular data in Python, a library and a command-line utility.

  • LuminosoInsight/python-ftfy_: Fixes mojibake and other glitches in Unicode text.

  • nbedos/termtosvg_: Terminal recorder that renders sessions as SVG animations.

  • peterbrittain/asciimatics_: Package to help people create full-screen text UIs.

===============
Other Languages
===============

  • timoxley/wcwidth_: JavaScript
  • janlelis/unicode-display_width_: Ruby
  • alecrabbit/php-wcwidth_: PHP
  • Text::CharWidth_: Perl
  • bluebear94/Terminal-WCWidth: Perl 6
  • mattn/go-runewidth_: Go
  • emugel/wcwidth_: Haxe
  • aperezdc/lua-wcwidth: Lua
  • joachimschmidt557/zig-wcwidth: Zig
  • fumiyas/wcwidth-cjk: LD_PRELOAD override
  • joshuarubin/wcwidth9: Unicode version 9 in C

=======
History
=======

0.2.0 2020-06-01

  • Enhancement: Unicode version may be selected by exporting the environment variable UNICODE_VERSION, such as 13.0 or 6.3.0. See the jquast/ucs-detect_ CLI utility for automatic detection.
  • Enhancement: API Documentation is published to readthedocs.org.
  • Updated tables for all Unicode Specifications with files published in a programmatically consumable format, versions 4.1.0 through 13.0.

0.1.9 2020-03-22

  • Performance optimization by Avram Lubkin, PR #35.
  • Updated tables to Unicode Specification 13.0.0.

0.1.8 2020-01-01

  • Updated tables to Unicode Specification 12.0.0. (PR #30_).

0.1.7 2016-07-01

  • Updated tables to Unicode Specification 9.0.0. (PR #18_).

0.1.6 2016-01-08 Production/Stable

  • LICENSE file now included with distribution.

0.1.5 2015-09-13 Alpha

  • Bugfix: Resolution of the "combining_ character width" issue; in particular, characters that previously returned -1 now often (correctly) return 0. Resolved by Philip Craig_ via PR #11_.
  • Deprecated: The module path wcwidth.table_comb is no longer available; it has been superseded by the module path wcwidth.table_zero.

0.1.4 2014-11-20 Pre-Alpha

  • Feature: wcswidth() now determines printable length for (most) combining_ characters. The developer's tool bin/wcwidth-browser.py_ is improved to display combining_ characters when provided the --combining option (Thomas Ballinger_ and Leta Montopoli_ PR #5_).
  • Feature: added static analysis (prospector_) to testing framework.

0.1.3 2014-10-29 Pre-Alpha

  • Bugfix: 2nd parameter of wcswidth was not honored. (Thomas Ballinger, PR #4).

0.1.2 2014-10-28 Pre-Alpha

  • Updated tables to Unicode Specification 7.0.0. (Thomas Ballinger, PR #3).

0.1.1 2014-05-14 Pre-Alpha

  • Initial release to PyPI, based on Unicode Specification 6.3.0.

This code was originally derived directly from C code of the same name, whose latest version is available at http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c::

    * Markus Kuhn -- 2007-05-26 (Unicode 5.0)
    * Permission to use, copy, modify, and distribute this software
    * for any purpose and without fee is hereby granted. The author
    * disclaims all warranties with regard to this software.

.. _tox: https://testrun.org/tox/latest/install.html
.. _prospector: https://github.com/landscapeio/prospector
.. _combining: https://en.wikipedia.org/wiki/Combining_character
.. _bin/: https://github.com/jquast/wcwidth/tree/master/bin
.. _bin/wcwidth-browser.py: https://github.com/jquast/wcwidth/tree/master/bin/wcwidth-browser.py
.. _Thomas Ballinger: https://github.com/thomasballinger
.. _Leta Montopoli: https://github.com/lmontopo
.. _Philip Craig: https://github.com/philipc
.. _PR #3: https://github.com/jquast/wcwidth/pull/3
.. _PR #4: https://github.com/jquast/wcwidth/pull/4
.. _PR #5: https://github.com/jquast/wcwidth/pull/5
.. _PR #11: https://github.com/jquast/wcwidth/pull/11
.. _PR #18: https://github.com/jquast/wcwidth/pull/18
.. _PR #30: https://github.com/jquast/wcwidth/pull/30
.. _PR #35: https://github.com/jquast/wcwidth/pull/35
.. _jquast/blessed: https://github.com/jquast/blessed
.. _selectel/pyte: https://github.com/selectel/pyte
.. _thomasballinger/curtsies: https://github.com/thomasballinger/curtsies
.. _dbcli/pgcli: https://github.com/dbcli/pgcli
.. _jonathanslenders/python-prompt-toolkit: https://github.com/jonathanslenders/python-prompt-toolkit
.. _timoxley/wcwidth: https://github.com/timoxley/wcwidth
.. _wcwidth(3): http://man7.org/linux/man-pages/man3/wcwidth.3.html
.. _wcswidth(3): http://man7.org/linux/man-pages/man3/wcswidth.3.html
.. _astanin/python-tabulate: https://github.com/astanin/python-tabulate
.. _janlelis/unicode-display_width: https://github.com/janlelis/unicode-display_width
.. _LuminosoInsight/python-ftfy: https://github.com/LuminosoInsight/python-ftfy
.. _alecrabbit/php-wcwidth: https://github.com/alecrabbit/php-wcwidth
.. _`Text::CharWidth`: https://metacpan.org/pod/Text::CharWidth
.. _bluebear94/Terminal-WCWidth: https://github.com/bluebear94/Terminal-WCWidth
.. _mattn/go-runewidth: https://github.com/mattn/go-runewidth
.. _emugel/wcwidth: https://github.com/emugel/wcwidth
.. _jquast/ucs-detect: https://github.com/jquast/ucs-detect
.. _Avram Lubkin: https://github.com/avylove
.. _nbedos/termtosvg: https://github.com/nbedos/termtosvg
.. _peterbrittain/asciimatics: https://github.com/peterbrittain/asciimatics
.. _aperezdc/lua-wcwidth: https://github.com/aperezdc/lua-wcwidth
.. _fumiyas/wcwidth-cjk: https://github.com/fumiyas/wcwidth-cjk
.. |pypi_downloads| image:: https://img.shields.io/pypi/dm/wcwidth.svg?logo=pypi
    :alt: Downloads
    :target: https://pypi.org/project/wcwidth/
.. |codecov| image:: https://codecov.io/gh/jquast/wcwidth/branch/master/graph/badge.svg
    :alt: codecov.io Code Coverage
    :target: https://codecov.io/gh/jquast/wcwidth/
.. |license| image:: https://img.shields.io/github/license/jquast/wcwidth.svg
    :target: https://pypi.python.org/pypi/wcwidth/
    :alt: MIT License