
Vendor wcwidth using python's unicodedata

Open deathaxe opened this issue 7 months ago • 3 comments

This commit replaces the wcwidth dependency with a simple vendored module that leverages Python's built-in unicodedata.
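For readers unfamiliar with the approach, here is a minimal sketch of how a width lookup can be built on `unicodedata` alone. It follows the general shape of such helpers and is not the code from this change; the names `char_width` and `str_width`, the cache size, and the exact zero-width ranges are illustrative assumptions.

```python
import unicodedata
from functools import lru_cache


@lru_cache(maxsize=4096)  # cache size is an assumption, not taken from the change
def char_width(ch: str) -> int:
    """Return the terminal column count of one character, or -1 if not printable."""
    code = ord(ch)
    # Fast path: printable ASCII always occupies one column.
    if 0x20 <= code < 0x7F:
        return 1
    # NUL and a few format/separator characters treated as zero-width (assumed ranges).
    if (
        code == 0
        or 0x200B <= code <= 0x200F
        or 0x2028 <= code <= 0x202E
        or 0x2060 <= code <= 0x2063
    ):
        return 0
    category = unicodedata.category(ch)
    # Control characters have no printable width.
    if category == "Cc":
        return -1
    # Non-spacing and enclosing marks combine with the previous character.
    if category in ("Mn", "Me"):
        return 0
    # Fullwidth and Wide East Asian characters take two columns.
    if unicodedata.east_asian_width(ch) in ("F", "W"):
        return 2
    return 1


def str_width(text: str) -> int:
    """Sum the widths of all characters, or return -1 if any is not printable."""
    total = 0
    for ch in unicodedata.normalize("NFC", text):
        width = char_width(ch)
        if width < 0:
            return -1
        total += width
    return total
```

With this sketch, `str_width("开源 Maxima")` counts two columns for each CJK character and one for each ASCII character.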

Notes:

  1. The wcwidth() function provided by the wcwidth library is already decorated with lru_cache(100). Hence the following line wraps an lru-cached function in another, redundant lru_cache layer, which may cause significant overhead.

     wcwidth: Callable[[str], int] = lru_cache(maxsize=4096)(_wcwidth)

  2. The performance of the vendored wcwidth() function is more or less equal to that of the one provided by the wcwidth package.

  3. This change turns pyte into a self-contained library.

  4. The only possible downside is that the supported Unicode version is bound to that of the Python interpreter in use. That is probably rather minor, as the interpreter wouldn't be able to decode more recent Unicode characters anyway.
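As an illustration of notes 1 and 4, the sketch below shows one way to avoid stacking a second cache on a function that is already wrapped by `functools.lru_cache`, and how to see which Unicode version the interpreter ships. The helper name `cache_once` is hypothetical; the check relies on the `cache_info()` attribute that `lru_cache` adds to wrapped functions.

```python
import unicodedata
from functools import lru_cache
from typing import Callable


def cache_once(func: Callable[[str], int], maxsize: int = 4096) -> Callable[[str], int]:
    """Wrap func in lru_cache unless it already looks lru-cached."""
    # lru_cache-wrapped callables expose cache_info(); plain functions do not.
    if hasattr(func, "cache_info"):
        return func
    return lru_cache(maxsize=maxsize)(func)


def plain_width(ch: str) -> int:
    # Trivial stand-in for an uncached width function.
    return 2 if unicodedata.east_asian_width(ch) in ("F", "W") else 1


cached_width = cache_once(plain_width)       # gets exactly one cache layer
same_function = cache_once(cached_width)     # already cached, returned unchanged

# Unicode database version bundled with the running interpreter (note 4).
print(unicodedata.unidata_version)
```

Whether the extra cache layer matters in practice depends on the hit rate, so it is worth measuring before and after.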

Benchmarks:

>>> from timeit import timeit
>>> from wcwidth import wcswidth as wcswidth1
>>> from pyte.wcwidth import wcswidth as wcswidth2
>>> s = "开源的计算机代数系统 Maxima 是用于操纵符号和数值表达式的系统"
>>> timeit(lambda: wcswidth1(s))
7.851543699999999
>>> timeit(lambda: wcswidth2(s))
3.857342599999999

Credits:

The implementation is borrowed from pytest and slightly tweaked.

deathaxe Apr 25 '25 09:04

I suspect you might need to keep the MIT license for that file, since the implementation follows the one in pytest.

superbobry Apr 25 '25 13:04

Can you rebase, please?

superbobry Apr 27 '25 22:04

If you're gonna open this can of worms, you should wash them down with a test that brute-forces the entire range. I don't fully understand this problem space myself, but I know it's not a nice one:

https://github.com/bitplane/unicode-width-mess
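A brute-force check of the kind suggested above could look roughly like the following sketch. It sweeps every code point, asserts that the result is one of the expected widths, and can list disagreements between two implementations. `candidate_wcwidth` is a hypothetical stand-in for the function under test, and `sweep` and `diff_against` are illustrative helper names, not part of pyte.

```python
import sys
import unicodedata
from collections import Counter
from typing import Callable, List, Tuple

WidthFunc = Callable[[str], int]


def candidate_wcwidth(ch: str) -> int:
    # Hypothetical stand-in; replace with the real function under test.
    category = unicodedata.category(ch)
    if category == "Cc":
        return -1
    if category in ("Mn", "Me"):
        return 0
    return 2 if unicodedata.east_asian_width(ch) in ("F", "W") else 1


def sweep(width_func: WidthFunc) -> Counter:
    """Call width_func on every code point and tally the returned widths."""
    counts: Counter = Counter()
    for code in range(sys.maxunicode + 1):
        width = width_func(chr(code))
        assert width in (-1, 0, 1, 2), f"U+{code:04X} returned {width!r}"
        counts[width] += 1
    return counts


def diff_against(reference: WidthFunc, candidate: WidthFunc,
                 limit: int = 20) -> List[Tuple[int, int, int]]:
    """Collect up to `limit` (code point, reference, candidate) disagreements."""
    mismatches: List[Tuple[int, int, int]] = []
    for code in range(sys.maxunicode + 1):
        ch = chr(code)
        expected, actual = reference(ch), candidate(ch)
        if expected != actual:
            mismatches.append((code, expected, actual))
            if len(mismatches) >= limit:
                break
    return mismatches


if __name__ == "__main__":
    print(sweep(candidate_wcwidth))
```

If the third-party wcwidth package is installed, its `wcwidth.wcwidth` function could be passed as `reference` to see where the two implementations diverge. Some disagreement is to be expected when the two sides use different Unicode versions.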

bitplane Jun 10 '25 01:06