Vendor wcwidth using python's unicodedata
This commit replaces the `wcwidth` dependency with a small vendored module built on Python's built-in `unicodedata`.
Notes:
- The `wcwidth()` function provided by the `wcwidth` library is already decorated with `lru_cache(100)`. The following line therefore wraps an lru-cached function in a second, duplicate lru-cache layer, which may cause significant overhead: `wcwidth: Callable[[str], int] = lru_cache(maxsize=4096)(_wcwidth)`
- The performance of the vendored `wcwidth()` function is more or less equal to that of the one provided by the `wcwidth` package.
- This change turns pyte into a self-contained library.
- The only possible downside is that the supported Unicode version is bound to that of the Python interpreter in use. That is probably rather minor, as the interpreter wouldn't be able to decode more recent Unicode characters anyway.
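For readers unfamiliar with the approach, here is a minimal sketch of what a `unicodedata`-based width function can look like. It is not the code from this PR: the zero-width ranges, the cache size, and the return conventions are assumptions chosen for illustration, loosely following the general shape of such helpers.

```python
import unicodedata
from functools import lru_cache


@lru_cache(maxsize=4096)  # a single cache layer; the size is an illustrative assumption
def wcwidth(c: str) -> int:
    """Return the number of terminal cells for one character, or -1 for control characters."""
    o = ord(c)
    # Fast path for printable ASCII.
    if 0x20 <= o < 0x7F:
        return 1
    # NUL and a few format/separator code points treated as zero-width (assumed ranges).
    if o == 0 or 0x200B <= o <= 0x200F or 0x2028 <= o <= 0x202E or 0x2060 <= o <= 0x2063:
        return 0
    category = unicodedata.category(c)
    # Remaining control characters have no printable width.
    if category == "Cc":
        return -1
    # Enclosing and non-spacing combining marks occupy no cell of their own.
    if category in ("Me", "Mn"):
        return 0
    # Fullwidth and wide East Asian characters occupy two cells.
    if unicodedata.east_asian_width(c) in ("F", "W"):
        return 2
    return 1


def wcswidth(s: str) -> int:
    """Return the total width of a string, or -1 if it contains a control character."""
    width = 0
    # Normalize first so that composable sequences are measured as one character.
    for c in unicodedata.normalize("NFC", s):
        w = wcwidth(c)
        if w < 0:
            return -1
        width += w
    return width
```

Because the function is cached exactly once here, a caller would import it directly instead of wrapping it in another `lru_cache`.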
Benchmarks:
>>> from timeit import timeit
>>> from wcwidth import wcswidth as wcswidth1
>>> from pyte.wcwidth import wcswidth as wcswidth2
>>> s = "开源的计算机代数系统 Maxima 是用于操纵符号和数值表达式的系统"
>>> timeit(lambda: wcswidth1(s))
7.851543699999999
>>> timeit(lambda: wcswidth2(s))
3.857342599999999
Credits:
The implementation is borrowed from pytest and slightly tweaked.
I suspect you might need to keep the MIT license for that file, since the implementation follows the one in pytest.
Can you rebase, please?
If you're gonna open this can of worms, you should wash them down with a test that brute-forces the entire range. I don't fully understand this problem space myself, but I know it's not a nice one:
https://github.com/bitplane/unicode-width-mess
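A brute-force check along those lines might look like the sketch below. The `simple_width` function is a hypothetical stand-in for whatever implementation ends up being vendored, and the only property asserted is that every code point yields a width in the expected set; comparing against a reference implementation would be a natural extension.

```python
import sys
import unicodedata
from typing import Callable, Dict


def simple_width(c: str) -> int:
    """Hypothetical stand-in for the width function under test."""
    category = unicodedata.category(c)
    if category == "Cc":
        return -1
    if category in ("Me", "Mn"):
        return 0
    return 2 if unicodedata.east_asian_width(c) in ("F", "W") else 1


def brute_force(width_fn: Callable[[str], int]) -> Dict[int, int]:
    """Call width_fn on every code point and count how often each width occurs."""
    counts: Dict[int, int] = {}
    for cp in range(sys.maxunicode + 1):
        w = width_fn(chr(cp))
        # Fail loudly on any value outside the expected set of widths.
        assert w in (-1, 0, 1, 2), f"U+{cp:04X} gave unexpected width {w!r}"
        counts[w] = counts.get(w, 0) + 1
    return counts


if __name__ == "__main__":
    for width, count in sorted(brute_force(simple_width).items()):
        print(f"width {width:>2}: {count} code points")
```

The per-width counts depend on the Unicode version bundled with the interpreter, so they are better suited to spotting regressions between implementations than to hard-coding as expected values.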