budou
Budou is an automatic organizer tool for beautiful line breaking in CJK (Chinese, Japanese, and Korean) text.
Here is the problem string: `Chatbot\u00a0\u2013 `

```
Traceback (most recent call last):
  File "", line 5, in
  File "/usr/local/lib/python3.6/site-packages/budou/parser.py", line 78, in parse
    chunks = self.segmenter.segment(source, language)
  File "/usr/local/lib/python3.6/site-packages/budou/tinysegmentersegmenter.py",...
```
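To make the problem string easier to reason about, the following minimal sketch uses only the standard library to list its non-ASCII characters. It does not call budou and does not reproduce the failure; the variable names are illustrative.

```python
import unicodedata

# The problem string quoted in the report above.
problem = "Chatbot\u00a0\u2013 "

# Collect every non-ASCII character with its code point and Unicode name.
non_ascii = [
    (f"U+{ord(ch):04X}", unicodedata.name(ch))
    for ch in problem
    if ord(ch) > 127
]

print(non_ascii)
# [('U+00A0', 'NO-BREAK SPACE'), ('U+2013', 'EN DASH')]
```

The string therefore contains a no-break space followed by an en dash, plus a trailing ordinary space.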
Example input: `今日は [@foo]tushuhei.com/hoge[/@foo]天気です。`

Output:

```
今日は [@foo]tushuhei.com/hoge[/@foo]天気です。
```

It seems characters such as `[` and `@` are mistakenly included in a chunk.
The current implementation keeps emitting the warning below.

```
DeprecationWarning: This method will be removed in future versions. Use 'list(elem)' or iteration over elem instead.
```
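The wording of this warning matches the one emitted by `Element.getchildren()` in Python's `xml.etree.ElementTree`, a method that was removed in Python 3.9. Assuming that call is the source (the report does not say which line triggers it), the replacement the warning suggests looks like this. The sample markup is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Illustrative markup only; not taken from budou.
root = ET.fromstring("<span><b>a</b><i>b</i></span>")

# Deprecated form that triggers the warning:
#     children = root.getchildren()

# Replacement 1: materialize the direct children as a list.
children = list(root)

# Replacement 2: iterate over the element directly.
tags = [child.tag for child in root]

print(tags)  # ['b', 'i']
```

Both forms return only direct children, as `getchildren()` did, so the change should not alter behavior.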
Add a [Jieba](https://github.com/fxsjy/jieba) backend segmenter to provide another segmentation option for Chinese.
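As a rough sketch of what such a backend could look like: the class below is hypothetical and not part of budou. It mirrors only the `segment(source, language)` call shape visible in the traceback above and returns plain strings, not budou's own chunk objects. The `cut` parameter is an assumption added so the tokenizer can be swapped out; by default it uses `jieba.cut`, which requires the third-party `jieba` package.

```python
from typing import Callable, Iterable, List, Optional


class JiebaSegmenterSketch:
    """Hypothetical Jieba-backed segmenter; not budou's actual API."""

    def __init__(self, cut: Optional[Callable[[str], Iterable[str]]] = None):
        if cut is None:
            # Imported lazily so the class can be used with another tokenizer.
            import jieba  # third-party: pip install jieba
            cut = jieba.cut
        self._cut = cut

    def segment(self, source: str, language: Optional[str] = None) -> List[str]:
        # `language` is accepted for call compatibility but unused here.
        # Empty tokens are dropped; order is preserved.
        return [word for word in self._cut(source) if word]
```

With `jieba` installed, `JiebaSegmenterSketch().segment("今天天气很好")` would return the words Jieba finds; a real backend would still need to convert them into whatever chunk type budou's parser expects.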
When trying to set up accessibility (a11y) with budou, the `aria-describedby` attribute is stripped by html5lib at https://github.com/google/budou/blob/master/budou/budou.py#L439