Peter Olson

Results: 37 comments by Peter Olson

`zh_core_web_trf` is not detecting sentence boundaries correctly in Chinese.

```
import spacy

nlp = spacy.load("zh_core_web_trf")
doc = nlp("我是你的朋友。你是我的朋友吗?我不喜欢喝咖啡。")
```

This should be three separate sentences, but the `sents` property only has one...
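One possible workaround (a sketch, not verified against this model version) is to preset boundaries with spaCy's rule-based `sentencizer` placed first in the pipeline, since the dependency parser respects sentence boundaries that are already set:

```
import spacy

nlp = spacy.load("zh_core_web_trf")
# Preset boundaries from punctuation before the parser runs; the parser
# keeps boundaries that are already set.
nlp.add_pipe("sentencizer", first=True)
doc = nlp("我是你的朋友。你是我的朋友吗?我不喜欢喝咖啡。")
print([sent.text for sent in doc.sents])  # should now be three sentences
```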

Spanish tokenization is broken when there is no space between consecutive questions ("?¿").

```
import spacy

nlp = spacy.load("es_dep_news_trf")
doc = nlp("¿Qué quieres?¿Por qué estás aquí?")
```

`quieres?¿Por` is treated as one...
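As a stopgap, the tokenizer's infix rules can be extended so splits happen around a "?¿" (or "!¡") junction. This is a sketch of my own, not the model's fix, and the regex is a blunt rule you may want to tighten:

```
import spacy
from spacy.util import compile_infix_regex

nlp = spacy.load("es_dep_news_trf")
# Zero-width split points before, between, and after a "?¿" / "!¡" pair,
# so each mark becomes its own token.
pair_split = r"(?=[?!][¿¡])|(?<=[?!])(?=[¿¡])|(?<=[?!][¿¡])"
infixes = list(nlp.Defaults.infixes) + [pair_split]
nlp.tokenizer.infix_finditer = compile_infix_regex(infixes).finditer
doc = nlp("¿Qué quieres?¿Por qué estás aquí?")
print([t.text for t in doc])  # "quieres", "?", "¿", "Por" now separate
```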

Ah, now I understand the cause of the issue. Both the user and I assumed that the stroke would go from left to right. I guess technically it works as designed, although...

This can be changed with the [`drawingWidth` property](https://hanziwriter.org/docs.html#api-link).

Same issue with 肠.

There have been issues open on makemeahanzi for a while already: https://github.com/skishore/makemeahanzi/issues/95 and https://github.com/skishore/makemeahanzi/issues/96. If you want to patch this, here is the correct stroke data for 翰 ``` {"strokes":["M 317...

From what I understand, [Inkstone allows some shortcuts](https://github.com/skishore/inkstone/blob/master/lib/matcher/shortcuts.js) for some common character components, but I'm not familiar enough with the code to understand exactly how it works. Would...

You can convert between simplified and traditional, but the segmentation will only work well with simplified. If you want to segment traditional text, you can convert to simplified, segment, and...
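For illustration, here is a minimal sketch of that round trip in Python using the `opencc` and `jieba` packages (the project itself may use different libraries). Note that simplified-to-traditional conversion is one-to-many, so the conversion back can occasionally pick the wrong variant.

```
import jieba
from opencc import OpenCC

to_simplified = OpenCC("t2s")   # traditional -> simplified
to_traditional = OpenCC("s2t")  # simplified -> traditional

def segment_traditional(text):
    # Segment with jieba's simplified-trained model, then map each
    # word back to traditional characters.
    simplified = to_simplified.convert(text)
    return [to_traditional.convert(word) for word in jieba.cut(simplified)]

print(segment_traditional("我不喜歡喝咖啡。"))
```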

It probably won't work very well for traditional characters because the segmentation library used (jieba) is trained on simplified texts. For now, you'll probably have to convert to simplified first.

The dictionary used is CC-CEDICT and whatever [node-pinyin](https://github.com/godfox2012/node-pinyin) uses behind the scenes. I'm not sure exactly how many characters are covered; I'll have to investigate this later.
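CC-CEDICT is a plain-text file with one entry per line in the form `TRAD SIMP [pin1 yin1] /gloss/.../`, so character coverage can be estimated with a few lines of Python. This is my own throwaway helper, not code from the project; `cedict_ts.u8` is the standard distribution filename.

```
import re

# CC-CEDICT entry format: TRAD SIMP [pin1 yin1] /gloss/gloss/
LINE = re.compile(r"^(\S+) (\S+) \[([^\]]+)\] /(.+)/$")

def covered_characters(path):
    chars = set()
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.startswith("#"):  # skip header comments
                continue
            m = LINE.match(line.strip())
            if m:
                chars.update(m.group(1))  # traditional headword
                chars.update(m.group(2))  # simplified headword
    return chars

print(len(covered_characters("cedict_ts.u8")))
```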