Add Vietnamese support using pyvi

Open kurtisc opened this issue 4 years ago • 4 comments

Hi!

Vietnamese is written with spaces between syllables rather than between words, unlike most other languages that use the Latin alphabet[1], so the current spaces morphemizer is unsuitable.

[1] Fun read https://www.tandfonline.com/doi/pdf/10.1080/00437956.1963.11659787

I wasn't able to find a small library that does word segmentation for Vietnamese the way Jieba does for Chinese. Bundling pyvi in the code, as was done for Jieba, would mean bundling several much larger dependencies too (e.g. NumPy).
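To illustrate why splitting on spaces falls short, here is a minimal sketch. It is not the code from this change: the helper names `split_on_spaces`, `words_from_segmented` and `segment_with_pyvi` are made up for illustration, and the underscore-joined string is a hand-written example of the format pyvi's `ViTokenizer.tokenize` is documented to return, not output captured from a real run.

```python
def split_on_spaces(text):
    """What a spaces-based morphemizer effectively sees: one token per syllable."""
    return text.split()


def words_from_segmented(segmented):
    """Turn underscore-joined segmenter output into a list of words.

    Assumes the pyvi-style convention that syllables belonging to one word
    are joined with "_" and words are separated by spaces.
    """
    return [token.replace("_", " ") for token in segmented.split()]


def segment_with_pyvi(text):
    """Segment with pyvi if it is installed (imported lazily, so the rest works without it)."""
    from pyvi import ViTokenizer  # third-party; pulls in heavier dependencies
    return words_from_segmented(ViTokenizer.tokenize(text))


sentence = "Trường đại học bách khoa hà nội"

# Splitting on spaces yields syllables, not words.
print(split_on_spaces(sentence))

# Hand-written example of segmented text in the underscore-joined format.
print(words_from_segmented("Trường đại_học bách_khoa hà_nội"))
```

The lazy import inside `segment_with_pyvi` is one way to keep the heavy dependency out of the picture for anyone who never asks for Vietnamese segmentation.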

So, if merged like this, it's unfortunately a burden on the end user to get the Vietnamese support working. On the other hand, if they don't want it, it won't appear or impact their usage.
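The "won't appear if they don't want it" behaviour could be sketched roughly as below. This is an assumption about the general pattern, not MorphMan's actual registration code: `is_installed` and `available_morphemizers` are hypothetical names, and the entries in the list are placeholder strings.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


def available_morphemizers():
    """List morphemizer names, offering the Vietnamese one only when pyvi is present."""
    names = ["spaces"]  # always available, no extra dependencies
    if is_installed("pyvi"):
        names.append("vietnamese")  # hypothetical name for the pyvi-backed option
    return names


print(available_morphemizers())
```

Users without pyvi simply never see the extra entry, so nothing changes for them.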

If this gets included, I'll look into packaging pyvi and its dependencies as a separate add-on, as has been done for MeCab, licences permitting. That would make installation more straightforward and avoid forcing people to use the source version of Anki.

kurtisc, Apr 21 '20 22:04

Rebased on master and confirmed working when #125 is merged.

With regard to #145: I do have a test for this morphemizer, so hopefully that addresses @shanrauf's comment.

kurtisc, Aug 15 '20 19:08

Would you mind rebasing again, so I can see if the tests pass? I'll submit after.

ianki, Nov 09 '20 20:11

I am really interested in this.

smartlitchi, Nov 13 '20 10:11

I haven’t been able to build Anki from source so that I can import pyvi (I think because my hardware is a little old). Is there any other way I can get Vietnamese parsing to work with MorphMan?

sedosido, Aug 15 '21 20:08