cantoseg

Cantonese segmentation tool 粵語分詞工具

Install

$ pip install cantoseg

Usage

>>> import cantoseg
>>> cantoseg.cut('香港喺舊石器時代就有人住')
['香港', '喺', '舊石器時代', '就', '有人', '住']

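A generator version, cantoseg.lcut, is also available. It takes the same string argument and yields the segments one at a time instead of returning a list.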
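Since both functions produce a sequence of strings, their output can be passed straight to standard-library tools. The following is a minimal sketch that counts segment frequencies over several sentences. It assumes only the behaviour described above; `token_frequencies` and its `segment` parameter are hypothetical names used for illustration and are not part of cantoseg.

```python
from collections import Counter
from typing import Callable, Iterable


def token_frequencies(
    texts: Iterable[str],
    segment: Callable[[str], Iterable[str]],
) -> Counter:
    """Count how often each segment occurs across all texts.

    `segment` is any callable that maps a string to an iterable of
    segments (a list or a generator both work).
    """
    counts: Counter = Counter()
    for text in texts:
        # Counter.update consumes the iterable one segment at a time,
        # so a generator is never materialised into a full list here.
        counts.update(segment(text))
    return counts
```

With the package installed, pass cantoseg.lcut (or cantoseg.cut) as `segment`, for example `token_frequencies(sentences, cantoseg.lcut)`, where `sentences` is your own list of strings.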

Design

See the article "Cantonese Segmentation and Part-Of-Speech Tagging" (in Chinese).