
Wrong word segmentation in Chinese course

Open placebokkk opened this issue 2 years ago • 1 comments

The sentence is just split into individual characters.

# Import spaCy and create a Chinese nlp object
import spacy

nlp = spacy.blank("zh")

# Process the text
doc = nlp("我喜欢老虎和狮子。")

# Iterate over the doc and print each token with its index
for i, token in enumerate(doc):
    print(i, token.text)

# Slice the Doc to get the "老虎" ("tiger") part
laohu = doc[2:3]
print(laohu.text)

# Slice the Doc to get "老虎和狮子" ("tigers and lions"), excluding "。"
laohu_he_shizi = doc[2:5]
print(laohu_he_shizi.text)

Output

0 我
1 喜
2 欢
3 老
4 虎
5 和
6 狮
7 子
8 。
欢
欢老虎
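To make the mismatch explicit, here is a minimal pure-Python sketch comparing the character-level split the blank pipeline produces with the word-level split the exercise's slice indices assume (the word-level list is an assumption of roughly what a word segmenter such as jieba would produce, not actual jieba output):

```python
text = "我喜欢老虎和狮子。"

# Character fallback: every character becomes its own token
char_tokens = list(text)

# Word-level segmentation the exercise assumes (assumed split for illustration)
word_tokens = ["我", "喜欢", "老虎", "和", "狮子", "。"]

# The exercise's slice [2:3] only selects "老虎" under word segmentation
print("".join(char_tokens[2:3]))  # 欢   (wrong text, as in the output above)
print("".join(word_tokens[2:3]))  # 老虎 (what the exercise expects)
```

So with character segmentation, "老虎" actually lives at indices 3:5 and "老虎和狮子" at 3:8, which is why the printed slices don't match the exercise's expected answers.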

placebokkk avatar Mar 07 '22 07:03 placebokkk

Thanks for the note, we should update this to use jieba, I suspect.
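As a sketch of the suggested fix: spaCy's Chinese tokenizer supports switching the segmenter from the character fallback to jieba, either via `spacy.blank("zh", config={"nlp": {"tokenizer": {"segmenter": "jieba"}}})` or in the pipeline config (this assumes the `jieba` package is installed; consult the spaCy Chinese language docs for the exact settings):

```ini
[nlp.tokenizer]
@tokenizers = "spacy.zh.ChineseTokenizer"
segmenter = "jieba"
```

With jieba segmentation the text should tokenize into words rather than characters, so the exercise's slice indices would line up again.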

adrianeboyd avatar Mar 07 '22 08:03 adrianeboyd