python-pinyin

Is a training feature supported?

Open onsunsl opened this issue 7 years ago • 4 comments

In [12]: pinyin("中心")
Out[12]: [['zhōng'], ['xīn']]

In [13]: pinyin("重心")
Out[13]: [['zhòng'], ['xīn']]

In [14]: pinyin("情调来调整风格")
Out[14]: [['qíng'], ['diào'], ['lái'], ['diào'], ['zhěng'], ['fēng'], ['gé']]

In [15]: pinyin("调整风格")
Out[15]: [['diào'], ['zhěng'], ['fēng'], ['gé']]

In [16]: pinyin("调整风格")
Out[16]: [['diào'], ['zhěng'], ['fēng'], ['gé']]

In [17]: pinyin("调整")
Out[17]: [['tiáo'], ['zhěng']]

In [18]: pinyin("调薪")
Out[18]: [['diào'], ['xīn']]

The readings are still wrong even after word segmentation. Is there a training feature that could be used to correct them?

onsunsl • Feb 27 '17 06:02
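To illustrate why the reading of a polyphonic character such as 调 depends on which phrases are known, here is a toy longest-match lookup. It is only a sketch and not pypinyin's actual implementation; the dictionaries `CHAR_DEFAULT` and `PHRASES` and the function `toy_pinyin` are hypothetical and hold just enough entries for the examples in this issue.

```python
# Hypothetical per-character default readings (toy data, not pypinyin's).
CHAR_DEFAULT = {"调": "diào", "整": "zhěng", "薪": "xīn", "情": "qíng", "来": "lái"}
# Hypothetical phrase-level readings that take priority over the defaults.
PHRASES = {"调整": ["tiáo", "zhěng"], "情调": ["qíng", "diào"]}


def toy_pinyin(text, phrases=PHRASES, chars=CHAR_DEFAULT):
    """Greedy longest-match: prefer a known phrase, else fall back per character."""
    max_len = max((len(p) for p in phrases), default=1)
    result = []
    i = 0
    while i < len(text):
        # Try the longest candidate phrase first, down to length 2.
        for size in range(min(max_len, len(text) - i), 1, -1):
            chunk = text[i:i + size]
            if chunk in phrases:
                result.extend(phrases[chunk])
                i += size
                break
        else:
            # No phrase matched here: use the single-character default,
            # or keep the character unchanged if it is unknown.
            result.append(chars.get(text[i], text[i]))
            i += 1
    return result
```

With this toy data, 调 in 调整 is read from the phrase entry, while 调 in 调薪 falls back to the per-character default until a phrase entry for 调薪 is added.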

@onsunsl Not at the moment. Do you have any suggestions for how a training feature should work?

mozillazg • Feb 27 '17 14:02

I don't have any either. A phrase dictionary for polyphonic characters could be a first step.

onsunsl • Mar 07 '17 13:03

@onsunsl For now, the way to handle this is a custom phrase dictionary:

>>> from pypinyin import pinyin, load_phrases_dict
>>> pinyin("中心")
[['zhōng'], ['xīn']]
>>> pinyin("重心")
[['zhòng'], ['xīn']]
>>> pinyin("情调来调整风格")
[['qíng'], ['diào'], ['lái'], ['tiáo'], ['zhěng'], ['fēng'], ['gé']]
>>> pinyin("调整风格")
[['tiáo'], ['zhěng'], ['fēng'], ['gé']]
>>> pinyin("调整")
[['tiáo'], ['zhěng']]
>>> pinyin("调薪")
[['diào'], ['xīn']]
>>> # Register a custom reading for the phrase 调薪, then convert it again.
>>> load_phrases_dict({'调薪': [['tiáo'], ['xīn']]})
>>> pinyin("调薪")
[['tiáo'], ['xīn']]

Contributions to the polyphonic phrase dictionary are welcome: https://github.com/mozillazg/phrase-pinyin-data

mozillazg • Mar 07 '17 14:03
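When many phrases need correcting, the custom phrase dictionary shown above could be built from a small text file instead of being typed by hand. The following is a minimal sketch under stated assumptions: the `phrase: reading reading` line format and the helpers `parse_overrides` and `apply_overrides` are hypothetical, chosen for illustration. Only `load_phrases_dict` comes from pypinyin, and it is imported lazily so the parsing step can be checked on its own.

```python
def parse_overrides(lines):
    """Parse 'phrase: reading reading' lines into the dict shape used above.

    The line format is an assumption made for this sketch.
    """
    phrases = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        phrase, _, readings = line.partition(":")
        phrase = phrase.strip()
        syllables = readings.split()
        if len(syllables) != len(phrase):
            raise ValueError("expected one reading per character: %r" % line)
        # One inner list per character, as in {'调薪': [['tiáo'], ['xīn']]}.
        phrases[phrase] = [[s] for s in syllables]
    return phrases


def apply_overrides(overrides):
    """Register the parsed phrases with pypinyin (requires pypinyin)."""
    from pypinyin import load_phrases_dict
    load_phrases_dict(overrides)
```

For example, `apply_overrides(parse_overrides(["调薪: tiáo xīn"]))` would register the same correction as the session above.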

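@onsunsl How can python-pinyin be used for batch processing?

Environment: Windows 10, Python 3.4.3, pypinyin v0.33.0

I have a UTF-8 text file, b.txt, containing: 这个 进行 因为 还是 时候 看到 …… I would like to convert it to Hanyu Pinyin. How should I go about it?

Can this be done in one step, for example by batch processing or by dragging and dropping a file? Any pointers would be appreciated. Thanks!

zgdlime • Sep 01 '18 12:09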
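For the batch question above, one possible approach is sketched here. It assumes b.txt holds one word or phrase per line (the post does not show the exact layout), and the names `convert_lines`, `convert_file`, and the output file `b_pinyin.txt` are made up for illustration. `lazy_pinyin` and `Style.TONE` are part of pypinyin; the converter is passed in as a parameter so the file handling can be tried without the library, and the code avoids features newer than Python 3.4.

```python
import io


def convert_lines(lines, to_pinyin):
    """Yield one space-joined pinyin string per non-empty input line."""
    for line in lines:
        text = line.strip()
        if text:
            yield " ".join(to_pinyin(text))


def convert_file(src, dst, to_pinyin=None):
    """Convert every line of the UTF-8 file `src` and write the result to `dst`."""
    if to_pinyin is None:
        # Default converter: pypinyin with tone marks (requires pypinyin).
        from pypinyin import lazy_pinyin, Style

        def to_pinyin(text):
            return lazy_pinyin(text, style=Style.TONE)

    with io.open(src, encoding="utf-8") as fin, \
            io.open(dst, "w", encoding="utf-8") as fout:
        for converted in convert_lines(fin, to_pinyin):
            fout.write(converted + u"\n")
```

With pypinyin installed, calling `convert_file("b.txt", "b_pinyin.txt")` would write one converted line per input line. Drag-and-drop is not covered by this sketch.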

I recently built a wrapper around the g2pW project that uses machine learning and supports model training. If you are interested, give it a try: https://github.com/mozillazg/pypinyin-g2pW

mozillazg • Aug 21 '22 12:08