
the project can not beat pypinyin!

Open shawnthu opened this issue 4 years ago • 3 comments

shawnthu, Oct 22 '20 10:10

Please describe your experimental setup.

In our experimental setup, our model performs better than the other baselines.

seanie12, Jan 12 '21 10:01

The pretrained g2pM model is even worse than pypinyin. There are too many polyphonic characters, while the training corpus is too small. But even after we extended the corpus, the results were still not very good.

JohnHerry, Aug 26 '21 02:08

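```python
from pypinyin import lazy_pinyin, Style
from g2pM import G2pM

# Test sentence containing the polyphonic character 长
sentence = "然而,他红了20年以后,他在长沙长大,也在长沙退休。"

# pypinyin with numeric tones; the neutral tone is written as 5
p1 = lazy_pinyin(sentence, style=Style.TONE3, neutral_tone_with_five=True)
print('pypinyin lazy')
print(p1)

# g2pM with numeric tones
model = G2pM()
p2 = model(sentence, tone=True, char_split=False)
print('g2m')
print(p2)
```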

Here is a case I found where it may perform worse than pypinyin:

```
然而,他红了20年以后,他在长沙长大,也在长沙退休。

pypinyin lazy
['ran2', 'er2', ',', 'ta1', 'hong2', 'le5', '20', 'nian2', 'yi3', 'hou4', ',', 'ta1', 'zai4', 'chang2', 'sha1', 'zhang3', 'da4', ',', 'ye3', 'zai4', 'chang2', 'sha1', 'tui4', 'xiu1', '。']

g2m
['ran2', 'er2', ',', 'ta1', 'hong2', 'le5', '20', 'nian2', 'yi3', 'hou4', ',', 'ta1', 'zai4', 'chang2', 'sha1', 'chang2', 'da4', ',', 'ye3', 'zai4', 'chang2', 'sha1', 'tui4', 'xiu1', '。']
```

The two outputs differ only at 长 in 长大 ("grow up"): pypinyin returns 'zhang3', which is the correct reading, while g2pM returns 'chang2'.

dyustc, Apr 06 '23 08:04
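
Long pinyin lists like the two in this thread are hard to compare by eye. A minimal sketch of a position-by-position comparison follows; it uses only the standard library, and the helper name `pinyin_mismatches` is hypothetical, not part of pypinyin or g2pM. It assumes both tools return one token per position, as they do for the sentence reported here.

```python
from typing import List, Tuple


def pinyin_mismatches(a: List[str], b: List[str]) -> List[Tuple[int, str, str]]:
    """Return (index, a_token, b_token) for each position where the outputs disagree."""
    if len(a) != len(b):
        # Hypothetical helper: it does not try to align lists of different lengths.
        raise ValueError("token lists have different lengths; align them first")
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


# Outputs copied from the comment above.
pypinyin_out = ['ran2', 'er2', ',', 'ta1', 'hong2', 'le5', '20', 'nian2', 'yi3',
                'hou4', ',', 'ta1', 'zai4', 'chang2', 'sha1', 'zhang3', 'da4', ',',
                'ye3', 'zai4', 'chang2', 'sha1', 'tui4', 'xiu1', '。']
g2pm_out = ['ran2', 'er2', ',', 'ta1', 'hong2', 'le5', '20', 'nian2', 'yi3',
            'hou4', ',', 'ta1', 'zai4', 'chang2', 'sha1', 'chang2', 'da4', ',',
            'ye3', 'zai4', 'chang2', 'sha1', 'tui4', 'xiu1', '。']

mismatches = pinyin_mismatches(pypinyin_out, g2pm_out)
print(mismatches)  # [(15, 'zhang3', 'chang2')]
```

Running the same helper over a larger set of sentences would give a rough count of where the two tools disagree, though deciding which reading is right still needs a labeled reference.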