g2pm
Cannot reproduce the results.
I want to compare the performance of several g2p systems, so I downloaded the CPP dataset and tried to reproduce the results shown in this repo. However, I got much lower accuracy.
For g2pM v0.1.2.5, I got 92.9% on the train set, 92.1% on the dev set, and 91.6% on the test set. Even when ignoring tone information, the accuracies are 96.6%, 96.1%, and 96.0% on the train, dev, and test sets.
For pypinyin v0.36.0, I got 79.2%, 78.7%, and 79.1% with tone, and 89.4%, 89.1%, and 89.3% without tone.
To be clearer:
- The full sentence was fed to each system to get the pinyin result.
- The prediction was then extracted as `re.findall(r'▁ ([a-z0-9:]+) ▁', pinyin)[0]`.
- Finally, the accuracy was calculated from `np.array([i == j for i, j in zip(pred, gt)])`.
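The steps above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the exact evaluation script: the helper names `extract_prediction`, `strip_tone`, and `accuracy` are hypothetical, the sample strings are made up for illustration, tones are assumed to be written as trailing digits, and a plain-Python mean stands in for the NumPy array.

```python
import re

# Pattern from the post: the target syllable sits between two "▁" markers.
TARGET = re.compile(r'▁ ([a-z0-9:]+) ▁')


def extract_prediction(pinyin: str) -> str:
    """Return the syllable between the markers, or '' if there is no match.

    Note: re.findall(...)[0] would raise IndexError on a miss instead.
    """
    match = TARGET.search(pinyin)
    return match.group(1) if match else ''


def strip_tone(syllable: str) -> str:
    """Drop tone digits, e.g. 'hang2' -> 'hang' (assumes numeric tones)."""
    return re.sub(r'[0-9]', '', syllable)


def accuracy(pred, gt, with_tone=True):
    """Fraction of positions where the prediction equals the ground truth."""
    if len(pred) != len(gt):
        raise ValueError("pred and gt must have the same length")
    if not with_tone:
        pred = [strip_tone(p) for p in pred]
        gt = [strip_tone(g) for g in gt]
    hits = [p == g for p, g in zip(pred, gt)]
    return sum(hits) / len(hits) if hits else 0.0


# Made-up system outputs, one per sentence, for illustration only.
outputs = ["yin2 ▁ hang2 ▁ li3", "lv3 ▁ xing1 ▁ she4"]
gt = ["hang2", "xing2"]

pred = [extract_prediction(o) for o in outputs]
print(accuracy(pred, gt))                   # accuracy with tone
print(accuracy(pred, gt, with_tone=False))  # accuracy ignoring tone
```

Returning an empty string on a failed match counts that sentence as an error rather than aborting the run, which makes any extraction failures visible in the final number.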
How did you compute the accuracy values?
The attachment contains my predictions for the test set.
If there is any mistake in my computation, please point it out. Thanks.