
High scores for non-existent phones in gop_speechocean762

Open thangdc94 opened this issue 4 years ago • 9 comments

I used the old version of GOP provided by @jimbozhang, and I'm now using the new version (gop_speechocean762).

When I test one audio file against a different text, the model returns high scores for phones that do not exist in the audio.

E.g.: the text is "kick", but the audio contains only the single word "change".

Result:

K: 1.98
IH: 1.27
K: 1.05

I think the problem is that the dataset is not balanced. Counts per phone-level score: 0: 1339, 1: 1828, 2: 44079.
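
To make the imbalance concrete, here is a minimal sketch that turns the counts above into class shares and inverse-frequency weights. The weighting formula is only an illustration (an assumption, not something the recipe is known to use); the counts are the ones quoted above.

```python
# Label counts quoted above (phone-level score -> number of phones).
counts = {0: 1339, 1: 1828, 2: 44079}

total = sum(counts.values())

# Share of each score among all labelled phones.
shares = {score: n / total for score, n in counts.items()}

# Inverse-frequency ("balanced") weights: total / (num_classes * count).
# Illustrative assumption only; not taken from the gop_speechocean762 recipe.
weights = {score: total / (len(counts) * n) for score, n in counts.items()}

for score in sorted(counts):
    print(f"score {score}: {shares[score]:.1%} of phones, weight {weights[score]:.2f}")
```

Score 2 accounts for roughly 93% of the labels, so a regressor can do well on average by predicting a high score for almost any input.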

thangdc94 avatar Feb 09 '21 04:02 thangdc94

I think you are right.

In the speechocean762 dataset, 50% of speakers have good pronunciation, 25% have so-so pronunciation, and the remaining 25% have poor pronunciation. However, even for the poor English speakers, most phones are pronounced correctly, so the phone-level scores are quite unbalanced.

Feel free to open a PR once you have trained a better model that overcomes the imbalance problem. It would be very beneficial.

jimbozhang avatar Feb 09 '21 12:02 jimbozhang

I just balanced the training data with a small trick, and the performance looks better. The new version is on the following branch: https://github.com/jimbozhang/kaldi/tree/jzhang.gop.balanced_traindata

Could @thangdc94 please check it?

jimbozhang avatar Feb 13 '21 15:02 jimbozhang

@jimbozhang Sure. Let me see.

thangdc94 avatar Feb 13 '21 15:02 thangdc94

@jimbozhang I retrained the model using your new scripts. The result seems better, but I think it's still far from satisfactory.

The result for the test above:

K: 1.1
IH: 1.54
K: 0.71

The result seems better, but when I switch to other test cases, the results are not so good. I'm using audio from the Cambridge Dictionary with random text as input.

I think the alignment from the acoustic model may be the problem. I read IMPROVING PRONUNCIATION ASSESSMENT VIA ORDINAL REGRESSION WITH ANCHORED REFERENCE SAMPLES.

They use cGOP to indicate how much a pronunciation unit is confused with other phonemes. The inputs for calculating cGOP are the alignment and the posteriors.
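
For context, here is a minimal sketch of how an alignment and frame-level posteriors combine into a score. It uses the common log-posterior-ratio form of GOP, not the cGOP formula from the paper and not the recipe's code; the function name, the toy posteriors, and the probability floor are all assumptions for illustration.

```python
import math

def gop_score(posteriors, segment, canonical, floor=1e-10):
    """Log-posterior-ratio GOP for one aligned phone (illustrative sketch).

    posteriors: list of per-frame dicts mapping phone -> posterior probability
    segment:    (start, end) frame indices from forced alignment, end exclusive
    canonical:  the transcript phone that the aligner assigned to this segment
    """
    start, end = segment
    frames = posteriors[start:end]
    phones = frames[0].keys()
    # Average log posterior of every phone over the aligned frames.
    avg_logp = {
        p: sum(math.log(max(f[p], floor)) for f in frames) / len(frames)
        for p in phones
    }
    best = max(avg_logp, key=avg_logp.get)
    # The score is <= 0; it equals 0 when the canonical phone is the best match.
    return avg_logp[canonical] - avg_logp[best], best

# Toy example: the audio sounds like "CH", but the transcript expects "K".
toy_posteriors = [
    {"K": 0.1, "CH": 0.8, "IH": 0.1},
    {"K": 0.2, "CH": 0.7, "IH": 0.1},
]
gop_k, best_phone = gop_score(toy_posteriors, (0, 2), "K")
gop_ch, _ = gop_score(toy_posteriors, (0, 2), "CH")
print(gop_k, best_phone, gop_ch)
```

In this toy case the expected phone gets a clearly negative score and the best-matching phone is a different one, which is the kind of confusion information that a downstream scoring model could use. The segment boundaries come from forced alignment, so a transcript that does not match the audio affects the score through the alignment as well.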

thangdc94 avatar Feb 13 '21 16:02 thangdc94

Thanks very much for your testing and suggestion.

As future work, we plan to add word-level and sentence-level scoring to this recipe, but for now we do not plan to implement new methods such as the "cGOP" you mentioned, even though they may perform better. After all, this recipe is just meant to illustrate how to use the speechocean dataset, and we do not want it to become too complicated.

jimbozhang avatar Feb 13 '21 16:02 jimbozhang

@thangdc94 Hi, I would like to know whether this problem was solved (I mean the problem of high scores for phones that do not exist). If yes, what did you do to solve it?

omidaghdaei avatar Jul 15 '21 06:07 omidaghdaei

@omidaghdaei I haven't solved the problem yet. speechocean_gop hasn't returned reasonable results for phone scores. I think this GOP method is good enough for a demo, but it's still far from satisfactory. Here are some reasons I have considered:

  • The lack of negative phone data.
  • Problems with the forced-alignment output when the voice and the transcript do not match (maybe we need more data to train the ASR model).
  • Maybe NN-GOP with a regression model is not good enough to solve this problem.

I'm currently using the Azure Pronunciation Assessment API in production. It's really good.

You can have a look at IMPROVING PRONUNCIATION ASSESSMENT VIA ORDINAL REGRESSION WITH ANCHORED REFERENCE SAMPLES. This paper is from a Microsoft team. I think they implemented it in their Azure API.

I have tried sending audio to the Azure API and using the results as input data to train a regression model. I followed the above paper; it seems better than speechocean_gop, but the result is still not good enough.

thangdc94 avatar Jul 15 '21 09:07 thangdc94

@thangdc94 Thank you very much, and I wish you all the best. Could I have your email address so I can ask you some questions related to this topic? I will try not to bother you with too many questions and to solve problems by myself where I can. I'm working on this topic, and after reading your message I want to test Azure Pronunciation Assessment.

omidaghdaei avatar Jul 15 '21 16:07 omidaghdaei

@omidaghdaei My email is [email protected]

thangdc94 avatar Jul 16 '21 06:07 thangdc94