icefall

Problems in streaming decoding with pruned_transducer_stateless5

Open yangsuxia opened this issue 2 years ago • 8 comments

I trained a streaming model on my own data, but recognition quality is poor at decoding time. There are two obvious problems: words are deleted at the end of utterances, and multiple words are inserted in the middle.

Training: image
Decoding: image
Result: image

What should I do to reduce these errors? I look forward to your reply. Thank you!

yangsuxia, Oct 08 '22 10:10

one is to delete words at the end,

Could you add some tail paddings to your utterances and retry?

csukuangfj, Oct 08 '22 10:10
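
The tail padding suggested above can be sketched as follows. This is only a minimal illustration, assuming the utterance is available as a sequence of float samples; the helper name `pad_tail` and its parameters are hypothetical and are not part of icefall. In practice the same idea would be applied to whatever array or tensor type the data pipeline uses.

```python
from typing import List, Sequence


def pad_tail(
    samples: Sequence[float], sample_rate: int, pad_seconds: float = 1.0
) -> List[float]:
    """Return a copy of `samples` with `pad_seconds` of silence (zeros) appended.

    Hypothetical helper for illustration only; not an icefall function.
    """
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    if pad_seconds < 0:
        raise ValueError("pad_seconds must be non-negative")
    # Number of zero-valued samples that make up the requested silence.
    num_pad = int(round(sample_rate * pad_seconds))
    return list(samples) + [0.0] * num_pad


if __name__ == "__main__":
    utterance = [0.1, -0.2, 0.3]  # toy waveform
    padded = pad_tail(utterance, sample_rate=16000, pad_seconds=1.0)
    print(len(padded))  # 3 original samples + 16000 zeros = 16003
```

The extra silence gives a streaming decoder more frames after the last word, which is why it may help with deletions at the end of an utterance.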

the other is to insert multiple words in the middle

This problem is much harder to fix in your case since the ground truth contains 3 contiguous . Does your training data contain data patterns like this?

csukuangfj, Oct 08 '22 10:10

It should be HYP, not HYF. Yes, this is the kind of case where E2E models have trouble.

danpovey, Oct 08 '22 12:10

the other is to insert multiple words in the middle

This problem is much harder to fix in your case since the ground truth contains 3 contiguous . Does your training data contain data patterns like this?

When I set avg to 1, the number of insertion errors decreases a lot.

yangsuxia, Oct 10 '22 07:10

the other is to insert multiple words in the middle

This problem is much harder to fix in your case since the ground truth contains 3 contiguous . Does your training data contain data patterns like this?

When I set avg to 1, the number of insertion errors decreases a lot.

What is your original setting?

csukuangfj, Oct 10 '22 07:10

one is to delete words at the end,

Could you add some tail paddings to your utterances and retry?

I added 1 s of silence to the end of each utterance in the test set, and there are now almost no deletions at the end of sentences.

Another question is, what is the possible reason for many deletions in the middle of sentences? image

yangsuxia, Oct 10 '22 07:10

What is your original setting?

My reply above contains the original decoding configuration, but I described it incorrectly. The avg used to be 2, and now it is 1.

yangsuxia, Oct 10 '22 07:10
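
For readers unfamiliar with the avg setting: assuming it refers to the number of checkpoints whose parameters are averaged before decoding, the idea can be sketched as below. This is a toy illustration on plain Python lists, not icefall's implementation; the function name `average_checkpoints` and the dictionary layout are assumptions made for the example.

```python
from typing import Dict, List

# A toy "checkpoint": parameter name -> flat list of weights (assumed layout).
Checkpoint = Dict[str, List[float]]


def average_checkpoints(checkpoints: List[Checkpoint]) -> Checkpoint:
    """Element-wise mean of the parameters of several checkpoints.

    Illustrative sketch only; real checkpoints hold tensors, not lists.
    """
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    n = len(checkpoints)
    averaged: Checkpoint = {}
    for name in checkpoints[0]:
        # zip(*...) groups the i-th weight of every checkpoint together.
        columns = zip(*(ckpt[name] for ckpt in checkpoints))
        averaged[name] = [sum(col) / n for col in columns]
    return averaged


if __name__ == "__main__":
    epoch_a = {"w": [1.0, 2.0]}
    epoch_b = {"w": [3.0, 4.0]}
    print(average_checkpoints([epoch_a, epoch_b]))  # avg = 2
    print(average_checkpoints([epoch_b]))           # avg = 1: weights unchanged
```

With avg set to 1, only a single checkpoint is used, so the decoded model is exactly that checkpoint; with avg set to 2, the weights are a blend of two checkpoints, which can change the error pattern.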

one is to delete words at the end,

Could you add some tail paddings to your utterances and retry?

I added 1 s of silence to the end of each utterance in the test set, and there are now almost no deletions at the end of sentences.

Another question is, what is the possible reason for many deletions in the middle of sentences?

image

Have you listened to the audio of these two utterances? Does it sound normal?

csukuangfj, Oct 10 '22 08:10