Problems with streaming decoding in pruned_transducer_stateless5

Open yangsuxia opened this issue 3 years ago • 8 comments

I used my own data to train a streaming model, but recognition accuracy is poor when decoding. There are two obvious problems: one is that words are deleted at the end, and the other is that multiple words are inserted in the middle.

Training: [screenshot]
Decoding: [screenshot]
Result: [screenshot]

What should I do to reduce these errors? I look forward to your reply. Thank you!
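To tell deletions and insertions apart when comparing REF and HYP, a character-level edit-distance alignment can count each error type separately. The following is a minimal Python sketch, not icefall's own scoring code; `align_counts` is a hypothetical helper name, and character-level units are assumed because the transcripts are Chinese.

```python
def align_counts(ref: str, hyp: str) -> dict:
    """Character-level Levenshtein alignment of a reference and a hypothesis.

    Returns the number of substitutions, deletions and insertions on one
    minimum-cost alignment, plus the total edit distance.
    """
    n, m = len(ref), len(hyp)
    # dp[i][j] holds (cost, subs, dels, ins) for aligning ref[:i] with hyp[:j].
    dp = [[(0, 0, 0, 0)] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = (i, 0, i, 0)  # every reference character deleted
    for j in range(1, m + 1):
        dp[0][j] = (j, 0, 0, j)  # every hypothesis character inserted
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c, s, d, k = dp[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diag = (c, s, d, k)  # match, no cost
            else:
                diag = (c + 1, s + 1, d, k)  # substitution
            c, s, d, k = dp[i - 1][j]
            up = (c + 1, s, d + 1, k)  # deletion of ref[i - 1]
            c, s, d, k = dp[i][j - 1]
            left = (c + 1, s, d, k + 1)  # insertion of hyp[j - 1]
            # Tuples compare by cost first, so this keeps a minimum-cost path.
            dp[i][j] = min(diag, up, left)
    cost, subs, dels, ins = dp[n][m]
    return {"sub": subs, "del": dels, "ins": ins, "errors": cost}
```

For example, `align_counts("零零零一", "零零零零零一")` reports two insertions and no deletions; dividing `errors` by `len(ref)` gives the character error rate for that utterance.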

Oct 08 '22 10:10 yangsuxia

one is that words are deleted at the end,

Could you add some tail padding to your utterances and retry?
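A plausible reason tail padding helps is that a chunk-based streaming model needs some frames after the last spoken token before it emits that token, so speech at the very end of an utterance may never be fully decoded. One way to pad is on the feature side, appending extra frames after the last real one. The sketch below assumes (T, F) log-mel features in a NumPy array and fills the padding with `math.log(1e-10)` as a stand-in for silence; both the helper name `pad_tail` and that fill value are illustrative assumptions, not necessarily what icefall uses.

```python
import math

import numpy as np

# Assumption: a very small log-energy approximates silence in log-mel space.
LOG_EPS = math.log(1e-10)


def pad_tail(features: np.ndarray, num_frames: int, value: float = LOG_EPS) -> np.ndarray:
    """Append `num_frames` frames filled with `value` to a (T, F) feature matrix."""
    if num_frames <= 0:
        return features  # nothing to pad
    pad = np.full((num_frames, features.shape[1]), value, dtype=features.dtype)
    return np.concatenate([features, pad], axis=0)
```

Assuming the usual 10 ms frame shift, padding 30 frames adds roughly 0.3 s of extra context at the end of each utterance.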

Oct 08 '22 10:10 csukuangfj

the other is that multiple words are inserted in the middle

This problem is much harder to fix in your case since the ground truth contains 3 contiguous 零. Does your training data contain data patterns like this?
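One way to answer this question is to scan the training transcripts for runs of the same character. A minimal Python sketch, assuming the transcripts are already loaded as a list of strings (how they are read from disk is left out); `count_repeated_runs` is a hypothetical helper.

```python
import re
from collections import Counter


def count_repeated_runs(transcripts, min_len=3):
    """Count, per character, how many transcripts contain a run of >= min_len copies.

    Spaces are removed first so that word-segmented text such as "零 零 零"
    is treated the same as "零零零".
    """
    # (.) captures one character; \1{k,} requires k or more further copies of it.
    pattern = re.compile(r"(.)\1{%d,}" % (min_len - 1))
    counts = Counter()
    for text in transcripts:
        text = text.replace(" ", "")
        # Use a set so each transcript counts at most once per character.
        counts.update({m.group(1) for m in pattern.finditer(text)})
    return counts
```

If `count_repeated_runs(train_texts)["零"]` is zero or very small, the model has rarely seen three or more consecutive 零 during training.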

Oct 08 '22 10:10 csukuangfj

It should be HYP, not HYF. Yes, this is the kind of case where E2E models have trouble.

Oct 08 '22 12:10 danpovey

the other is that multiple words are inserted in the middle

This problem is much harder to fix in your case since the ground truth contains 3 contiguous 零. Does your training data contain data patterns like this?

When I set avg to 1, the insertion errors decrease a lot.

Oct 10 '22 07:10 yangsuxia

When I set avg to 1, the insertion errors decrease a lot.

What is your original setting?

Oct 10 '22 07:10 csukuangfj

one is that words are deleted at the end,

Could you add some tail padding to your utterances and retry?

I appended 1 s of silence to each utterance in the test set, and there are now almost no deletions at the end of sentences.
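Appending silence at the waveform level, as described here, can be sketched in Python with NumPy. The helper name `append_silence` is hypothetical, and digital zeros are assumed as the silence; recorded background noise would be an alternative.

```python
import numpy as np


def append_silence(samples: np.ndarray, sample_rate: int, seconds: float = 1.0) -> np.ndarray:
    """Append `seconds` of digital silence (zeros) to a 1-D waveform."""
    num_pad = int(round(seconds * sample_rate))
    # Keep the original dtype (e.g. int16 PCM or float32) for the padding.
    pad = np.zeros(num_pad, dtype=samples.dtype)
    return np.concatenate([samples, pad])
```

At a 16 kHz sample rate, 1 s of silence is 16000 extra zero samples per utterance.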

Another question: what could be causing the many deletions in the middle of sentences?

Oct 10 '22 07:10 yangsuxia

What is your original setting?

My earlier reply contains the original decoding configuration, but I wrote it incorrectly: avg used to be 2, and now it is 1.
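For context, `--avg` in icefall's decoding scripts controls how many checkpoints are averaged into the model that is decoded. Conceptually that is an element-wise mean of the parameters, sketched below with NumPy arrays standing in for PyTorch tensors; `average_state_dicts` is a hypothetical helper, not icefall's own function, and the exact choice of which checkpoints are averaged is not modeled here.

```python
import numpy as np


def average_state_dicts(state_dicts):
    """Element-wise mean of parameters across several checkpoints.

    Each element of `state_dicts` maps parameter names to arrays of the
    same shape, as a model's state dict would.
    """
    if not state_dicts:
        raise ValueError("need at least one checkpoint")
    n = len(state_dicts)
    return {k: sum(sd[k] for sd in state_dicts) / n for k in state_dicts[0]}
```

With a single checkpoint the result equals that checkpoint, so avg set to 1 simply decodes the selected checkpoint without any averaging.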

Oct 10 '22 07:10 yangsuxia

I appended 1 s of silence to each utterance in the test set, and there are now almost no deletions at the end of sentences.

Another question: what could be causing the many deletions in the middle of sentences?

Have you listened to the audio of these two utterances? Do they sound normal?
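Besides listening, a few numbers can flag obviously broken files: a sample rate different from the one used in training, an unexpected duration, or a peak at full scale (possible clipping) or near zero (almost silent). A minimal sketch using only the Python standard library, assuming 16-bit PCM WAV input; `describe_wav` is a hypothetical helper.

```python
import array
import sys
import wave


def describe_wav(path: str) -> dict:
    """Basic properties of a 16-bit PCM WAV file, for a quick sanity check."""
    with wave.open(path, "rb") as w:
        sample_rate = w.getframerate()
        channels = w.getnchannels()
        sample_width = w.getsampwidth()
        num_frames = w.getnframes()
        raw = w.readframes(num_frames)
    if sample_width != 2:
        raise ValueError("this sketch only handles 16-bit PCM")
    samples = array.array("h", raw)  # signed 16-bit integers
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    peak = max((abs(s) for s in samples), default=0)
    return {
        "sample_rate": sample_rate,
        "channels": channels,
        "duration_s": num_frames / sample_rate,
        "peak": peak / 32768.0,  # 1.0 means full scale
    }
```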

Oct 10 '22 08:10 csukuangfj
