icefall
Problems with streaming decoding in pruned_transducer_stateless5
I trained a streaming model on my own data, but recognition quality is poor when decoding.
There are two obvious problems: words are deleted at the end of utterances, and multiple words are inserted in the middle.
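To put numbers on the two problems above, the following is a minimal sketch that counts substitutions, insertions, and deletions between a reference and a hypothesis with a plain edit-distance alignment. The function name `count_errors` and the example strings are hypothetical, and tokens are assumed to be single characters; this is not the scoring code used by the recipe.

```python
from typing import List, Tuple


def count_errors(ref: List[str], hyp: List[str]) -> Tuple[int, int, int]:
    """Return (substitutions, insertions, deletions) for one minimum-edit alignment."""
    n, m = len(ref), len(hyp)
    # cost[i][j] = (total_edits, subs, ins, dels) for ref[:i] versus hyp[:j]
    cost = [[(0, 0, 0, 0)] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = (i, 0, 0, i)  # every reference token deleted
    for j in range(1, m + 1):
        cost[0][j] = (j, 0, j, 0)  # every hypothesis token inserted
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if ref[i - 1] == hyp[j - 1]:
                cost[i][j] = cost[i - 1][j - 1]  # match, no extra edit
                continue
            t, s, a, d = cost[i - 1][j - 1]
            sub = (t + 1, s + 1, a, d)
            t, s, a, d = cost[i][j - 1]
            ins = (t + 1, s, a + 1, d)
            t, s, a, d = cost[i - 1][j]
            dele = (t + 1, s, a, d + 1)
            cost[i][j] = min(sub, ins, dele)  # fewest total edits wins
    _, subs, inss, dels = cost[n][m]
    return subs, inss, dels


# Made-up examples: one deletion at the end, two insertions in the middle.
print(count_errors(list("一二三四"), list("一二三")))
print(count_errors(list("一零二"), list("一零零零二")))
```

Running this per utterance and grouping the counts by position (end versus middle) makes it easier to see whether a change such as tail padding actually helps.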
training:
decoding:
result:
What should I do to reduce these errors? I look forward to your reply. Thank you!
one is to delete words at the end,
Could you add some tail padding to your utterances and retry?
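As an illustration of the suggestion, here is a minimal sketch of tail padding on raw audio, assuming the waveform is a plain list of float samples. The helper name `pad_tail` and the 16 kHz sample rate in the usage line are assumptions for this example only; a real pipeline may instead pad feature frames or use tensor operations.

```python
from typing import List


def pad_tail(samples: List[float], sample_rate: int, pad_seconds: float) -> List[float]:
    """Return a copy of `samples` with `pad_seconds` of silence (zeros) appended."""
    if pad_seconds < 0:
        raise ValueError("pad_seconds must be non-negative")
    num_pad = int(round(sample_rate * pad_seconds))
    return list(samples) + [0.0] * num_pad


# Hypothetical usage: a 1-second utterance at 16 kHz, padded with 1 s of silence.
wave = [0.1] * 16000
padded = pad_tail(wave, sample_rate=16000, pad_seconds=1.0)
print(len(wave), len(padded))
```

The idea is that a streaming model needs some right context after the last spoken frame, so trailing silence gives the decoder a chance to emit the final tokens.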
the other is to insert multiple words in the middle
This problem is much harder to fix in your case since the ground truth contains 3 contiguous 零.
Does your training data contain data patterns like this?
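One way to answer this question is to scan the training transcripts for runs of identical tokens. The sketch below assumes character-level tokens and transcripts held as Python strings; `longest_run` and `count_with_run` are hypothetical helper names, and the sample transcripts are made up.

```python
from typing import Iterable, Sequence


def longest_run(tokens: Sequence[str]) -> int:
    """Length of the longest run of identical consecutive tokens (0 if empty)."""
    best = current = 0
    previous = None
    for token in tokens:
        current = current + 1 if token == previous else 1
        best = max(best, current)
        previous = token
    return best


def count_with_run(transcripts: Iterable[str], min_run: int = 3) -> int:
    """Count transcripts containing at least `min_run` identical characters in a row."""
    return sum(1 for text in transcripts if longest_run(text) >= min_run)


# Made-up transcripts for illustration only.
train_texts = ["一零零零二", "三四五", "零零一"]
print(count_with_run(train_texts, min_run=3))
```

If very few training transcripts contain such runs, the model has had little chance to learn how many repeated tokens to emit.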
It should be HYP, not HYF. Yes, this is the kind of case where E2E models have trouble.
the other is to insert multiple words in the middle
This problem is much harder to fix in your case since the ground truth contains 3 contiguous 零. Does your training data contain data patterns like this?
When I set avg to 1, the insertion errors decrease a lot.
the other is to insert multiple words in the middle
This problem is much harder to fix in your case since the ground truth contains 3 contiguous 零. Does your training data contain data patterns like this?
When I set avg to 1, the insertion error decreases a lot
What is your original setting?
one is to delete words at the end,
Could you add some tail padding to your utterances and retry?
I added 1 s of silence to the end of each test utterance, and there are now almost no deletions at the end of sentences.
Another question: what could cause many deletions in the middle of sentences?
What is your original setting?
My post above contains the original decoding configuration, but it was written incorrectly. avg used to be 2; it is now 1.
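For readers unfamiliar with the setting, the sketch below assumes that avg is the number of saved checkpoints whose parameters are averaged element-wise before decoding. It uses plain dictionaries of float lists rather than real model state, and `average_checkpoints` is a hypothetical name, not the recipe's own function.

```python
from typing import Dict, List


def average_checkpoints(checkpoints: List[Dict[str, List[float]]]) -> Dict[str, List[float]]:
    """Element-wise mean of parameters across checkpoints with identical keys and shapes."""
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    count = len(checkpoints)
    averaged: Dict[str, List[float]] = {}
    for name in checkpoints[0]:
        columns = zip(*(ckpt[name] for ckpt in checkpoints))
        averaged[name] = [sum(values) / count for values in columns]
    return averaged


# Toy parameters standing in for two saved epochs.
epoch_a = {"w": [1.0, 3.0]}
epoch_b = {"w": [3.0, 5.0]}
print(average_checkpoints([epoch_a, epoch_b]))  # avg = 2
print(average_checkpoints([epoch_b]))           # avg = 1 leaves the weights unchanged
```

Under this assumption, avg = 1 decodes with a single checkpoint, so a difference in insertion errors between avg = 1 and avg = 2 points at the averaged weights rather than at the decoding method.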
Another question: what could cause many deletions in the middle of sentences?
Have you listened to the audio of these two utterances? Does it sound normal?