End of sequence token problem.

Open ngcheeyuan opened this issue 1 year ago • 1 comment

I was transcribing an audio file that was about 65 seconds long. However, the model kept generating text until about 83 s (based on the segment timestamps).

Is this an issue with the 30 s chunking, where the last 5 seconds of audio are padded to fill a full 30 s window?
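To make the arithmetic behind this question concrete, here is a minimal sketch of how a 65 s file would split into 30 s windows. It assumes fixed, non-overlapping windows; the function name `chunk_layout` is hypothetical and is not part of any library, and a real decoder may advance its window differently.

```python
import math


def chunk_layout(duration_s, window_s=30.0):
    """Return (audio_seconds, padding_seconds) for each fixed-size window.

    Assumption: windows are non-overlapping and the last one is padded
    up to window_s. This is an illustration, not a library API.
    """
    n_windows = math.ceil(duration_s / window_s)
    layout = []
    for i in range(n_windows):
        start = i * window_s
        audio = min(window_s, duration_s - start)  # real audio in this window
        layout.append((audio, window_s - audio))   # remainder is padding
    return layout


layout = chunk_layout(65.0)
# The last window holds 5 s of audio and 25 s of padding.
```

Under this assumption the final window is mostly padding, which is the region where text past the 65 s mark could be produced.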

Is there a way I can solve this, other than post-processing to cut out text whose timestamps exceed the duration of the audio file?
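For reference, the post-processing fallback mentioned above might look like the sketch below. It assumes segments are available as `(start, end, text)` tuples in seconds; that shape and the name `clamp_segments` are hypothetical, so adapt them to whatever the transcription call actually returns.

```python
def clamp_segments(segments, duration_s):
    """Drop segments that start at or after the end of the audio,
    and trim the end time of a segment that straddles it.

    Assumption: each segment is a (start, end, text) tuple in seconds.
    """
    kept = []
    for start, end, text in segments:
        if start >= duration_s:
            continue  # entirely past the real audio
        kept.append((start, min(end, duration_s), text))
    return kept


# Hypothetical example data, not output from a real run.
example = [(0.0, 30.0, "a"), (60.0, 70.0, "b"), (70.0, 83.0, "c")]
cleaned = clamp_segments(example, 65.0)
```

This only trims timestamps; the text of a straddling segment is kept whole, so it may still contain words generated past the end of the audio.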

Thanks.

Mar 01 '24 11:03 ngcheeyuan

@ngcheeyuan, hello. Can you attach your audio file and share the transcription log?

Mar 05 '24 07:03 trungkienbkhn