faster-whisper
Transcription is creating duplicate sentences
Transcribe is returning text with repeating sentences
Running with the tiny.en model on CPU with the int8 compute type:
[Segment(id=1, seek=240, start=0.0, end=2.4, text=' with times it. With times it. With times it. With times it. With times it. With
times it. With times it. With times it. With times it. With times it. With times it. With times it. With times it. With times it. With times
it. With times it. With times it. With times it. With times it. With times it. With times it. With times it. With times it. With times it. With
times it. With times it. With times it. With times it. With times it. With times it. With times it. With times it. With times it. With times
it. With times it. With times it. With times it. With times it. With times it. With times it. With times it. With times it. With times it. With
times it. With times it. With times it. With times it. With times it. With times it. With times it. With times it. With times it. With times
it. With times it. With times it. With times it', tokens=[50363, 351, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080,
1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080,
1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080,
1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080,
1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080,
1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080,
1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080,
1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080,
1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080,
1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340], temperature=1.0,
avg_logprob=-0.12661736170450846, compression_ratio=27.032258064516128, no_speech_prob=0.22830641269683838,
words=None)]
segments, _ = self.model.transcribe(
    file_path,
    vad_filter=True,
    vad_parameters=dict(min_silence_duration_ms=500),
)
# transcribe() returns a generator, so materialize it before printing
print(list(segments))
This wasn't happening before upgrading to 1.0.0.
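For anyone wondering about the compression_ratio=27.03 in the segment above: Whisper-style decoders gauge how repetitive a transcript is by comparing its byte length with its zlib-compressed length. The snippet below is a minimal standalone sketch of that measure, not the library's own code. The helper name is made up for illustration, and the 2.4 threshold is, as far as I know, the default value of compression_ratio_threshold, so treat it as an assumption.

```python
import zlib

# Assumed default of compression_ratio_threshold; verify against your installed version.
ASSUMED_THRESHOLD = 2.4


def compression_ratio(text: str) -> float:
    """Ratio of raw UTF-8 size to zlib-compressed size (hypothetical helper)."""
    raw = text.encode("utf-8")
    return len(raw) / len(zlib.compress(raw))


# Text shaped like the looping segment from the report.
looping = " With times it." * 100
# A normal, non-repeating transcript for comparison.
normal = "Turn on kitchen sink."

print("looping:", compression_ratio(looping) > ASSUMED_THRESHOLD)
print("normal: ", compression_ratio(normal) > ASSUMED_THRESHOLD)
```

A heavily looping segment compresses extremely well, so its ratio ends up far above the threshold, while ordinary short text stays below it.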
Can you share an audio sample to reproduce the issue?
I am also seeing this issue. I used this code:
from faster_whisper import WhisperModel

model = WhisperModel("small.en", compute_type="int8")
segments, info = model.transcribe("turnOnKitchenSink.wav")
for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
and got this output:
[0.00s -> 29.20s] Turn on kitchen sink. Turn on kitchen sink.
I get the same output if I don't specify compute_type. The audio sample is: turnOnKitchenSink.zip
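In case it helps others check their own files, here is a rough sketch of how one might flag this kind of output automatically. It is plain Python and does not call faster-whisper at all. The function name and the sentence-splitting rule are my own assumptions, and it only catches a sentence that immediately repeats the one before it.

```python
import re
from typing import List


def repeated_sentences(text: str) -> List[str]:
    """Return sentences that directly repeat the previous sentence (hypothetical helper)."""
    # Split after sentence-ending punctuation followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    # Normalize case and surrounding whitespace before comparing.
    sentences = [p.strip().lower() for p in parts if p.strip()]
    return [cur for prev, cur in zip(sentences, sentences[1:]) if prev == cur]


# Segment text taken from the output reported above.
duplicates = repeated_sentences("Turn on kitchen sink. Turn on kitchen sink.")
if duplicates:
    print("repeated:", duplicates)
```

You could run this over each segment.text in the loop above and log any segment where the returned list is non-empty.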
I can't point to a specific audio sample; it happened with every file I tried. I'm not sure whether it also occurs on GPU, since I was testing on CPU. I've reverted my code to 0.10.1 until there's a fix.
I see the same issue when I set a language (tested with the medium and large-v3 models).
It happens on medium.en just as on the other models if a language is set.
If I set the language to autodetect, it's fine. The older version is fine too.
Update: it was introduced with this commit: https://github.com/SYSTRAN/faster-whisper/commit/092067208b45e5eb935bfe8b0ac42b8341da9434
When I revert that commit, it's fine again.
@Sharrnah, hello. Can you try again with this fix?
Looks good with PR #705:
[00:00.000 --> 00:02.220] Turn on kitchen sink.
I fully tested the fix, and it works well.