faster-whisper Transcription is creating duplicate sentences

transcribe() is returning text with repeated sentences.

Running with the tiny.en model on CPU with the int8 compute type:

[Segment(id=1, seek=240, start=0.0, end=2.4, text=' with times it. With times it. With times it. With times it. With times it. With 
times it. With times it. With times it. With times it. With times it. With times it. With times it. With times it. With times it. With times 
it. With times it. With times it. With times it. With times it. With times it. With times it. With times it. With times it. With times it. With 
times it. With times it. With times it. With times it. With times it. With times it. With times it. With times it. With times it. With times 
it. With times it. With times it. With times it. With times it. With times it. With times it. With times it. With times it. With times it. With 
times it. With times it. With times it. With times it. With times it. With times it. With times it. With times it. With times it. With times 
it. With times it. With times it. With times it', tokens=[50363, 351, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 
1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 
1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 
1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 
1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 
1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 
1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 
1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 
1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 
1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340], temperature=1.0, 
avg_logprob=-0.12661736170450846, compression_ratio=27.032258064516128, no_speech_prob=0.22830641269683838, 
words=None)]

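segments, _ = self.model.transcribe(
    file_path,
    vad_filter=True,
    vad_parameters=dict(min_silence_duration_ms=500),
)
# transcribe() returns a lazy generator; convert it to a list to print the Segment objects
print(list(segments))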

This wasn't happening before upgrading to 1.0.0.
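
The segment above reports compression_ratio=27.03, which is a useful signal for spotting this kind of loop. As far as I know, Whisper-style decoders compute this value as the raw UTF-8 length of the text divided by its zlib-compressed length, and treat anything above a default threshold of 2.4 as suspect. The following is a minimal sketch of that check, not the library's own code; the function names `compression_ratio` and `looks_repetitive` are made up for illustration.

```python
import zlib


def compression_ratio(text: str) -> float:
    """Raw UTF-8 size divided by zlib-compressed size; repetitive text scores high."""
    raw = text.encode("utf-8")
    return len(raw) / len(zlib.compress(raw))


def looks_repetitive(text: str, threshold: float = 2.4) -> bool:
    """Flag text whose compression ratio exceeds the threshold (2.4 is assumed here)."""
    return compression_ratio(text) > threshold


# Hypothetical inputs modelled on the outputs reported in this thread.
looped = " With times it." * 100
normal = " Turn on kitchen sink."

print(looks_repetitive(looped))
print(looks_repetitive(normal))
```

A check like this can be run over `segment.text` after transcription to flag affected files while the underlying bug is unresolved.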

Feb 25 '24 20:02 greerviau

Can you share an audio sample to reproduce the issue?

Feb 27 '24 13:02 Purfview

I am also seeing this issue. I used this code:

from faster_whisper import WhisperModel

model = WhisperModel("small.en", compute_type="int8")
segments, info = model.transcribe("turnOnKitchenSink.wav")
for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))

and got this output:

[0.00s -> 29.20s]  Turn on kitchen sink. Turn on kitchen sink.

I get the same output if I don't specify compute_type. The audio sample is: turnOnKitchenSink.zip
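
Until the root cause is fixed, output like the line above can be cleaned up after the fact. This is a stopgap sketch only, not a fix for the decoder: it drops a sentence when it is identical (ignoring case and trailing punctuation) to the one right before it. The helper name `collapse_repeated_sentences` is hypothetical, and the approach would also remove repeats that the speaker really said.

```python
import re


def collapse_repeated_sentences(text: str) -> str:
    """Remove sentences that exactly repeat the immediately preceding sentence."""
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    kept = []
    for sentence in sentences:
        key = sentence.rstrip(".!?").casefold()
        if kept and key == kept[-1].rstrip(".!?").casefold():
            continue  # same as the previous sentence, so skip it
        kept.append(sentence)
    return " ".join(kept)


print(collapse_repeated_sentences(" Turn on kitchen sink. Turn on kitchen sink."))
```

It only compares neighbouring sentences, so text such as "Turn on the light. Turn off the light." is left untouched.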

Feb 27 '24 17:02 stu247

I can't point to a specific audio sample; it was happening with every sample I tried. I'm not sure whether it is also a problem on GPU, since I was using CPU for my testing, but I have reverted my code to 0.10.1 until there's a fix.
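
For anyone pinning to 0.10.1 in the meantime, a small startup guard can warn when the affected release is installed. This is a sketch under the assumption that only 1.0.0 is affected, which is all this thread establishes; `AFFECTED_VERSIONS`, `is_affected`, and `installed_faster_whisper_version` are illustrative names, not part of faster-whisper.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumption based on the reports in this thread: 1.0.0 shows the bug, 0.10.1 does not.
AFFECTED_VERSIONS = {"1.0.0"}


def is_affected(installed: str) -> bool:
    """Return True if the given version string is one reported as affected."""
    return installed.strip() in AFFECTED_VERSIONS


def installed_faster_whisper_version():
    """Return the installed faster-whisper version, or None if it is not installed."""
    try:
        return version("faster-whisper")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_faster_whisper_version()
    if found is not None and is_affected(found):
        print(f"faster-whisper {found} may repeat sentences; consider pinning 0.10.1")
```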

Feb 27 '24 17:02 greerviau

I see the same issue when I set a language (tested with the medium and large-v3 models).

It happens with medium.en just as with the other models if a language is set.

If I leave the language on auto-detect, it's fine. The older version is fine too.

Feb 27 '24 23:02 Sharrnah

As an update: it was introduced with this commit: https://github.com/SYSTRAN/faster-whisper/commit/092067208b45e5eb935bfe8b0ac42b8341da9434

When I revert that commit, it's fine again.

Feb 28 '24 00:02 Sharrnah

@Sharrnah, hello. Can you try again with this fix?

Feb 28 '24 01:02 trungkienbkhn

> I am also seeing this issue.
>
> [0.00s -> 29.20s]  Turn on kitchen sink. Turn on kitchen sink.
>
> I get the same output if I don't specify compute_type. The audio sample is: turnOnKitchenSink.zip

Looks good with PR #705:

[00:00.000 --> 00:02.220]  Turn on kitchen sink.

Feb 28 '24 06:02 Purfview

> @Sharrnah, hello. Can you try again with this fix?

Thanks. Looks fine with the fix. :)

Feb 28 '24 11:02 Sharrnah

I fully tested the fix, and it works well.

Mar 01 '24 05:03 James-Shared-Studios
