faster-whisper

Transcription is creating duplicate sentences

Open greerviau opened this issue 1 year ago • 9 comments

Transcribe is returning text with repeating sentences

Running with the tiny.en model on CPU with the int8 compute type:

[Segment(id=1, seek=240, start=0.0, end=2.4, text=' with times it. With times it. With times it. With times it. With times it. With 
times it. With times it. With times it. With times it. With times it. With times it. With times it. With times it. With times it. With times 
it. With times it. With times it. With times it. With times it. With times it. With times it. With times it. With times it. With times it. With 
times it. With times it. With times it. With times it. With times it. With times it. With times it. With times it. With times it. With times 
it. With times it. With times it. With times it. With times it. With times it. With times it. With times it. With times it. With times it. With 
times it. With times it. With times it. With times it. With times it. With times it. With times it. With times it. With times it. With times 
it. With times it. With times it. With times it', tokens=[50363, 351, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 
1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 
1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 
1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 
1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 
1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 
1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 
1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 
1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 
1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340, 13, 2080, 1661, 340], temperature=1.0, 
avg_logprob=-0.12661736170450846, compression_ratio=27.032258064516128, no_speech_prob=0.22830641269683838, 
words=None)]
segments, _ = self.model.transcribe(
    file_path,
    vad_filter=True,
    vad_parameters=dict(min_silence_duration_ms=500),
)
print(segments)

This wasn't happening before upgrading to 1.0.0.
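
For anyone triaging similar output: the segment above reports compression_ratio=27.03, far above the 2.4 that Whisper uses as its default compression_ratio_threshold. As far as I know, the ratio is the UTF-8 byte length of the text divided by its zlib-compressed length, so a rough standalone check might look like the sketch below. The names compression_ratio and looks_repetitive are illustrative helpers, not part of the faster-whisper API.

```python
import zlib

# Default compression_ratio_threshold used by Whisper; text above it is
# treated as a likely repetition loop.
DEFAULT_THRESHOLD = 2.4


def compression_ratio(text: str) -> float:
    """Return raw UTF-8 size divided by zlib-compressed size."""
    raw = text.encode("utf-8")
    return len(raw) / len(zlib.compress(raw))


def looks_repetitive(text: str, threshold: float = DEFAULT_THRESHOLD) -> bool:
    """Flag text that compresses suspiciously well (hypothetical helper)."""
    return compression_ratio(text) > threshold
```

This only catches heavy loops like the one above. A short segment with a single duplicated sentence will likely stay under the threshold.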

greerviau avatar Feb 25 '24 20:02 greerviau

Can you share an audio sample to reproduce the issue?

Purfview avatar Feb 27 '24 13:02 Purfview

I am also seeing this issue. I used this code:

from faster_whisper import WhisperModel

model = WhisperModel("small.en", compute_type="int8")
segments, info = model.transcribe("turnOnKitchenSink.wav")
for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))

and got this output:

[0.00s -> 29.20s]  Turn on kitchen sink. Turn on kitchen sink.

I get the same output if I don't specify compute_type. The audio sample is: turnOnKitchenSink.zip

stu247 avatar Feb 27 '24 17:02 stu247

I can't point to a specific audio sample; it was happening with every file I tried. I'm not sure whether it is also a problem on GPU, since I was using CPU for my testing, but I reverted my code to 0.10.1 until there's a fix.

greerviau avatar Feb 27 '24 17:02 greerviau

I see the same issue when I set a language (tested with the medium and large-v3 models).

It happens on medium.en just as on the other models if a language is set.

If I leave the language on auto-detect, it's fine. The older version is fine too.
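
A minimal sketch for comparing the two cases side by side, assuming `model` is an already constructed WhisperModel. The helper names transcribe_text and compare_language_settings are made up for illustration, and the default language "en" is just an example value.

```python
def transcribe_text(model, audio_path, language=None):
    """Run one transcription and join the segment texts into one string."""
    # transcribe() returns a lazy generator of segments plus an info object;
    # iterating the generator is what actually runs the decoding.
    segments, _info = model.transcribe(audio_path, language=language)
    return "".join(segment.text for segment in segments).strip()


def compare_language_settings(model, audio_path, language="en"):
    """Return (text with an explicit language, text with auto-detection)."""
    forced = transcribe_text(model, audio_path, language=language)
    detected = transcribe_text(model, audio_path, language=None)
    return forced, detected
```

If the two returned strings differ only by repeated sentences in the first one, that matches what I'm describing here.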

Sharrnah avatar Feb 27 '24 23:02 Sharrnah

Update: it was introduced with this commit: https://github.com/SYSTRAN/faster-whisper/commit/092067208b45e5eb935bfe8b0ac42b8341da9434

When I revert that commit, it's fine again.

Sharrnah avatar Feb 28 '24 00:02 Sharrnah

@Sharrnah, hello. Can you try again with this fix?

trungkienbkhn avatar Feb 28 '24 01:02 trungkienbkhn

I am also seeing this issue.

[0.00s -> 29.20s]  Turn on kitchen sink. Turn on kitchen sink.

I get the same output if I don't specify compute_type. The audio sample is: turnOnKitchenSink.zip

Looks good with PR #705:

[00:00.000 --> 00:02.220]  Turn on kitchen sink.

Purfview avatar Feb 28 '24 06:02 Purfview

@Sharrnah, hello. Can you try again with this fix?

Thanks. Looks fine with the fix. :)

Sharrnah avatar Feb 28 '24 11:02 Sharrnah

Fully tested the fix, and it works well.

James-Shared-Studios avatar Mar 01 '24 05:03 James-Shared-Studios