whisper-diarization: Weird word repetitions on zh

I tested this project on English audio with the default model and it worked as expected. When I ran the same audio with the large model, I hit this error: RuntimeError: Calculated padded input size per channel: (1). Kernel size: (2). Kernel size can't be greater than actual input size. When I switched to a different audio file in Chinese (using large-v2), it ran without error, but the output contained a lot of weird word repetitions. Translated into English, the Chinese output below reads something like:

Speaker 1: Please don't don't hesitate to like ke and subscribe scribe

Speaker 1: 请不不吝点点赞赞订阅阅

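Speaker 0: 转发发打赏赏支支持持明明镜镜与与点点点点栏栏目目不去去锵锵锵三人人行行了了其实实我我特特想想来来几次次编编导导约约我我都都是是时时间间冲冲突突嘛嘛冲突突呢呢那我我就就是是很很世世俗俗的的认认为为利利益益最最大

(Roughly, with the duplicated characters removed: "Share, tip, and support 明镜 and the 点点 program. I haven't been going on 锵锵三人行; actually I really wanted to come, and the directors invited me several times, but there was always a scheduling conflict. Given the conflict, I took the very worldly view that the benefit is greatest")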

Update: I tested on Japanese and got the same result. I also tested on French, and it works well, just like English.
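To compare outputs across languages or implementations, the duplication can be quantified. Below is a minimal sketch, not part of whisper-diarization: `adjacent_repeat_ratio` is a hypothetical helper that measures how often a character is identical to the one right before it. The "clean" string is an assumed reconstruction of the intended sentence. Legitimate Chinese contains some reduplication (for example 锵锵), so the ratio is only a rough indicator.

```python
def adjacent_repeat_ratio(text: str) -> float:
    """Fraction of adjacent character pairs (whitespace ignored) that are identical.

    Hypothetical diagnostic helper, not part of any Whisper library.
    """
    chars = [c for c in text if not c.isspace()]
    if len(chars) < 2:
        return 0.0
    # Count positions where a character equals the one immediately before it.
    repeats = sum(1 for prev, cur in zip(chars, chars[1:]) if prev == cur)
    return repeats / (len(chars) - 1)


garbled = "请不不吝点点赞赞订阅阅"  # output line reported above
clean = "请不吝点赞订阅"            # assumed intended sentence

print(adjacent_repeat_ratio(garbled))
print(adjacent_repeat_ratio(clean))
```

A ratio that is much higher for one implementation than for another on the same audio would point at that implementation's alignment or post-processing rather than at the audio itself.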

Sep 14 '23 05:09 terryops

I've been getting the same error with different languages (including English, actually). Any help is much appreciated.

Sep 17 '23 22:09 Toby1091

same error :(

Sep 25 '23 09:09 XinyuZhou2000

I got the same error when using https://github.com/ggerganov/whisper.cpp. Do you get the same results with the official Whisper from OpenAI? I decided to revert to that because the results from everything else were too inaccurate.

Sep 25 '23 12:09 AlbinGyllander

When I played with the Whisper large model, different languages worked as expected. When I use WhisperX or other speed-optimized implementations, I have issues with languages other than English.

Dec 04 '23 20:12 PiotrEsse