Whisper
Please support the latest Large V3 model.
Now it can load the Large V3 model, but it is extremely slow when put to work, and in practice it does not work at all.
That is normal; the LARGE model takes much longer than the MEDIUM model. It is the same for me.
What @Jiang10086 says translates more or less to "it is normal that the large model is slow", if I understand correctly. However, in this case we are dealing with the V3 model, which is currently not supported in the Const-me Whisper version. It needs at least some minor changes (https://github.com/ggerganov/whisper.cpp/pull/1444/commits/185d3fd6d9c31a5cbe5f62df332eac03e56231ee) to function correctly.
However, when I tried to add that support to the Const-me version, the output was so bad that I wondered if I had done something wrong. I then took the original whisper.cpp program in the brand-new version with V3 support, and what I saw was disturbing: the transcription itself was okay-ish, but after a few sentences it hallucinated forever, whereas V2 did not do that even in the early inferences.
My conclusion is that the V3 ggml is either not ready yet, or it requires a rather different decoding strategy than we currently use. So if support comes, I fear you will have to wait some months for it.
Still, for anyone interested in testing it, here is my custom version, which ONLY WORKS with the V3 model and not with any other. WhisperDesktop is for GUI users, main.exe for command-line users: whisperConstMe_V3_test.zip
Download the "ggml-large.bin" model from here (they renamed the previous Large to Large-V2, and Large is now the V3).
If you use WhisperDesktop, you need to change the model path, but WhisperDesktop always seems to load the first model you loaded and does not let you select a different one. You need to delete this registry key in order to load a different model:
HKEY_CURRENT_USER\Software\const.me\WhisperDesktop
You can do this by running regedit.exe, navigating to that path, and deleting the whole WhisperDesktop folder.
For developers: sorry, but I don't have a patch yet. It does not make sense to upload one, because I did not start my changes from the current master but from my own local playground branch. Basically, what I did is exactly what is mentioned in the linked whisper.cpp change.
The biggest problem was in this section. In whisper.cpp they do
vocab.token_translate += dt;
vocab.token_transcribe += dt;
But I needed to exclude these two tokens from being raised by "dt" and raise them only by one instead:
const int dt = num_languages() - 98;
token_eot++;
token_sot++;
token_translate++;
token_transcribe++;
if( is_multilingual() )
{
    token_prev += dt;
    token_solm += dt;
    token_not += dt;
    token_beg += dt;
}
As I did not fully understand what I was doing here, there is about a 99% chance that what I did is not completely clean.
@steipal you are welcome ;-) But as I said before, this model tends to repeat things so much that I cannot really take it seriously yet. I fear it is not worth any more effort.
In my tests, the latest V3 model can only be loaded; it does not work properly. We still have to wait for the author to update.
You may want to check another project that now supports the large-v3 model, which might give you some inspiration on how to work with it: https://colab.research.google.com/github/Ayanaminn/N46Whisper/blob/main/N46Whisper.ipynb#scrollTo=Fjm3tYISAk9P That said, I only know of this project; I have no idea how it works in technical detail.
@MrFutureV the N46 Jupyter notebook you linked is cool, but in the end it just uses the original OpenAI whisper Python project, so unfortunately it is not directly useful for this project. However, the tool I shared in my previous answer seems to work as expected with V3.
I can add that in my one-minute test of Hebrew (with the patched WhisperDesktop GUI), the results are vastly inferior to V2. A test here with the same file gives much better results: https://huggingface.co/openai/whisper-large-v3 So I assume the issue is somewhere in the conversion (or something similar).
@darnn I tested a little more and also came to the conclusion that my uploaded versions with the V3 model seem to generate much worse output than they should. I spent some more time on it but have not yet been able to find the cause.
@Const-me I kindly request some help here; maybe you can find the time to look at proper V3 support. I am not sure whether what I did is enough to reflect the changes in https://github.com/ggerganov/whisper.cpp/commit/185d3fd6d9c31a5cbe5f62df332eac03e56231ee, especially the N_MEL part...
Here is a git diff of a clean version containing my changes relative to the master branch:
diff --git a/Whisper/Whisper/Vocabulary.cpp b/Whisper/Whisper/Vocabulary.cpp
index 8588dd7..249b133 100644
--- a/Whisper/Whisper/Vocabulary.cpp
+++ b/Whisper/Whisper/Vocabulary.cpp
@@ -107,14 +107,19 @@ HRESULT Vocabulary::load( ComLight::iReadStream* stm, int lengthInHeader )
n_vocab = lengthInHeader;
- if( is_multilingual() )
+ const int dt = num_languages() - 98;
+
+ token_eot++;
+ token_sot++;
+ token_translate++;
+ token_transcribe++;
+
+ if (is_multilingual())
{
- token_eot++;
- token_sot++;
- token_prev++;
- token_solm++;
- token_not++;
- token_beg++;
+ token_solm += dt;
+ token_prev += dt;
+ token_not += dt;
+ token_beg += dt;
};
if( countWords < lengthInHeader )
diff --git a/Whisper/Whisper/Vocabulary.h b/Whisper/Whisper/Vocabulary.h
index c4feffd..24330f1 100644
--- a/Whisper/Whisper/Vocabulary.h
+++ b/Whisper/Whisper/Vocabulary.h
@@ -32,12 +32,16 @@ namespace Whisper
id token_beg = 50363;
// available tasks
- static const id token_translate = 50358;
- static const id token_transcribe = 50359;
+ id token_translate = 50358;
+ id token_transcribe = 50359;
bool is_multilingual() const
{
- return n_vocab == 51865;
+ return n_vocab >= 51865;
+ }
+
+ int num_languages() const {
+ return n_vocab - 51765 - (is_multilingual() ? 1 : 0);
}
const char* string( int id ) const
diff --git a/Whisper/Whisper/audioConstants.h b/Whisper/Whisper/audioConstants.h
index 232ab22..2dc75aa 100644
--- a/Whisper/Whisper/audioConstants.h
+++ b/Whisper/Whisper/audioConstants.h
@@ -10,5 +10,5 @@ namespace Whisper
// WHISPER_HOP_LENGTH, 10 milliseconds
constexpr uint32_t FFT_STEP = 160;
// WHISPER_N_MEL
- constexpr uint32_t N_MEL = 80;
+ constexpr uint32_t N_MEL = 128;
}
Yeah, I tried the tool you mentioned and it did work, so I guess we can only use this temporarily until the original author updates the official version.
I got some repetitive subtitle output in places; I don't know why. That is the only problem I have found so far.
Good news related to this topic!
It seems that there has been an update in the API. One of the improvements is that it officially supports version 3 of the model. Now all that is needed is for this project to be updated in turn. :-)
JC
@emcodem Any chance you cobbled together a build that works properly with V3 in the end?
Thanks!
@emcodem's approach is useful; using whisperConstMe_V3_test.zip solves this problem (runFullImpl: failed to generate timestamp token - skipping one second).
However, it still cannot avoid the V3 model's current problem of repeated output, and the transcription accuracy is not as high as before either. We will have to wait for future updates from OpenAI.
Funny that the path name contains the two characters for "part-time job" (兼职); you can even make money!
It works!
@reatang What do you mean? Did you manage to get WhisperDesktop working with large-v3? If so, I'd love to know how.
I tried the large-v3 model with the latest software; unfortunately it is still very slow. One minute of audio took about 3 minutes on my RTX 2060.
@Const-me Please fix this and release soon! Thank you very much!
Seems to work for me! With the newest release, using large-v3 I would constantly get a runFullImpl: failed to generate timestamp token - skipping one second error, but that seems to have gone away with this build, and there is even a nice function that resets the token every time a line repeats! Thank you for your work.
Use whisperConstMe_V3_test.zip.
@reatang @Jesys32 And you don't find that the results are substantially worse than with the regular build and large-v2? Because that's what it was like for me.
I've tried the test build with a ggml-converted Whisper V3 on a test file consisting of the audio of one of my streams, with background music. Whisper V2 only "heard" and transcribed the music, while V3 gave me my voice instead. It was not completely perfect, but the content was somewhat technical, and in French, and I know anything other than English is always worse.
Do you have a link to the converted large-v3?
Sure :) https://huggingface.co/leafspark/whisper-large-v3-ggml/
Thank you! Unfortunately the results are the same as when I tried it last time (I thought maybe the issue was the model I had downloaded): at least in Hebrew there are many more mistakes, missed letters, letters that switch places within a word, and so on. (This does not happen when using large-v3 with vanilla Whisper or WhisperX, of course.)