Whisper Use Whisper from CMD

Hi, is there a way to use this software directly from the command line? Another cool feature of a CLI version would be seeing the progress.

Sep 13 '23 21:09 amichelis

Yes. The download is called cli.zip, and the program inside cli.zip is called main.exe: https://github.com/Const-me/Whisper/releases/tag/1.12.0

You will see the progress on the command line because every processed line is printed to stdout as soon as it is ready.

Sep 14 '23 08:09 emcodem

@emcodem forgive me what's probably obvious to you, but should I assume the CLI version doesn't support streaming/capture transcribing like the GUI version (...desktop.zip)?

I'm not familiar with the way this all works, but I also don't see any commands in the associated files where other commands are listed that imply streaming/capture is supported.

Sep 15 '23 03:09 DontDeleteTim

@DontDeleteTim As there is no real manual here, all we can do is trial and error.

This repository includes a command-line program for streaming from the microphone, but it is meant as an example for developers (like most of the stuff here), and you must build it yourself using the build instructions in the readme file. It's called MicrophoneCS.

Here is a recent build from me: MicrophoneCS.zip

Compared to main.exe, it has 2 additional important options that are not mentioned in -h (help): -c and -ld

where -c is the capture device index and -ld lists the capture devices.

Sep 15 '23 08:09 emcodem

Hi,

sometimes I have a huge list of files I need to translate, and since the desktop version has no file list support, I want to use your CLI version. But I run into a problem:

main -gpu "NVIDIA GeForce RTX 3060" -tr true -osrt true -l "ja" -m "models/ggml-large-v2.bin" -f "test.mp3"

Using GPU "NVIDIA GeForce RTX 3060", feature level 12.1, effective flags Wave32 | NoReshapedMatMul
Loaded MEL filters, 62.8 kb RAM
Loaded vocabulary, 51865 strings, 3037.1 kb RAM
Loaded 1259 GPU tensors, 2950.66 MB VRAM
Computed CPU base frequency: 3.49345 GHz
Loaded model from "models/ggml-large-v2.bin" to VRAM
Unable to decode audio file "true", MFCreateSourceReaderFromURL failed

Where is my error?

Sep 27 '23 16:09 sidoitsu48

@sidoitsu48 Your error message says Unable to decode audio file "true".

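You have to remove every "true" from your command: -tr and -osrt are switches that take no value, so just write -osrt rather than -osrt true, and the same for -tr. The stray "true" is what the program then tried to open as an audio file.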
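For a long list of files, the corrected command can be generated once per file. Below is a minimal Python sketch, not part of the Whisper project: the helper names build_command and transcribe_all are hypothetical, it assumes main.exe is reachable from the working directory, and it uses only the flags that appear in the command above (the -gpu option is left out for brevity).

```python
import subprocess
from pathlib import Path

# Assumption: main.exe from cli.zip is in the working directory or on PATH.
MAIN_EXE = "main.exe"


def build_command(audio_file, model, language="ja", translate=True, srt=True):
    """Return the argument list for one file; -tr and -osrt are bare switches."""
    cmd = [MAIN_EXE, "-l", language, "-m", str(model)]
    if translate:
        cmd.append("-tr")    # switch only, no "true" after it
    if srt:
        cmd.append("-osrt")  # switch only, no "true" after it
    cmd += ["-f", str(audio_file)]
    return cmd


def transcribe_all(folder, model, pattern="*.mp3"):
    """Run main.exe once for every matching file in `folder`, in sorted order."""
    for audio in sorted(Path(folder).glob(pattern)):
        # check=True stops the batch at the first file that fails.
        subprocess.run(build_command(audio, model), check=True)
```

Calling transcribe_all("audio", "models/ggml-large-v2.bin") would then process every .mp3 in a folder named audio; both the folder name and the pattern are placeholders.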

Also, did you have a look at a GitHub project called "whisperer"? It might do exactly what you want...

Sep 27 '23 16:09 emcodem

@emcodem

I appreciate your response! But, uh, silly question: where is the output text file for the build you shared here?

I run it via a batch file:

"[path]\MicrophoneCS.exe" -m "[path]\models\ggml-medium.en.bin" -l en -mc 0 -c 3 -otxt %1

but there is no output file to be found. The GUI version writes repeatedly to the text file. Am I missing a location I should check?

Oct 04 '23 22:10 DontDeleteTim

@DontDeleteTim I fear the -otxt option is not implemented, so the best you can do is redirect the output to a file. In your example, maybe like this:

"[path]\MicrophoneCS.exe" -m "[path]\models\ggml-medium.en.bin" -l en -mc 0 -c 3 > %1

Here %1 is the output file. By the way, I did not check whether the -mc and -c options are actually implemented :( The approach should basically work for you, because errors and unrelated output are usually printed to stderr instead of stdout, so you see them on the command line but not in the output text file.
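The same redirection can be done from a script, which may be easier to experiment with than a batch file. This is only a sketch: run_to_file is a hypothetical helper name, and it relies on the behaviour described above, namely that the transcribed text goes to stdout while log messages go to stderr.

```python
import subprocess


def run_to_file(command, output_path):
    """Run `command` with stdout written to `output_path`.

    stderr is not redirected, so log and error messages still show up in
    the console window. Returns the exit code of the program.
    """
    with open(output_path, "wb") as out:
        return subprocess.run(command, stdout=out).returncode
```

Here `command` is a list such as the program path followed by its arguments, without the trailing > part, since the redirection is handled by the stdout parameter.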

Oct 04 '23 22:10 emcodem

@emcodem Fast response, thanks! With the following, it now fails to launch:

"[path]\MicrophoneCS.exe" -m "[path]\models\ggml-medium.en.bin" -l en -mc 0 -c 3 -otxt > %1

I also ran it without -otxt > %1 just to see (clearly I'm very new to all this) and saw it launch, but there is still no output of any kind that I can see.

I don't work regularly with code, but I'm used to building little Python optimizations for myself, so forgive me if I'm confidently missing something obvious.

Oct 04 '23 23:10 DontDeleteTim

-otxt is useless, so just omit it when you redirect the output to %1. You must call your batch file like this:

c:\batches\mybatch.bat "c:\temp\outputfile.txt"

This way, %1 stands for "c:\temp\outputfile.txt".

You will see any non-transcription log output in the command line window that opens, but the transcribed text lines appear only in c:\temp\outputfile.txt (you have to speak into your microphone to get some output, hehe).

At a minimum, you should see these lines in the cmd window that opens when you call it like this:

Using GPU "NVIDIA RTX A3000 Laptop GPU", feature level 12.1, effective flags Wave32 | NoReshapedMatMul
Loaded MEL filters, 62.8 kb RAM
Loaded vocabulary, 51865 strings, 3037.1 kb RAM
Loaded 167 GPU tensors, 73.5388 MB VRAM
Computed CPU base frequency: 2.6112 GHz
Loaded model from "C:\temp\whisper\ggml-tiny.bin" to VRAM

Oct 04 '23 23:10 emcodem

@emcodem Mostly understood now.

Edit: More understood, but maybe not successfully. Called my batch file via Win+R, added argument "[path]\outputfile.txt", and noted it was properly inserted into the commandline of the running MicrophoneCS.exe, resulting in the following error:

[path]>"[path]\MicrophoneCS.exe" -m "[path]\models\ggml-medium.en.bin" -l en -mc 0 -c 3 "[path]\outputfile.txt"
System.ArgumentException: Unknown argument: "[path]\outputfile.txt"
   at MicrophoneCS.CommandLineArgs..ctor(String[] argv)
   at MicrophoneCS.Program.Main(String[] args)

In the interest of not being a total illiterate and wasting your time, at what point is it safe to say this isn't a supported action (text output), or are there still tricks left?

Oct 04 '23 23:10 DontDeleteTim

Hehe, well, it outputs text to stdout, so text output is very much supported. All you need to know is how to redirect stdout to a text file, and it is actually pretty simple once you understand what stdout and redirection are. Your example reads like this:

[path]>"[path]\MicrophoneCS.exe" -m "[path]\models\ggml-medium.en.bin" -l en -mc 0 -c 3 "[path]\outputfile.txt"

This is total nonsense in Windows batch syntax: the first [path] does nothing at all and the > is in the wrong place. To get it working, start on the command line instead of in a batch file; once it works and you understand what it does, port it from the command line to a batch file.

Do it like this:

command arguments > c:\temp\stdout.txt

Some real-life examples:

dir > c:\temp\stdout.txt

dir /b > c:\temp\stdout.txt

"[path]\MicrophoneCS.exe" -m "[path]\models\ggml-medium.en.bin" -l en -mc 0 -c 3 > "[path]\stdout.txt"

Oct 04 '23 23:10 emcodem

Ah, let me back up then. And I appreciate you explaining it to me like a toddler; it helped me find the core of my confusion.

I see the output file now, in the temp folder where it was generated. My error was in understanding how the batch file worked and in not noticing the successful output (I failed to see it, though the program did not fail to create it). I'd like to blame a long day, but I shouldn't. Thanks again, @emcodem. I hadn't really internalized your explanation that the output would get dumped into a text file instead of displayed in the command/terminal window.

Long and short, to collect it for idiots like me, to run the MicrophoneCS.exe build from the zip provided:

1. Write a batch file or launch via the command line. Read through to step 3. Write:

"[path]\MicrophoneCS.exe" -m "[path]\models\ggml-medium.en.bin" -l en -mc 0 -c 3 > "[path]\outputfile.txt"
pause

2. Replace [path] preceding the EXE, the models folder, and the output text file with wherever you put them or want them.
3. Replace the 3 (my value) with the index of your own input device, a number 0 or greater that you can get by running:

"[path]\MicrophoneCS.exe" -m "[path]\models\ggml-medium.en.bin" -l en -mc 0 -ld
pause

where -ld provides a list of audio input devices. My microphone of choice was index item 3.
4. Run as normal with your edited line from step 1. To see the output, open the text file. It updates continuously, as the streaming version always does, and it fills with all the output.
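If you would rather watch the text in the console and keep it in the file at the same time, the redirect can be replaced by a small wrapper. This is a rough Python sketch and not part of the MicrophoneCS build: tee_stdout is a hypothetical helper name, and the UTF-8 decoding is an assumption about what the program emits. A program may also buffer its stdout when it is not attached to a console, in which case lines can arrive in bursts.

```python
import subprocess
import sys


def tee_stdout(command, output_path):
    """Echo each stdout line of `command` to the console and append it to a file."""
    with open(output_path, "a", encoding="utf-8") as out:
        proc = subprocess.Popen(
            command,
            stdout=subprocess.PIPE,
            text=True,
            encoding="utf-8",   # assumption: the program prints UTF-8 text
            errors="replace",   # never crash on an undecodable byte
        )
        for line in proc.stdout:
            sys.stdout.write(line)  # show the line live in the console
            out.write(line)         # keep it in the text file
            out.flush()             # make it visible to editors right away
        return proc.wait()
```

The `command` argument would be the same list of program path and arguments as in step 1, without the > part and without pause.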

Oct 05 '23 00:10 DontDeleteTim
