whisper.cpp

Write to file correctly when using `stream.exe`

AB1908 opened this issue • 4 comments

Hello folks, my use case is to redirect the output of stream.exe to a file and view it in an editor like VS Code: essentially live transcription that is stored correctly and can be viewed as it happens. I invoked it like so:

stream\stream.exe -m .\ggml-model-whisper-tiny.en.bin > ".\live.md"

in PowerShell. Am I doing it incorrectly?

AB1908, Dec 11 '22

You can use the -f switch to output to a file. For example, on Unix I do the following:

./stream -t 8 -m models/ggml-${model}.bin -f /tmp/whisper.out 2> /dev/null

The file /tmp/whisper.out contains the realtime output. In a separate terminal, you can then read the last line of the file to get the latest transcribed text:

tail -n 1 /tmp/whisper.out

Not sure how this translates to PowerShell, though.

ggerganov, Dec 12 '22
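On Windows, `Get-Content .\live.md -Tail 1 -Wait` is roughly PowerShell's counterpart to following a file with `tail -f`. As a cross-platform alternative, here is a minimal Python sketch that polls the output file and yields its last non-empty line whenever it changes. The helper names `last_line` and `follow` and the polling interval are illustrative assumptions, not part of whisper.cpp.

```python
import time
from typing import Iterator, Optional


def last_line(path: str) -> Optional[str]:
    """Return the last non-empty line of a text file, or None if there is none."""
    last = None
    # errors="replace" tolerates a multi-byte character that is only half written.
    with open(path, encoding="utf-8", errors="replace") as f:
        for line in f:
            if line.strip():
                last = line.rstrip("\n")
    return last


def follow(path: str, poll_s: float = 0.5) -> Iterator[str]:
    """Poll `path` and yield its last non-empty line each time it changes.

    Hypothetical helper: it re-reads the whole file on every poll, which is
    simple and fine for a transcript, but not meant for very large logs.
    """
    seen = None
    while True:
        try:
            current = last_line(path)
        except FileNotFoundError:
            current = None  # the writer may not have created the file yet
        if current is not None and current != seen:
            seen = current
            yield current
        time.sleep(poll_s)
```

Usage: `for text in follow("/tmp/whisper.out"): print(text, flush=True)`. Like `tail`, it only sees what the writer has already flushed to disk.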

I was using a precompiled release since I was a tad impatient, but I'll run a few more tests when I can find the time. Meanwhile, I'll try the equivalent of your tail idea. Thanks for the quick response!

AB1908, Dec 12 '22

I did some testing, and tail can be hit or miss. Additionally, writing to a file seems to store lines more than once because of how the text generation appears to work. In the shell, the tool seems to go back and edit the text into a coherent transcription, whereas in the file every intermediate version is written out. Example to demonstrate:

 I have now gone back to...
 I have now gone back to a slower setting to the...
 improve the...
 improve the...
 transcription accuracy. [BLANK_AUDIO]
 transcription accuracy. So [BLANK_AUDIO]

versus the terminal output:

I have now gone back to a slower setting to
improve the transcription accuracy. So [BLANK_AUDIO]

AB1908, Dec 14 '22
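One plausible reading of this behaviour: the console line is redrawn in place as the partial transcription improves, while the file receives every intermediate version, because text already written to a file cannot be taken back. The toy model below illustrates that idea only; it is not stream's actual code, the event names are made up for the example, and the strings are shortened from the output above.

```python
from typing import Iterable, List, Optional, Tuple

Event = Tuple[str, Optional[str]]


def replay(events: Iterable[Event]) -> Tuple[List[str], List[str]]:
    """Replay ("update", text) and ("commit", None) events two ways.

    Terminal: an update redraws the current line in place (as a carriage
    return plus erase-line would), and a commit moves on to a new line.
    File: every update is appended as its own line; nothing is ever erased.
    """
    terminal: List[str] = []
    file_lines: List[str] = []
    current = ""
    for kind, text in events:
        if kind == "update":
            current = text or ""
            file_lines.append(current)
        elif kind == "commit":
            terminal.append(current)
            current = ""
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
    if current:
        terminal.append(current)  # line still being redrawn at the end
    return terminal, file_lines


events = [
    ("update", "I have now gone back to..."),
    ("update", "I have now gone back to a slower setting to"),
    ("commit", None),
    ("update", "improve the..."),
    ("update", "improve the..."),
    ("update", "improve the transcription accuracy."),
    ("commit", None),
]
terminal, file_lines = replay(events)
print("\n".join(terminal))
print("---")
print("\n".join(file_lines))
```

The terminal ends with two clean lines while the file keeps all five updates, which matches the shape of the two examples above.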

You can try the new "sliding window" mode in stream (enabled with --step 0). Not sure if this is what you are looking for, but it might be useful.

ggerganov, Dec 16 '22

When writing to a file in "sliding window" mode, the transcription block headers and their t0/t1 timestamps are missing, which makes the block-relative offsets on each line useless.

With this command:

./stream --step 0 --model models/ggml-tiny.en.bin --file voice.txt

I get this stdout:

[Start speaking]
### Transcription 0 START | t0 = 48 ms | t1 = 8944 ms

[00:00.000 --> 00:08.000]   Okay, from now, thank you very much, after.

### Transcription 0 END

### Transcription 1 START | t0 = 21640 ms | t1 = 31640 ms

[00:00.000 --> 00:04.096]   been running a soft no campaign since the election, so the announcement of the official
[00:04.096 --> 00:09.088]   no wasn't exactly unexpected. Now not supporting the voices one thing.

### Transcription 1 END

### Transcription 2 START | t0 = 42582 ms | t1 = 52582 ms

[00:00.000 --> 00:04.092]   not all Indigenous people support a voice to Parliament or indeed the process.
[00:04.092 --> 00:09.036]   They are going to be plenty of views. This isn't about that though.

### Transcription 2 END

### Transcription 3 START | t0 = 46240 ms | t1 = 56240 ms

[00:00.000 --> 00:06.024]   or indeed the process, they're going to be plenty of views. This isn't about that though.
[00:06.024 --> 00:10.000]   Everyone is going to make up their own mind about the boys.

### Transcription 3 END
^C

And this logged to voice.txt:

[00:00.000 --> 00:08.000]   Okay, from now, thank you very much, after.
 
[00:00.000 --> 00:04.096]   been running a soft no campaign since the election, so the announcement of the official
[00:04.096 --> 00:09.088]   no wasn't exactly unexpected. Now not supporting the voices one thing.
 
[00:00.000 --> 00:04.092]   not all Indigenous people support a voice to Parliament or indeed the process.
[00:04.092 --> 00:09.036]   They are going to be plenty of views. This isn't about that though.
 
[00:00.000 --> 00:06.024]   or indeed the process, they're going to be plenty of views. This isn't about that though.
[00:06.024 --> 00:10.000]   Everyone is going to make up their own mind about the boys.

Is this expected?

mrmachine, Apr 08 '23
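The block headers matter because each block's segment offsets start at 00:00.000 relative to that block's t0. The windows can also overlap: in the stdout above, block 2 spans 42582 to 52582 ms and block 3 spans 46240 to 56240 ms, which is consistent with the same sentence appearing in both. A minimal sketch that computes the overlap between consecutive windows, using the (t0, t1) pairs copied from the headers above (the function name is illustrative):

```python
from typing import List, Tuple


def consecutive_overlaps(windows: List[Tuple[int, int]]) -> List[int]:
    """Overlap in ms between each pair of consecutive (t0, t1) windows (0 if disjoint)."""
    return [
        max(0, min(prev_t1, t1) - max(prev_t0, t0))
        for (prev_t0, prev_t1), (t0, t1) in zip(windows, windows[1:])
    ]


# (t0, t1) in ms, copied from the "### Transcription N START" headers above.
windows = [(48, 8944), (21640, 31640), (42582, 52582), (46240, 56240)]
print(consecutive_overlaps(windows))  # [0, 0, 6342]
```

With only the file output, the t0 values are gone, so neither this overlap nor the absolute time of each segment can be recovered.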

Not expected. As a workaround, you can drop the -f flag, redirect stdout to a file instead, and parse that.

I'll close this, as the original problem is resolved. Feel free to open a new issue if the workaround is not enough.

ggerganov, Apr 14 '23
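The stdout-parsing workaround can be sketched in Python. This minimal parser assumes the stdout layout shown earlier in this thread: a `### Transcription N START | t0 = <ms> ms | t1 = <ms> ms` header followed by `[start --> end]   text` lines whose offsets are relative to that block's t0. It accepts both `mm:ss.fff` and `hh:mm:ss.fff` offsets and treats the fractional part as milliseconds; that is an assumption worth checking against your build, since every fraction in the output above happens to be below 100. The names `parse_stream_log` and `offset_ms` are hypothetical.

```python
import re
from typing import Iterable, Iterator, Tuple

# Block header as printed in the --step 0 output shown in this thread.
HEADER = re.compile(r"^### Transcription (\d+) START \| t0 = (\d+) ms \| t1 = (\d+) ms")
# "[mm:ss.fff --> mm:ss.fff]   text"; an optional leading "hh:" is also accepted.
TS = r"(?:(\d+):)?(\d+):(\d+)\.(\d+)"
SEGMENT = re.compile(rf"^\[{TS} --> {TS}\]\s*(.*)$")


def offset_ms(hours, minutes, seconds, frac) -> int:
    """Convert one timestamp to ms, assuming the fractional field is milliseconds."""
    return ((int(hours or 0) * 60 + int(minutes)) * 60 + int(seconds)) * 1000 + int(frac)


def parse_stream_log(lines: Iterable[str]) -> Iterator[Tuple[int, int, int, str]]:
    """Yield (block, abs_start_ms, abs_end_ms, text) for each segment line."""
    block, t0 = None, None
    for raw in lines:
        line = raw.rstrip("\r\n")
        header = HEADER.match(line)
        if header:
            block, t0 = int(header.group(1)), int(header.group(2))
            continue
        seg = SEGMENT.match(line)
        if seg and t0 is not None:  # segments before any header are skipped
            g = seg.groups()
            # Segment offsets are relative to the block, so add the block's t0.
            yield block, t0 + offset_ms(*g[0:4]), t0 + offset_ms(*g[4:8]), g[8].strip()
```

Usage, e.g. after `./stream --step 0 -m models/ggml-tiny.en.bin > voice.log`: open `voice.log` and pass the file object to `parse_stream_log`. Note that in Windows PowerShell 5.1, `>` writes UTF-16 by default, so open such a file with `encoding="utf-16"`.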