whisper.cpp issues

Voice assistant example - the "command" tool

4

There seems to be significant interest in a voice assistant application of Whisper, similar to "Ok, Google", "Hey Siri", "Alexa", etc. The existing [stream](https://github.com/ggerganov/whisper.cpp/tree/master/examples/stream) tool is not very applicable for...

ggerganov

ideas

Pybind11 Issues

5

I have made bindings for almost all of the functions and I am trying to get this working in Python, but I am encountering errors. Some of this code is...

NebilI

bindings

Silly script: BBC world service streaming text

5

Thanks for the code! This is great! This script will:

1. grab 30s of audio from BBC radio
2. transcribe it
3. spit out the transcribed text
4. repeat

`bbc_blaster.sh` ```...

semiformal-net

ideas

[feature suggestion] Windows optimizations (segmented heap)

1

I've locally patched up recent binaries with this manifest ```xml true SegmentHeap ``` to make them use Windows 10/11+ new segmented heap (compatibility is for running them outside the vista...

eladkarako

performance

Try to improve the token sampling strategy

4

ref #68, #188, #197

- Add the "max_initial_timestamp" token logic from OpenAI
- Disallow sampling timestamps that are in the past
- Add fallback strategy when the timestamp token sampling...

ggerganov

Using `--max-len` gives weird time codes

2

When running whisper.cpp with e.g. `--max-len 77` I get some weird time codes. It does not happen when not using `--max-len`. Examples: ```` [00:34:35.820 --> 00:34:36.820] You built that with...

niksedk

bug

Cancelling a transcript

Is there a way to cancel a transcript that is in progress programmatically? Calling `whisper_free` results in the following error: ``` ggml_new_tensor_impl: not enough space in the context's memory pool...

szeidner

enhancement

No gaps between subs?

5

I used whisper.cpp to process a whole TV series. It detects about 99% of the words said, but the subtitles have no gap/interval between them and I don't want...

gab-luz

Not compiling on m1 mac

Whenever I run `make` I see the following output: ``` ❯ make Makefile:21: Your arch is announced as x86_64, but it seems to actually be ARM64. Not fixing that can...

knpwrs

Label different speakers

1

Might be a stretch, but would it be possible to label different speakers if the audio has more than one person talking? This would come in handy for conference recordings with multiple presenters, etc.

savchenko
