Georgi Gerganov
@thomasantony We want to have a C-style API in `llama.h`. We cannot expose C++ constructs. For now, leave it like this and let me apply the necessary changes on top...
Superseded by #370
Yes, it is of interest. The tree-based decoding is already fully supported. The speculative streams and multi-stream attention layers should be possible to support, but I would need an actual...
Given these results, I believe the fine-tuned model does not output timestamp tokens for some reason. To confirm that, can you provide the output of the same run after adding...
I see the `transcribe (50359)` token is being decoded a lot of times for some reason. This is not supposed to happen. I just pushed a change to `master` to...
We still see the `50359` token - this is unexpected. I guess the best option is to provide instructions for downloading the model so I can test it locally.
On a similar topic, I recently found this project: https://github.com/xenova/transformers.js It has very efficient inference of Whisper tiny using WASM. They seem to be using something called ONNX Runtime....
> Also: maybe it's a good idea to make it so that `-nt` in [main.cpp](https://github.com/ggerganov/whisper.cpp/blob/master/examples/main/main.cpp?rgh-link-date=2024-01-07T19%3A07%3A23Z#L161) not only does not print timestamps, _but also does not compute them_: > > `wparams.no_timestamps...
Hi @patrickvonplaten - congrats on the release! I believe I have successfully added initial support for the distilled models in the following PR: https://github.com/ggerganov/whisper.cpp/pull/1424 However, I'm worried that for optimal...
Thanks for the links. Will probably look into chunking after I make the `v1.5.0` release of `whisper.cpp`.