Georgi Gerganov

420 comments by Georgi Gerganov

It's already added: https://github.com/ggerganov/whisper.cpp/commit/9fe7306f4b16a974361b6a8bea370d6f5c3552f2 The old one is called `large-v1`. The new one is called just `large`. If you have already downloaded the old one, make sure to rename it...

Hi @regstuff, the Windows build is currently a weak point of the project, mainly because I don't have this operating system available to test with. Having a precompiled executable...

@RYucel See the Windows steps here: https://github.com/ggerganov/whisper.cpp/actions/runs/3517978497/workflow#L117-L144 Or check cross-compilation instructions here: https://github.com/ggerganov/whisper.cpp/issues/168

Users who want to support a certain template should open a PR and implement it in the framework that we already have.

Just sent you a collaborator invite. Edit: on second thought, I revoked the invite for the moment. I just noticed that your GitHub account is very new, so I hope...

When the context swap occurs and it has to re-evaluate the second half of the context (i.e. `n_ctx/2 = 1024` tokens), one of the "scratch" buffers runs out of memory....

Alright. Thank you very much for the help. I will update the target branch to disable flash attention when HIP is enabled for now

It's just that it hasn't been needed yet. You can either submit a PR implementing it, or you can use the existing `ggml_map_unary_f32()`, which allows you to write custom operators in...

You can easily modify the example to check for EOS token and stop

Here are results on V100 using:

```bash
# baseline
LLAMA_CUBLAS=1 make -j tests && ./tests/test-backend-ops -o ATTN -b CUDA0 perf

# flash attn
LLAMA_CUBLAS=1 make -j tests && ./tests/test-backend-ops -o...
```