Georgi Gerganov

1015 comments by Georgi Gerganov

Looking into this now

I would like to refactor the `ggml_rope_custom` API and remove `ggml_rope_with_freq_factors` before merging; will push in a bit.

Yup, would be better to have the factors as tensors. @liuwei-git would you like to give this a go?

Ok. Btw, do you see anything that could affect the performance of `phi-2` (no rope factors)? The benchmark shows half the usual performance (217 iters), and I'm wondering if...

@amirzia I think the proposed changes are good: pretty much what I imagined as a first step. I'm not sure what the benefits are of having a git-aware cache...

Ah, I see now. The shared location seems reasonable, since it lets different apps share the same model data.

> Although I'm not sure if llama.cpp and other applications...

> Only one thing I am wondering right now, do these servers run on some kind of shared hardware? All tests will be running on dedicated Azure nodes (thanks @aigrant)...

The threshold should be 5-10% (e.g. `--defrag-thold 0.1`). If you are getting that error, it means your `--context` is too small. It should be equal to `(num slots)*(max prompt +...
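To illustrate what a 5-10% threshold means, here is a minimal sketch that assumes fragmentation is measured as the share of unused cells within the currently occupied KV-cache range. The helpers `kv_fragmentation` and `should_defrag` are hypothetical illustrations, not llama.cpp's actual code.

```python
def kv_fragmentation(used_cells: int, n_cells: int) -> float:
    """Hypothetical helper: fraction of cells in the active KV range that are holes.

    Assumes `n_cells` is the size of the occupied range and `used_cells`
    is how many of those cells actually hold tokens.
    """
    if n_cells == 0:
        return 0.0  # empty cache: nothing to defragment
    return 1.0 - used_cells / n_cells


def should_defrag(used_cells: int, n_cells: int, defrag_thold: float) -> bool:
    # Hypothetical policy: defragment once the hole ratio exceeds the threshold.
    return kv_fragmentation(used_cells, n_cells) > defrag_thold


# 900 of 1024 cells in use -> about 12% holes, above a 0.1 threshold.
print(should_defrag(900, 1024, 0.1))  # True
# 950 of 1024 cells in use -> about 7% holes, below a 0.1 threshold.
print(should_defrag(950, 1024, 0.1))  # False
```

A lower threshold defragments more often (more copying work); a higher one tolerates more wasted cells before compacting.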

Regarding the PR comment with benchmark information: I find it a little distracting, since it pops up in all PRs, even those unrelated to speed. I think it would be...

Hm, not sure why it was down; restarted it again. A service could be useful.