Georgi Gerganov
Looking into this now
I would like to refactor the `ggml_rope_custom` API and remove `ggml_rope_with_freq_factors` before merging - will push in a bit
Yup, would be better to have the factors as tensors. @liuwei-git would you like to give this a go?
Ok. Btw, do you see anything that could affect the performance of `phi-2` (no rope factors)? The benchmark runs at half the usual performance (217 iters) and I'm wondering if...
@amirzia I think the proposed changes are good - pretty much what I imagined as a first step. I'm not sure what the benefits are of having a git-aware cache...
Ah I see now. The shared location seems reasonable in order to have different apps sharing the same model data.
> Although I'm not sure if llama.cpp and other applications...
> Only one thing I am wondering right now, do these servers run on some kind of shared hardware?
All tests will be running on dedicated Azure nodes (thanks @aigrant)...
The thold should be 5-10% (e.g. `--defrag-thold 0.1`). If you are getting that error, it means your `--context` is too small. It should be equal to `(num slots)*(max prompt +...
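A rough sketch of the sizing rule above, in Python. The comment is cut off after `max prompt +`, so the per-slot token budget is left as a caller-supplied number rather than guessed; `required_context`, `thold_in_suggested_range`, and the example values are hypothetical names and figures used only for illustration, not part of llama.cpp.

```python
def required_context(num_slots: int, per_slot_tokens: int) -> int:
    """Total context size as (num slots) * (per-slot token budget).

    per_slot_tokens is an assumption: it stands for the max prompt length
    plus whatever the truncated part of the formula adds on top of it.
    """
    if num_slots <= 0 or per_slot_tokens <= 0:
        raise ValueError("num_slots and per_slot_tokens must be positive")
    return num_slots * per_slot_tokens


def thold_in_suggested_range(thold: float) -> bool:
    """Check a defrag threshold against the suggested 5-10% range."""
    return 0.05 <= thold <= 0.10


if __name__ == "__main__":
    # Hypothetical numbers: 4 slots, 2048 tokens budgeted per slot.
    print(required_context(4, 2048))       # 8192
    print(thold_in_suggested_range(0.1))   # True
```

If the configured context is smaller than the value this returns, that would match the "context too small" situation described in the comment.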
Regarding the PR comment with benchmark information: I find it a little bit distracting since it pops up in all PRs even unrelated to speed. I think it would be...
Hm, not sure why it was down - restarted it again. A service could be useful