llama.cpp
llama: use sliding window for phi3
- [x] I have read the contributing guidelines
- Self-reported review complexity:
- [x] Low
- [ ] Medium
- [ ] High
Related issue report: #7709
This PR switches Phi3 model to use sliding window attention. After this PR, it no longer geneartes broken output after the 2,048 token. Tested on "phi3-mini-4k-instruct" model.
TODO: (DONE) ~~convert_hf_to_gguf.py changes~~