(Experimental) Add support for NTK RoPE scaling
This adds support for the new NTK-aware RoPE scaling method discussed in https://github.com/turboderp/exllama/issues/115.
According to this post, it is a method of RoPE scaling that results in less perplexity loss and allows a larger possible scaling factor: https://www.reddit.com/r/LocalLLaMA/comments/14lz7j5/ntkaware_scaled_rope_allows_llama_models_to_have/
Adds a new parameter, `alpha`, which can be set when loading a model with the `-a` flag.
Tested on 65B models at 4K context with 48 GB of VRAM (2x24 GB), using `-gs 16,20`.
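
For example, a run along these lines (script name and the `-d`/`-gs` flags are assumed from exllama's usual CLI; only `-a` is new in this PR):

```
# Hypothetical invocation: 65B 4-bit model split across 2x24 GB GPUs,
# with NTK alpha = 2; the model path is a placeholder
python test_benchmark_inference.py -d /models/llama-65b-4bit -gs 16,20 -a 2
```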