lmdeploy
[Feature] How to enable window attention in Qwen-14B?
Motivation
I know I can change /path/to/turbomind-style/triton_models/weights/config.ini
to enable NTK-aware interpolation and LogN attention scaling. But where can I enable window attention?
If I use NTK-aware interpolation and LogN attention scaling to extend the context from 2k to 16k, is there any config I need to change to keep inference as fast as with a 2k context?
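For reference, the TurboMind weight config mentioned above exposes these extensions as flags. This is a sketch only: the key names (`use_dynamic_ntk`, `use_logn_attn`, `session_len`) match lmdeploy's documented TurboMind config of this era, but they may differ across versions, so verify against the `config.ini` generated for your model:

```ini
; fragment of triton_models/weights/config.ini (sketch; verify key names for your lmdeploy version)
[llama]
; enable NTK-aware RoPE interpolation
use_dynamic_ntk = 1
; enable LogN attention scaling
use_logn_attn = 1
; maximum session (context) length, e.g. extended to 16k
session_len = 16384
```

After editing, the TurboMind engine reads these values at startup, so the server must be relaunched for the changes to take effect.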
Related resources
No response
Additional context
No response
LMDeploy doesn't support window attention yet.
@lvhan028 Will LMDeploy support window attention? It seems LongLoRA uses window attention. If I want to deploy a LongLoRA model, can I use LMDeploy?
lmdeploy doesn't support window attention yet.
I mean, if I use a LongLoRA model, can I use lmdeploy to deploy it without window attention?