
[Feature] How to enable window attention in Qwen-14B?

amulil opened this issue 1 year ago · 4 comments

Motivation

I know I can edit /path/to/turbomind-style/triton_models/weights/config.ini to enable NTK-aware interpolation and LogN attention scaling. But where can I enable window attention?
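For context, a minimal sketch of the relevant `config.ini` keys, assuming a turbomind-style Qwen workspace (values are illustrative, and key availability depends on the lmdeploy version):

```ini
[llama]
# ... other keys unchanged ...
# NTK-aware RoPE interpolation (dynamic NTK)
use_dynamic_ntk = 1
# LogN attention scaling
use_logn_attn = 1
# maximum context length the engine will accept, e.g. extended from 2k to 16k
session_len = 16384
```

On the speed question below: full attention cost grows quadratically with sequence length, so raising the context to 16k will generally not match 2k-context throughput by changing config alone.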

If I use NTK-aware interpolation and LogN attention scaling to extend the context from 2k to 16k, is there any config I need to change to keep inference as fast as with the 2k context?

Related resources

No response

Additional context

No response

amulil avatar Nov 02 '23 03:11 amulil

LMDeploy doesn't support window attention yet.

lvhan028 avatar Nov 02 '23 07:11 lvhan028

> LMDeploy doesn't support window attention yet.

@lvhan028 Will LMDeploy support window attention? It seems LongLoRA uses window attention. If I want to deploy a LongLoRA model, can I use LMDeploy?

amulil avatar Nov 15 '23 07:11 amulil

LMDeploy doesn't support window attention yet.

lvhan028 avatar Nov 15 '23 09:11 lvhan028

> LMDeploy doesn't support window attention yet.

I mean, if I use a LongLoRA model, can I use LMDeploy to deploy it without window attention?

amulil avatar Nov 15 '23 11:11 amulil
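For readers landing here, one possible workaround, not confirmed in this thread: assuming the LongLoRA checkpoint is a standard PEFT LoRA adapter on top of a Llama-family base model (all paths below are placeholders), merge the adapter into the base weights and then convert the merged model with lmdeploy as usual. LongLoRA's shifted sparse attention is used during fine-tuning; the paper states the model keeps the original full-attention architecture at inference, so deploying with dense attention is consistent with the method.

```python
# Hypothetical sketch: fold a LongLoRA (PEFT) adapter into its base model so the
# result is a plain Hugging Face checkpoint that lmdeploy can convert.
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE = "/path/to/base-llama-model"      # placeholder
ADAPTER = "/path/to/longlora-adapter"   # placeholder
MERGED = "/path/to/merged-model"        # placeholder

base = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype="auto")
model = PeftModel.from_pretrained(base, ADAPTER)
merged = model.merge_and_unload()       # bake the LoRA deltas into the weights

merged.save_pretrained(MERGED)
AutoTokenizer.from_pretrained(BASE).save_pretrained(MERGED)
```

The merged checkpoint could then be converted with the usual CLI, e.g. `lmdeploy convert llama2 /path/to/merged-model` (the model-name argument depends on the base model).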