lmdeploy
[Feature] How to enable window attention in Qwen-14B?
Motivation
I know I can change /path/to/turbomind-style/triton_models/weights/config.ini
to enable NTK-aware interpolation and LogN attention scaling. But where can I enable window attention?
If I use NTK-aware interpolation and LogN attention scaling to extend the context from 2k to 16k, is there any config I need to change to keep inference as fast as with a 2k context?
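For reference, the TurboMind weight config mentioned above exposes these extensions as flags. This is a sketch only: the key names (`use_dynamic_ntk`, `use_logn_attn`, `session_len`) match lmdeploy's documented TurboMind config of this era, but they may differ across versions, so verify against the `config.ini` generated for your model:

```ini
; fragment of triton_models/weights/config.ini (sketch; verify key names for your lmdeploy version)
[llama]
; enable NTK-aware RoPE interpolation
use_dynamic_ntk = 1
; enable LogN attention scaling
use_logn_attn = 1
; maximum session (context) length, e.g. extended to 16k
session_len = 16384
```

After editing, the TurboMind engine reads these values at startup, so the server must be relaunched for the changes to take effect.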
Related resources
No response
Additional context
No response
LMDeploy doesn't support window attention yet.
@lvhan028 Will LMDeploy support window attention? It seems LongLoRA uses window attention. If I want to deploy a LongLoRA model, can I use LMDeploy?
lmdeploy doesn't support window attention yet.
I mean, if I use a LongLoRA model, can I use lmdeploy to deploy it without window attention?