
[Question] When will lmdeploy support CodeLlama quantization?

Open gesanqiu opened this issue 2 years ago • 7 comments

Motivation

In the CodeLlama deployment tutorial, the quantization chapter is still marked as to-be-done. When will this feature be finished?

Related resources

No response

Additional context

No response

gesanqiu avatar Sep 25 '23 11:09 gesanqiu

After the Mid-Autumn Festival, before 10.20.

lvhan028 avatar Sep 25 '23 13:09 lvhan028

Not realizing LMDeploy didn't already support CodeLlama quants, I ended up AWQ-quantizing Phind's CodeLlama fine-tune; maybe it can be useful for testing: poisson-fish/Phind-CodeLlama-34B-v2-AWQ. The quantization itself completed successfully with no problems; however, running inference on the model obviously doesn't work.

poisson-fish avatar Sep 26 '23 19:09 poisson-fish
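
For reference, a minimal sketch of how the AWQ quantization step is typically invoked. The `lmdeploy lite auto_awq` subcommand and its flags come from the LMDeploy docs; the model id, calibration settings, and work dir below are illustrative assumptions, not what poisson-fish actually ran:

```python
# Hedged sketch: invoking LMDeploy's AWQ quantization CLI from Python.
# The `lmdeploy lite auto_awq` subcommand and flags follow the LMDeploy docs;
# the model id and calibration settings are assumptions for illustration.
import subprocess

subprocess.run(
    [
        "lmdeploy", "lite", "auto_awq",
        "Phind/Phind-CodeLlama-34B-v2",   # HF model id (assumed example)
        "--calib-dataset", "ptb",          # calibration dataset
        "--calib-samples", "128",          # number of calibration samples
        "--calib-seqlen", "2048",          # sequence length for calibration
        "--w-bits", "4",                   # 4-bit weight quantization
        "--w-group-size", "128",           # AWQ quantization group size
        "--work-dir", "./codellama-34b-awq",
    ],
    check=True,
)
```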

@lvhan028 Is this still on plan?

gesanqiu avatar Nov 21 '23 03:11 gesanqiu

@pppppM tried it, but performance decreased significantly after quantization.

lvhan028 avatar Nov 21 '23 03:11 lvhan028

> @pppppM tried it, but performance decreased significantly after quantization.

@lvhan028 @pppppM Can I ask where you hit the bottleneck? Since CodeLlama has the same architecture as Llama-2, why did this happen?

gesanqiu avatar Nov 22 '23 03:11 gesanqiu

@gesanqiu LMDeploy is functionally capable of quantizing CodeLlama, but in practical use we found that performance declines significantly after quantization.

We are still investigating the specific reasons. What we've found so far is that CodeLlama's model weights contain more outliers than Llama2's.

pppppM avatar Nov 22 '23 04:11 pppppM
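
For anyone who wants to reproduce this observation, here is a rough sketch of measuring per-channel weight outliers on a Hugging Face checkpoint. The metric (channel max of |w| over channel mean of |w|) and the 7B model ids are my own illustrative choices, not LMDeploy's internal diagnostic:

```python
# Hedged sketch: comparing per-channel weight outliers between two checkpoints.
# The outlier metric and the model ids are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM

def outlier_ratio(model_id: str) -> float:
    """Average over linear layers of per-output-channel max|w| / mean|w|."""
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)
    ratios = []
    for module in model.modules():
        if isinstance(module, torch.nn.Linear):
            w = module.weight.detach().abs().float()
            # A large max/mean per channel indicates outlier-heavy weights,
            # which generally hurts low-bit group quantization.
            ratios.append((w.max(dim=1).values / w.mean(dim=1)).mean().item())
    return sum(ratios) / len(ratios)

print("CodeLlama:", outlier_ratio("codellama/CodeLlama-7b-hf"))
print("Llama-2:  ", outlier_ratio("meta-llama/Llama-2-7b-hf"))
```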

> @gesanqiu LMDeploy is functionally capable of quantizing CodeLlama, but in practical use we found that performance declines significantly after quantization. We are still investigating the specific reasons. What we've found so far is that CodeLlama's model weights contain more outliers than Llama2's.

Do you mean you hit an accuracy issue? Might SmoothQuant help with this? And have you tested the throughput or latency of the AWQ CodeLlama model on lmdeploy?

gesanqiu avatar Nov 22 '23 07:11 gesanqiu
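
LMDeploy does ship a SmoothQuant-style W8A8 path. A minimal sketch of how it is invoked, per the LMDeploy docs; the model id and work dir are placeholders:

```python
# Hedged sketch: LMDeploy's SmoothQuant-style W8A8 quantization.
# The `lmdeploy lite smooth_quant` subcommand follows the LMDeploy docs;
# the model id and work dir are placeholders.
import subprocess

subprocess.run(
    [
        "lmdeploy", "lite", "smooth_quant",
        "codellama/CodeLlama-34b-hf",     # HF model id (assumed example)
        "--work-dir", "./codellama-34b-w8a8",
    ],
    check=True,
)
```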

You may try v0.4.2.

lvhan028 avatar Jun 12 '24 03:06 lvhan028
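
For reference, loading the AWQ-quantized model on v0.4.x through the Python API would look roughly like this. `pipeline` and `TurbomindEngineConfig(model_format='awq')` follow the LMDeploy docs; the model path and prompt are placeholders:

```python
# Hedged sketch: running an AWQ-quantized CodeLlama with LMDeploy v0.4.x.
# The model path is a placeholder for the quantized output directory.
from lmdeploy import pipeline, TurbomindEngineConfig

pipe = pipeline(
    "./codellama-34b-awq",  # quantized model dir (placeholder)
    backend_config=TurbomindEngineConfig(model_format="awq"),
)
print(pipe(["# Write a quicksort in Python\n"]))
```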