
AWQ uses more GPU memory than GPTQ

Open lyg95 opened this issue 2 years ago • 2 comments

We tested the llama model with both AWQ and GPTQ. AWQ does achieve higher accuracy than GPTQ.

But we found that when running inference on the llama model with the AWQ code, it uses more GPU memory than GPTQ.

The relevant test results are as follows.

For llama-7b with w4 and group_size=128, the quantized model size is 3.7 GB.
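As a sanity check on the 3.7 GB figure, the size of a 4-bit, group-128 checkpoint can be estimated from parameter counts. The numbers below (6.74B total parameters, 32000×4096 embedding and lm_head matrices kept in fp16, one fp16 scale and one 4-bit zero-point per group) are assumptions based on the public llama-7b configuration, so this is only a rough sketch:

```python
# Rough size estimate for llama-7b quantized to w4 with group_size=128.
# Assumed layout: packed 4-bit weights, one fp16 scale and one 4-bit
# zero-point per group of 128, embeddings and lm_head kept in fp16.
total_params = 6.74e9
embed_params = 2 * 32000 * 4096              # input embedding + lm_head (fp16)
quant_params = total_params - embed_params   # weights that get quantized

group_size = 128
weights = quant_params * 4 / 8                   # packed 4-bit weights
scales  = quant_params / group_size * 2          # fp16 scale per group
zeros   = quant_params / group_size * 4 / 8      # 4-bit zero-point per group
embeds  = embed_params * 2                       # fp16 embeddings

total_gb = (weights + scales + zeros + embeds) / 1024**3
print(f"estimated checkpoint size: {total_gb:.2f} GB")  # ~3.6 GB
```

This lands close to the reported 3.7 GB, so the checkpoint itself is consistent and the extra runtime memory must come from elsewhere.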

We used an A100 40GB GPU and tested on HumanEval.

GPTQ

  • use_cache=True — maximum memory used: 9.505859375 GB
  • use_cache=False — maximum memory used: 9.115234375 GB

AWQ

  • use_cache=True — maximum memory used: 26.47265625 GB
  • use_cache=False — maximum memory used: 36.96484375 GB

Two points in these results deserve attention:

  1. During inference, GPTQ uses less memory than AWQ.
  2. For AWQ, use_cache=False uses more memory than use_cache=True (usually use_cache=True is the one that requires more memory).
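On point 2: with use_cache=True, the extra memory would normally be the KV cache, whose size is easy to bound. A back-of-the-envelope sketch for llama-7b (32 layers, 32 heads, head_dim 128 are the public config values; an fp16 cache and batch size 1 are assumptions):

```python
# KV cache size for llama-7b: two tensors (K and V) per layer, each of
# shape (batch, n_heads, seq_len, head_dim), stored in fp16 (2 bytes each).
layers, heads, head_dim = 32, 32, 128
batch, seq_len, fp16_bytes = 1, 2048, 2

kv_bytes = 2 * layers * batch * heads * seq_len * head_dim * fp16_bytes
print(f"KV cache at seq_len={seq_len}: {kv_bytes / 1024**3:.2f} GB")  # 1.00 GB
```

At batch size 1 the cache tops out around 1 GB even at a 2048-token context, so a use_cache=False run consuming ~10 GB more than use_cache=True cannot be explained by the cache; the gap has to come from recomputation buffers or the kernels themselves.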

With use_cache=False, we used the GPTQ script to run inference on 4-bit llama-65b, which fits on a single GPU. When using AWQ, an OOM occurs.
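For scale: the packed 4-bit weights of llama-65b alone come to roughly 30 GB (65.2B parameters assumed; group scales, zero-points, and activations ignored), which is why the model can fit on a single 40 GB GPU with some headroom:

```python
# Packed 4-bit weight size for llama-65b (scales/zeros/activations ignored).
params = 65.2e9
weight_gb = params * 4 / 8 / 1024**3
print(f"~{weight_gb:.1f} GB of packed weights")  # ~30.4 GB
```

An OOM under AWQ in that setting therefore points at large runtime allocations rather than the weights themselves.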

I would like to ask whether you encountered any of the above problems in your tests. Could you share your thoughts on these issues? Thank you very much.

I noticed that in the forward pass, the main difference between GPTQ and AWQ is that AWQ uses Tensor Cores (I am not familiar with the details of Tensor Cores). Could Tensor Cores cause higher memory usage?

lyg95 avatar Aug 01 '23 03:08 lyg95

Hi, great to hear that you observed better accuracy!

The memory usage part is strange. When we test AWQ, it uses a similar amount of memory to GPTQ. Could you provide the script you used to run inference with AWQ? Thanks!

tonylins avatar Aug 01 '23 19:08 tonylins

Thanks for your reply. After a more detailed analysis of the quantized model, I found that the problem above is not caused by the AWQ algorithm itself; it appears to be related to the test tasks and the model configuration. As for the use_cache issue, I will analyze the code myself later.

lyg95 avatar Aug 03 '23 03:08 lyg95