llm-awq
Bad result when running AWQ without GPU
Hi folks, I ran into a weird issue while reproducing the results from the paper. I can get the results below with a GPU visible, but cannot reproduce them with CPU only. I set the dtype to torch.float to avoid losing precision to float16.
It is not an inference-device issue; the difference comes from the awq_results generated with vs. without a GPU. Is there a workaround to handle it? Any suggestions would be helpful, thanks!
To disable the GPU: `export CUDA_VISIBLE_DEVICES=''`
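For reference, here is a quick check (my own addition, not part of the original repro) to confirm the GPU is actually hidden from torch:

```python
import os

# Hide all CUDA devices before torch initializes its CUDA context.
os.environ["CUDA_VISIBLE_DEVICES"] = ""

import torch

# With no visible devices, torch should fall back to CPU only.
print(torch.cuda.is_available())  # expected: False
```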
| opt-125m | FP32 | group_size | INT4 RTN asym on CPU | AWQ on CPU | AWQ on GPU |
|---|---|---|---|---|---|
| wikitext | 31.95 | G32 | 33.83 | 48.52 | 33.01 |
| wikitext | 31.95 | G128 | 35.96 | 39.53 | 33.96 |
- Perform AWQ search and save the search results (we already did it for you):

```bash
python -m awq.entry --model_path facebook/opt-125m \
    --w_bit 4 --q_group_size 128 \
    --run_awq --dump_awq awq_cache/opt-6.7b-w4-g128.pt
```
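When comparing the CPU- and GPU-generated search results, it can help to inspect the dumped file directly. A minimal sketch, assuming the cache is an ordinary torch checkpoint (the key layout is an assumption on my part, not a documented format):

```python
import torch

# Load the dumped AWQ search results for inspection.
awq_results = torch.load("awq_cache/opt-6.7b-w4-g128.pt", map_location="cpu")
print(type(awq_results))

# If it is a dict (assumed), list its top-level entries so the
# CPU and GPU runs can be diffed entry by entry.
if isinstance(awq_results, dict):
    for key, value in awq_results.items():
        print(key, type(value))
```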
- Evaluate the AWQ-quantized model on WikiText-2 (simulated pseudo-quantization):

```bash
python -m awq.entry --model_path facebook/opt-125m \
    --tasks wikitext \
    --w_bit 4 --q_group_size 128 \
    --load_awq awq_cache/opt-6.7b-w4-g128.pt \
    --q_backend fake
```
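For context, `--q_backend fake` simulates INT4 quantization by quantizing and immediately dequantizing the weights in float. Below is a minimal group-wise asymmetric quantize-dequantize sketch of that idea (my own illustration, not the repo's implementation):

```python
import torch

def pseudo_quantize(w: torch.Tensor, n_bit: int = 4, group_size: int = 128) -> torch.Tensor:
    """Group-wise asymmetric quantize-dequantize (illustration only)."""
    out_features, in_features = w.shape
    assert in_features % group_size == 0
    w = w.reshape(out_features, in_features // group_size, group_size)
    # Per-group min/max define the asymmetric quantization range.
    w_max = w.amax(dim=-1, keepdim=True)
    w_min = w.amin(dim=-1, keepdim=True)
    q_max = 2 ** n_bit - 1
    scale = (w_max - w_min).clamp(min=1e-5) / q_max
    zero = (-w_min / scale).round()
    # Quantize to integers in [0, q_max], then dequantize back to float.
    w_q = (torch.clamp((w / scale).round() + zero, 0, q_max) - zero) * scale
    return w_q.reshape(out_features, in_features)

# Example: quantize a random weight matrix and measure the error.
w = torch.randn(768, 768)
err = (w - pseudo_quantize(w)).abs().mean()
print(f"mean abs quantization error: {err:.5f}")
```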
Hi @xin3he,
Thank you for bringing up this issue!
In our tests, running the model on a CPU with GPU-generated awq search results gives the expected results. The problem only occurs when using CPU-generated awq search results.
We strongly recommend generating the awq search results on a GPU; running the model on a CPU with those search results afterwards works without issues. Additionally, performing the awq search on a CPU is significantly slower than on a GPU, so we do not advise it.
Thanks again for your interest in our work!
Do you have a guess as to why this happens? Why does the search need to run on a GPU?
> Do you have a guess as to why this happens? Why does the search need to run on a GPU?
It may be due to precision differences between GPU and CPU, or to torch kernels executing differently on the two devices.
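One way to see such divergence (a minimal sketch I put together, not from the thread): even a plain float32 matmul can differ between CPU and GPU because the kernels use different reduction orders, and on Ampere+ GPUs TF32 matmul (whose default on/off state depends on the PyTorch version) can change results further. Small per-op differences like these could plausibly accumulate over the iterative awq scale search.

```python
import torch

# Compare the same float32 matmul on CPU and GPU. Differences come
# from different reduction orders in the kernels and, on Ampere+
# GPUs, possibly from TF32 matmul depending on the PyTorch version.
torch.manual_seed(0)
a = torch.randn(1024, 1024)
b = torch.randn(1024, 1024)
cpu_out = a @ b

if torch.cuda.is_available():
    gpu_out = (a.cuda() @ b.cuda()).cpu()
    # Typically a small but nonzero value.
    print("max abs diff:", (cpu_out - gpu_out).abs().max().item())
```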