
OOM Error when Running AWQ + OmniQuant Combination (Step 2: OmniQuant) Despite Using Multiple GPUs

Open · Barryshen1 opened this issue 9 months ago · 0 comments

I'm encountering an Out-of-Memory (OOM) error when trying to run the second step (OmniQuant) of the AWQ + OmniQuant combination quantization method. This happens despite having allocated 2 A40 GPUs for the task.

Error Message:

[Image: screenshot of the OOM error traceback]

Configuration Details

Step 2 OmniQuant Config (step_2_omniq.yml):

base:
    seed: &seed 42
model:
    type: Llama
    # Load AWQ-transformed model
    path: /workspace/llmc/save_awq_trans/transformed_model
    torch_dtype: auto
calib:
    name: wikitext2
    download: True
    path: calib data path
    n_samples: 128
    bs: 1
    seq_len: 2048
    preproc: wikitext2_gptq
    seed: *seed
eval:
    eval_pos: [fake_quant]
    name: wikitext2
    download: True
    path: eval data path
    seq_len: 2048
    # For 7B / 13B model eval, bs can be set to "1", and inference_per_block can be set to "False".
    # For 70B model eval, bs can be set to "20", and inference_per_block can be set to "True".
    bs: 1
    inference_per_block: False
quant:
    method: OmniQuant
    weight:
        bit: 4
        symmetric: False
        granularity: per_group
        group_size: 128
        calib_algo: learnable
        ste: True
    special:
        aug_loss: True
        lwc: True
        let: False
        lwc_lr: 0.01
        let_lr: 0.005
        use_shift: False
        alpha: 0.5
        deactive_amp: True
        epochs: 5
        wd: 0
        # Use AWQ's searched clip factors to initialize OmniQuant's clip factors,
        # then refine them through learning (LWC).
        # Only the v2 clipping method supports LWC.
        # This is handled automatically in OmniQuant's code.
        search_clip_init: True
    quant_out: True
save:
    save_trans: False
    save_fake: False
    save_path: /path/to/save/
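
In case it helps with reproducing, here is a minimal sketch (not part of llmc; it assumes the checkpoint under save_awq_trans is in the standard Hugging Face layout and that accelerate is installed) to check whether the transformed model can even be sharded across both A40s, independent of the quantization run:

import torch
from transformers import AutoModelForCausalLM

# Sanity check: load the AWQ-transformed checkpoint outside llmc and let
# accelerate shard it across both visible GPUs.
path = "/workspace/llmc/save_awq_trans/transformed_model"  # same path as in the config

model = AutoModelForCausalLM.from_pretrained(
    path,
    torch_dtype="auto",   # mirrors `torch_dtype: auto` in the config
    device_map="auto",    # shard layers across GPU 0 and GPU 1
)

# Report per-GPU memory after loading.
for i in range(torch.cuda.device_count()):
    free, total = torch.cuda.mem_get_info(i)
    print(f"GPU {i}: {(total - free) / 2**30:.1f} GiB used of {total / 2**30:.1f} GiB")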

Run Script (run_llmc.sh):

#!/bin/bash

export CUDA_VISIBLE_DEVICES=0,1

llmc=/workspace/llmc
export PYTHONPATH=$llmc:$PYTHONPATH

task_name=step_2_omni
config=${llmc}/configs/quantization/combination/awq_comb_omni/w4a16g128/step_2_omniq.yml

nnodes=1
nproc_per_node=1


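# Pick a random TCP port that is not already in use, to serve as the
# torchrun rendezvous endpoint.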
find_unused_port() {
    while true; do
        port=$(shuf -i 10000-60000 -n 1)
        if ! ss -tuln | grep -q ":$port "; then
            echo "$port"
            return 0
        fi
    done
}
UNUSED_PORT=$(find_unused_port)


MASTER_ADDR=127.0.0.1
MASTER_PORT=$UNUSED_PORT
task_id=$UNUSED_PORT

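# Launch llmc in the background via torchrun; stdout/stderr go to ${task_name}.log.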
nohup \
torchrun \
--nnodes $nnodes \
--nproc_per_node $nproc_per_node \
--rdzv_id $task_id \
--rdzv_backend c10d \
--rdzv_endpoint $MASTER_ADDR:$MASTER_PORT \
${llmc}/llmc/__main__.py --config $config --task_id $task_id \
> ${task_name}.log 2>&1 &

sleep 2
ps aux | grep '__main__.py' | grep $task_id | grep -v grep | awk '{print $2}' > ${task_name}.pid

# You can kill this job with:
#   xargs kill -9 < ${task_name}.pid
# where ${task_name}.pid is the PID file written above.
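
While the job is running, a small watcher like the one below (just a sketch using plain torch.cuda, run from a second shell with the same CUDA_VISIBLE_DEVICES) shows whether both GPUs fill up or whether everything lands on GPU 0 before the OOM:

import time
import torch

# Poll per-GPU memory every few seconds. mem_get_info reports device-wide
# usage, so this sees the llmc process from a separate shell. If GPU 0 climbs
# toward the A40's 48 GB limit while GPU 1 stays near idle, the run is
# effectively single-GPU despite CUDA_VISIBLE_DEVICES=0,1.
while True:
    stats = []
    for i in range(torch.cuda.device_count()):
        free, total = torch.cuda.mem_get_info(i)
        stats.append(f"GPU {i}: {(total - free) / 2**30:5.1f}/{total / 2**30:.0f} GiB")
    print(" | ".join(stats))
    time.sleep(5)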

Thanks!

Barryshen1 · Mar 28 '25 04:03