OOM Error when Running AWQ + OmniQuant Combination (Step 2: OmniQuant) Despite Using Multiple GPUs
I'm encountering an Out-of-Memory (OOM) error when trying to run the second step (OmniQuant) of the AWQ + OmniQuant combination quantization method. This happens despite having allocated 2 A40 GPUs for the task.
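Both A40s are visible to PyTorch before launching; a quick sanity check I run in the same environment as the job:

```python
import torch

# Confirm CUDA_VISIBLE_DEVICES exposes both A40s to the process
for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    print(f"GPU {i}: {props.name}, {props.total_memory / 2**30:.0f} GiB")
```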
Error Message:
Configuration Details

Step 2 OmniQuant Config (`step_2_omniq.yml`):
```yaml
base:
    seed: &seed 42
model:
    type: Llama
    # Load the AWQ-transformed model from step 1
    path: /workspace/llmc/save_awq_trans/transformed_model
    torch_dtype: auto
calib:
    name: wikitext2
    download: True
    path: calib data path
    n_samples: 128
    bs: 1
    seq_len: 2048
    preproc: wikitext2_gptq
    seed: *seed
eval:
    eval_pos: [fake_quant]
    name: wikitext2
    download: True
    path: eval data path
    seq_len: 2048
    # For 7B / 13B model eval, bs can be set to "1" and inference_per_block to "False".
    # For 70B model eval, bs can be set to "20" and inference_per_block to "True".
    bs: 1
    inference_per_block: False
quant:
    method: OmniQuant
    weight:
        bit: 4
        symmetric: False
        granularity: per_group
        group_size: 128
        calib_algo: learnable
        ste: True
    special:
        aug_loss: True
        lwc: True
        let: False
        lwc_lr: 0.01
        let_lr: 0.005
        use_shift: False
        alpha: 0.5
        deactive_amp: True
        epochs: 5
        wd: 0
        # Use AWQ's searched clip factors to initialize OmniQuant's clip factors,
        # then refine them through learning (LWC).
        # Only the v2 clipping method supports LWC.
        # This is handled automatically in OmniQuant's code.
        search_clip_init: True
    quant_out: True
save:
    save_trans: False
    save_fake: False
    save_path: /path/to/save/
```
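For reference, my reading of the `weight` section above: asymmetric 4-bit weight quantization with one scale/zero-point pair per group of 128 weights (the w4a16g128 setting). A minimal sketch of that scheme, just my own illustration rather than llmc's actual code:

```python
import torch

def fake_quant_w4_g128(w: torch.Tensor, n_bits: int = 4, group_size: int = 128) -> torch.Tensor:
    """Asymmetric per-group fake quantization (bit: 4, granularity: per_group, group_size: 128)."""
    orig_shape = w.shape
    w = w.reshape(-1, group_size)                    # one (scale, zero) pair per 128 weights
    qmax = 2 ** n_bits - 1                           # 15: asymmetric 4-bit integer grid [0, 15]
    w_min = w.amin(dim=1, keepdim=True)
    w_max = w.amax(dim=1, keepdim=True)
    scale = (w_max - w_min).clamp(min=1e-5) / qmax
    zero = (-w_min / scale).round()
    q = (w / scale + zero).round().clamp(0, qmax)    # quantize
    return ((q - zero) * scale).reshape(orig_shape)  # dequantize (the fake_quant eval path)
```

As I understand it, LWC then learns per-group clipping factors that shrink `w_min`/`w_max`, initialized from AWQ's searched clips when `search_clip_init: True`.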
Run Script (`run_llmc.sh`):

```bash
#!/bin/bash

# Expose both A40s to the job
export CUDA_VISIBLE_DEVICES=0,1

llmc=/workspace/llmc
export PYTHONPATH=$llmc:$PYTHONPATH

task_name=step_2_omni
config=${llmc}/configs/quantization/combination/awq_comb_omni/w4a16g128/step_2_omniq.yml

nnodes=1
nproc_per_node=1

# Pick a random free TCP port for the torchrun rendezvous
find_unused_port() {
    while true; do
        port=$(shuf -i 10000-60000 -n 1)
        if ! ss -tuln | grep -q ":$port "; then
            echo "$port"
            return 0
        fi
    done
}
UNUSED_PORT=$(find_unused_port)

MASTER_ADDR=127.0.0.1
MASTER_PORT=$UNUSED_PORT
task_id=$UNUSED_PORT

nohup \
torchrun \
    --nnodes $nnodes \
    --nproc_per_node $nproc_per_node \
    --rdzv_id $task_id \
    --rdzv_backend c10d \
    --rdzv_endpoint $MASTER_ADDR:$MASTER_PORT \
    ${llmc}/llmc/__main__.py --config $config --task_id $task_id \
    > ${task_name}.log 2>&1 &

sleep 2

# Record the PID so the job can be killed later with:
#   xargs kill -9 < ${task_name}.pid
ps aux | grep '__main__.py' | grep $task_id | awk '{print $2}' > ${task_name}.pid
```
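While the job runs, I watch per-GPU memory to check whether the second A40 actually gets used, since the script launches only a single torchrun worker (`nproc_per_node=1`). A small monitor, assuming the NVML Python bindings (`pip install nvidia-ml-py`) are available:

```python
import time
import pynvml

pynvml.nvmlInit()
handles = [pynvml.nvmlDeviceGetHandleByIndex(i)
           for i in range(pynvml.nvmlDeviceGetCount())]

while True:
    # Print used/total memory for every visible GPU every 5 seconds
    for i, h in enumerate(handles):
        mem = pynvml.nvmlDeviceGetMemoryInfo(h)
        print(f"GPU {i}: {mem.used / 2**30:5.1f} / {mem.total / 2**30:5.1f} GiB", end="   ")
    print()
    time.sleep(5)
```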
Thanks!