AQLM
How long should quantizing a 70B model take? Mine has been running for 2 days.
Is that too long to quantize a model?
python main.py $MODEL_PATH $DATASET_PATH --nsamples=1024 \
  --num_codebooks=1 --nbits_per_codebook=16 --in_group_size=8 \
  --relative_mse_tolerance=0.01 --finetune_relative_mse_tolerance=0.001 \
  --finetune_batch_size=32 --local_batch_size=1 --offload_activations \
  --wandb --save $SAVE_PATH
Hello! Thank you for your interest in the project. Yes, AQLM quantization does take considerably longer to calibrate than simpler quantization methods such as GPTQ. This only affects quantization time, not inference time. Quantization time depends on your model size, hardware (the number and model of GPUs, etc.), and quantization parameters. I added more details on quantization time to the README. Hope this helps. If you have any additional questions, please feel free to ask.
Could you share an example script for quantizing a 70B model on 8×A100?
Hi!
Hope this helps:
WANDB_PROJECT="wandb_project" WANDB_NAME="wandb_name" HF_HOME="/mnt/LLM" \
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 OMP_NUM_THREADS=16 MKL_NUM_THREADS=16 \
python main.py meta-llama/Llama-2-70b-hf "pajama" \
  --relative_mse_tolerance=0.01 --finetune_relative_mse_tolerance=0.001 \
  --nsamples=2048 --num_codebooks=1 --nbits_per_codebook=16 --in_group_size=8 \
  --finetune_batch_size=32 --local_batch_size=2 \
  --wandb --save="path_to_save"
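For context, this configuration (one codebook at 16 bits per codebook over groups of 8 weights) works out to roughly 16 / 8 = 2 bits per weight, plus a small overhead for scales, i.e. the usual "1x16" 2-bit AQLM setup; --nsamples and the fine-tuning settings mainly trade extra calibration time for quality.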
If you want to further improve perplexity, you can additionally run global fine-tuning after you have obtained the quantized model; see https://github.com/Vahe1994/AQLM/pull/50 for the code and https://github.com/Vahe1994/AQLM/issues/49 for an example of how to run it.
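For orientation only, a global fine-tuning launch with the script from that PR would look roughly like the sketch below. The script name (finetune.py) and the flag names are assumptions based on the repository layout, not a verified command, so please defer to issue #49 for the exact arguments.

# Hedged sketch: script and flag names are assumptions, not a verified CLI.
# See https://github.com/Vahe1994/AQLM/issues/49 for a working invocation.
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python finetune.py \
  --base_model meta-llama/Llama-2-70b-hf \
  --quant_model "path_to_save" \
  --dataset "pajama" \
  --nsamples=1024 \
  --lr=1e-5 \
  --batch_size=8 --microbatch_size=1 \
  --gradient_checkpointing \
  --save "path_to_finetuned"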