TensorRT-LLM
How to build int4_gptq on Mixtral 8x7b
I use the following code to generate the checkpoint:
```shell
set -e
export MODEL_DIR=/mnt/memory
export MODEL_NAME=Mixtral-8x7B-Instruct-v0.1
export LD_LIBRARY_PATH=/usr/local/tensorrt/lib:$LD_LIBRARY_PATH
export PATH=/usr/local/tensorrt/bin:$PATH
export PRECISION=int4_gptq_a16
export QUANTIZE=int4_gptq
export DTYPE=bfloat16
export PYTHONPATH=/app/tensorrt-llm:$PYTHONPATH
python ../llama/convert_checkpoint.py \
    --model_dir $MODEL_DIR/${MODEL_NAME} \
    --output_dir $MODEL_DIR/tmp/trt_models/${MODEL_NAME}/$PRECISION/1-gpu \
    --dtype $DTYPE \
    --use_weight_only \
    --weight_only_precision $QUANTIZE
```
I get this error:
```
[TensorRT-LLM] TensorRT-LLM version: 0.10.0.dev2024050700
0.10.0.dev2024050700
Traceback (most recent call last):
  File "/app/tensorrt-llm/examples/llama/../llama/convert_checkpoint.py", line 466, in <module>
    main()
  File "/app/tensorrt-llm/examples/llama/../llama/convert_checkpoint.py", line 445, in main
    assert args.modelopt_quant_ckpt_path is not None
```
It looks like convert_checkpoint.py requires a --modelopt_quant_ckpt_path argument. How do I generate the checkpoint for modelopt_quant_ckpt_path?
Thank you for the report. GPTQ is not supported for MoE models.
Thanks @byshiue for the response. Will it be supported at some point in the future?
We are working on the feature and will update this issue once it is supported.