TensorRT-LLM
How to build int4_gptq on Mixtral 8x7b
I use the following code to generate the checkpoint:
```shell
set -e
export MODEL_DIR=/mnt/memory
export MODEL_NAME=Mixtral-8x7B-Instruct-v0.1
export LD_LIBRARY_PATH=/usr/local/tensorrt/lib:$LD_LIBRARY_PATH
export PATH=/usr/local/tensorrt/bin:$PATH
export PRECISION=int4_gptq_a16
export QUANTIZE=int4_gptq
export DTYPE=bfloat16
export PYTHONPATH=/app/tensorrt-llm:$PYTHONPATH
python ../llama/convert_checkpoint.py \
    --model_dir $MODEL_DIR/${MODEL_NAME} \
    --output_dir $MODEL_DIR/tmp/trt_models/${MODEL_NAME}/$PRECISION/1-gpu \
    --dtype $DTYPE \
    --use_weight_only \
    --weight_only_precision $QUANTIZE
```
I get this error:
```
[TensorRT-LLM] TensorRT-LLM version: 0.10.0.dev2024050700
0.10.0.dev2024050700
Traceback (most recent call last):
  File "/app/tensorrt-llm/examples/llama/../llama/convert_checkpoint.py", line 466, in <module>
    main()
  File "/app/tensorrt-llm/examples/llama/../llama/convert_checkpoint.py", line 445, in main
    assert args.modelopt_quant_ckpt_path is not None
```
It looks like convert_checkpoint.py requires a --modelopt_quant_ckpt_path argument. How do I generate the checkpoint for modelopt_quant_ckpt_path?
Thank you for the report. GPTQ is not supported for MoE models.
Thanks @byshiue for the response. Will it be supported at some point in the future?
We are working on the feature and will update this issue once it is supported.