@ming-wei Thanks. I will try it
@ming-wei I have synced to the commit "[Fix mistral v0.1 build instructions (#1373)]", and now it fails during conversion with this error:

```
python ../llama/convert_checkpoint.py --model_dir /mnt/memory/Meta-Llama-3-70B-Instruct --output_dir /app/models/tmp/trt_models/Meta-Llama-3-70B-Instruct/w4a16/1-gpu-tp --dtype float16 --use_weight_only --weight_only_precision...
```
@byshiue I had tried 8B before and got the same error. I noticed that you are using the newer version 0.11.0.dev2024052100; I will try that version.
@byshiue I synced the code to [Update TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM/commit/5d8ca2faf74c494f220c8f71130340b513eea9a9) ([#1639](https://github.com/NVIDIA/TensorRT-LLM/pull/1639)) and still get the same error. It fails while checking that the loaded model contains quantized params such as transformer.layers.0.attention.qkv.per_channel_scale:

```
python ../llama/convert_checkpoint.py --model_dir...
```
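As a minimal sketch of how one might check whether the converted checkpoint actually contains those quantized tensors (assuming the output directory from the command above and the usual rank0.safetensors file name; both paths are placeholders, not from the original report):

```python
# Hypothetical debugging sketch: list quantization-related tensors in a
# converted TensorRT-LLM checkpoint to see whether weight-only params
# (e.g. *.per_channel_scale) were actually written. Paths are placeholders.
from safetensors import safe_open

ckpt = "/app/models/tmp/trt_models/Meta-Llama-3-70B-Instruct/w4a16/1-gpu-tp/rank0.safetensors"

with safe_open(ckpt, framework="pt") as f:
    quant_keys = [k for k in f.keys() if "per_channel_scale" in k]

print(f"{len(quant_keys)} per_channel_scale tensors found")
for k in quant_keys[:5]:
    print(k)
```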
I also tried in a clean Docker environment and got the same error.
It works now. Thank you very much.
Thanks @byshiue for the response. Will it be supported at some point in the future?
@byshiue is there an expected date for this support?
Hi @nv-guomingz, I still get a similar error:

```
set -ex
export MODEL_DIR=/models
export MODEL_NAME=Mixtral-8x7B-Instruct-v0.1
export QUANTIZE=int4_awq
export DTYPE=float16
export TORCH_CUDA_ARCH_LIST="8.0"

python3 ../quantization/quantize.py \
    --model_dir $MODEL_DIR/${MODEL_NAME} \
    --output_dir $MODEL_DIR/tmp/trt_models/${MODEL_NAME}/$QUANTIZE/1-gpu \
    ...
```
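For anyone comparing setups, a small sketch of how one could inspect what the quantized checkpoint recorded, assuming quantize.py finished writing the output directory used above (the path and field names are my assumptions, not confirmed by the original report):

```python
# Hypothetical check: read the config.json emitted alongside the quantized
# checkpoint and print the quantization section, to confirm which quant_algo
# (if any) was recorded. The output path mirrors the command above.
import json
from pathlib import Path

out_dir = Path("/models/tmp/trt_models/Mixtral-8x7B-Instruct-v0.1/int4_awq/1-gpu")
cfg = json.loads((out_dir / "config.json").read_text())

print(cfg.get("quantization", {}))              # quantization settings, if present
print(cfg.get("architecture"), cfg.get("dtype"))  # basic model metadata
```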
Hi @nv-guomingz, is there any update on this issue?