TensorRT-LLM
Feature Request: Quantized Mixtral
Glad to see Mixtral support in TensorRT-LLM! Unfortunately it doesn't seem to currently support AWQ with AMMO, as I get the following error with examples/quantization/quantize.py:
Traceback (most recent call last):
  File "/code/tensorrt_llm/examples/quantization/quantize.py", line 200, in <module>
    main()
  File "/code/tensorrt_llm/examples/quantization/quantize.py", line 192, in main
    model = quantize_and_export(model,
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/quantized/ammo.py", line 111, in quantize_and_export
    raise NotImplementedError(
NotImplementedError: Deploying quantized model MixtralForCausalLM is not supported
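For context, a minimal sketch of the kind of allow-list check that produces this error: quantized export is only wired up for model classes the library explicitly knows how to deploy, and anything else hits the NotImplementedError path. The class names and table below are hypothetical, not the real contents of ammo.py.

```python
# Hypothetical allow-list; the real table lives in
# tensorrt_llm/models/quantized/ammo.py and differs from this sketch.
SUPPORTED_QUANTIZED_MODELS = {
    "LlamaForCausalLM",
    "GPTJForCausalLM",
    "FalconForCausalLM",
}

def quantize_and_export(model_class_name: str) -> str:
    # Refuse any architecture that quantized deployment hasn't been
    # implemented for, mirroring the error in the traceback above.
    if model_class_name not in SUPPORTED_QUANTIZED_MODELS:
        raise NotImplementedError(
            f"Deploying quantized model {model_class_name} is not supported")
    return f"exported {model_class_name}"

# "MixtralForCausalLM" is absent from the table, so the call fails:
try:
    quantize_and_export("MixtralForCausalLM")
except NotImplementedError as e:
    print(e)  # Deploying quantized model MixtralForCausalLM is not supported
```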
I'm sure you're aware of this, but I just wanted to create an issue for tracking quantization support.
Side note, but is there a reason that TensorRT-LLM only supports do-it-yourself quantization and not pre-quantized models like those TheBloke produces on Hugging Face? I can imagine a lot of users who want to use quantization lack the memory capacity for full models, and you need over 100GB of VRAM to quantize a model like Mixtral.
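The VRAM claim checks out with back-of-envelope arithmetic. Mixtral 8x7B has roughly 46.7B total parameters (the "8x7B" name undercounts because the experts share the attention layers), so just holding fp16 weights for calibration takes ~93 GB, before activations and calibration overhead:

```python
# Rough VRAM math for Mixtral 8x7B (~46.7B total parameters).
params = 46.7e9

bytes_fp16 = params * 2    # fp16 weights needed during DIY quantization
bytes_int4 = params * 0.5  # 4-bit weights after AWQ/GPTQ

print(f"fp16 weights: {bytes_fp16 / 1e9:.0f} GB")  # ~93 GB before activations
print(f"int4 weights: {bytes_int4 / 1e9:.0f} GB")  # ~23 GB
```

Which is exactly why pre-quantized checkpoints matter: a user who can serve the ~23 GB int4 model may have nowhere near the ~93 GB+ needed to produce it.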
+1
Yes please, support for pre-quantized models from HuggingFace would be great.
I'm not even sure I can use a multi-GPU setup for DIY quantization with TensorRT-LLM, as examples/quantization/quantize.py doesn't expose such arguments.
Also, I was planning to use an AWQ'd Mixtral as well when I stumbled upon this issue.
> Side note, but is there a reason that TensorRT-LLM only supports do-it-yourself quantization and not pre-quantized models like those TheBloke produces on Hugging Face? I can imagine a lot of users who want to use quantization lack the memory capacity for full models, and you need over 100GB of VRAM to quantize a model like Mixtral.
Good question; this is a problem for me too, and something I was wondering about as well.
In case it helps, I was able to quantize Mixtral 8x7B with GPTQ, as I commented in https://github.com/NVIDIA/TensorRT-LLM/issues/1041#issuecomment-2018773287
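For readers unfamiliar with what GPTQ-style weight-only quantization stores, here is an illustrative round-to-nearest 4-bit group quantizer. Real GPTQ additionally applies Hessian-based error compensation when rounding; this sketch only shows the storage scheme (int4 weights plus per-group floating-point scales) that makes the ~4x memory saving possible. All names here are for illustration, not from any library.

```python
import numpy as np

def quantize_rtn(w: np.ndarray, group_size: int = 128):
    """Round-to-nearest 4-bit quantization with per-group scales.

    GPTQ proper improves on this by compensating rounding error using
    second-order (Hessian) information; the storage format is the same idea.
    """
    w = w.reshape(-1, group_size)
    # Symmetric int4 range is [-8, 7]; scale each group by its max magnitude.
    scale = np.abs(w).max(axis=1, keepdims=True) / 7
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    # Reconstruct approximate fp weights for the matmul at inference time.
    return (q * scale).reshape(-1)

rng = np.random.default_rng(0)
w = rng.standard_normal(1024).astype(np.float32)
q, scale = quantize_rtn(w)
err = np.abs(dequantize(q, scale) - w).max()
print(f"max abs error: {err:.3f}")  # small relative to the weight magnitudes
```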