larin92
Yes, but this can be accomplished in a single `session.run` call. It looks like only one of those lines should be uncommented at a time.
Yes please, support for pre-quantized models from HuggingFace would be great. I'm not even sure I can use a multi-GPU setup for DIY quantization with TensorRT-LLM, as this file doesn't have...
> I managed to quantize Mixtral 8x7B to 4 bpw.
>
> I first tried running this command:
>
> ```shell
> model="models--mistralai--Mixtral-8x7B-Instruct-v0.1"
> model_dir="/models/$model"
> model_chkpt_dir="/models/$model--trt-chkpt"
>
> python3...
> ```