larin92

3 comments by larin92

Yes, but this can be accomplished in a single execution of session.run. It looks like only one of those lines is meant to be active at a time, with the other commented out; see the sketch below.
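
To illustrate the point, here is a minimal sketch of fetching several outputs in one call, assuming an ONNX Runtime-style InferenceSession; the model path and tensor names ("model.onnx", "logits", "embeddings", "input") are hypothetical placeholders, not taken from the thread:

```python
import numpy as np
import onnxruntime as ort

# Hypothetical model path, for illustration only.
session = ort.InferenceSession("model.onnx")

x = np.random.rand(1, 3, 224, 224).astype(np.float32)

# Instead of two separate session.run calls (one per output),
# request both output tensors in a single execution:
logits, embeddings = session.run(
    ["logits", "embeddings"],  # fetch both outputs at once
    {"input": x},              # hypothetical input name
)
```

Requesting all needed outputs in one run call avoids executing the graph twice, which is the whole motivation for combining the two lines.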

Yes please, support for pre-quantized models from HuggingFace would be great. I'm not even sure I can use a multi-GPU setup for DIY quantization with TensorRT-LLM, as this file doesn't have...

> I managed to quantize Mixtral 8x7B to 4 bpw.
>
> I first tried running this command:
>
> ```shell
> model="models--mistralai--Mixtral-8x7B-Instruct-v0.1"
> model_dir="/models/$model"
> model_chkpt_dir="/models/$model--trt-chkpt"
>
> python3...
> ```
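
For reference, a 4-bit (roughly 4 bpw) AWQ checkpoint conversion with TensorRT-LLM typically goes through examples/quantization/quantize.py. The sketch below reuses the $model_dir and $model_chkpt_dir variables from the quoted command; the flag names are assumptions that may differ between TensorRT-LLM releases, so verify them against your installed version:

```shell
# Sketch of a 4-bit AWQ quantization run with TensorRT-LLM's
# examples/quantization/quantize.py; flag names are assumptions
# and may differ across releases.
# $model_dir and $model_chkpt_dir are the variables defined in
# the quoted command above.
python3 examples/quantization/quantize.py \
    --model_dir "$model_dir" \
    --output_dir "$model_chkpt_dir" \
    --dtype float16 \
    --qformat int4_awq \
    --calib_size 32
```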