mixtral-offloading
How to transform the original Mixtral 8x7B into the mixed HQQ quantized model?
Really appreciate your great work, as it lets me run an MoE model on a consumer GPU. I wonder how you transformed the original Mixtral 8x7B into the quantized one using HQQ, since I noticed your model.safetensors.index.json is quite unusual: each part has its own safetensors file. Do you have a script for this, or could you briefly describe the process? I would appreciate it very much.
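
To make my question concrete, here is a minimal sketch of what I imagine the conversion looks like, assuming the hqq library's `BaseQuantizeConfig` and `HQQLinear` wrappers and a lower bit-width for the expert weights than for the attention projections. The bit settings and module names below are my guesses from the HF Mixtral implementation, not necessarily your actual pipeline:

```python
import torch
from transformers import AutoModelForCausalLM
from hqq.core.quantize import BaseQuantizeConfig, HQQLinear

# Load the original Mixtral 8x7B in half precision (sketch only;
# a real run would need to process the checkpoint shard by shard).
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mixtral-8x7B-Instruct-v0.1",
    torch_dtype=torch.float16,
)

# Mixed-quantization guess: more bits for the small attention
# projections, fewer bits for the huge expert weights.
attn_cfg = BaseQuantizeConfig(nbits=4, group_size=64)
expert_cfg = BaseQuantizeConfig(nbits=2, group_size=16)

for layer in model.model.layers:
    # Replace each attention projection with an HQQ-quantized linear.
    attn = layer.self_attn
    for name in ("q_proj", "k_proj", "v_proj", "o_proj"):
        lin = getattr(attn, name)
        setattr(attn, name, HQQLinear(
            lin, quant_config=attn_cfg,
            compute_dtype=torch.float16, device="cuda",
        ))

    # Replace each expert's feed-forward projections the same way,
    # but at the lower bit-width.
    for expert in layer.block_sparse_moe.experts:
        for name in ("w1", "w2", "w3"):
            lin = getattr(expert, name)
            setattr(expert, name, HQQLinear(
                lin, quant_config=expert_cfg,
                compute_dtype=torch.float16, device="cuda",
            ))
```

Is this roughly the idea? And how do you then export the result so that each part ends up in its own safetensors file, as your model.safetensors.index.json suggests?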