2-bit 405B?
Would be cool.
Hi, @ewof!
Thank you for your suggestion. There are several technical difficulties in making it fit onto the GPUs for quantization, but it is definitely possible. We are already working on this. Unfortunately, we are a bit short on manpower, so I'm not sure when, or if, this will happen.
Hi @ewof and @Vahe1994,
No offense intended. AQLM is a fantastic project, and VPTQ has acknowledged your work in its acknowledgments.
I've successfully reproduced the VPTQ method and released several models on Hugging Face, including the 405B LLaMA 3.1, 70B LLaMA 3.1, and 72B LLaMA 3.2.
I welcome discussion and testing — let's explore these models together!
Hey @OpenSourceRonin, thank you for letting us know. We are all for open source and for making models available to people, regardless of which quantization method was used. So, of course, no offense taken. You did a great job! Thank you for your work.
This issue is stale because it has been open for 30 days with no activity.
This issue was closed because it has been inactive for 14 days since being marked as stale.