Kimi-K2
Question: How can resource requirements be scaled down?
Recent advances in BitNet seem to be gaining traction, including post-training quantization (PTQ) of smaller models. Would this approach be able to maintain some of K2's performance while accelerating computation?
Links to a smaller 2B model using BitNet: https://bitnet-demo.azurewebsites.net/ https://github.com/microsoft/BitNet
Quantization (or any other kind of "scaling down") is not on our timeline at the moment. We welcome the community to add support for these aspects. For example, I have seen Unsloth doing some great work here: https://huggingface.co/unsloth/Kimi-K2-Instruct-GGUF
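
For anyone new to the topic, the core idea behind PTQ (as opposed to BitNet-style quantization-aware training) is to map trained full-precision weights onto a low-bit grid after training, trading a small amount of accuracy for memory and compute savings. A minimal sketch of symmetric per-tensor int8 absmax quantization (NumPy, for illustration only; real schemes like GGUF k-quants are per-block and more elaborate):

```python
import numpy as np

def quantize_absmax_int8(w: np.ndarray):
    """Symmetric PTQ: scale weights so the largest magnitude maps to 127."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximation of the original float weights."""
    return q.astype(np.float32) * scale

# Example: the reconstruction error is bounded by half a quantization step.
rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256)).astype(np.float32)
q, s = quantize_absmax_int8(w)
w_hat = dequantize(q, s)
max_err = float(np.abs(w - w_hat).max())
```

The int8 tensor takes a quarter of the memory of float32; lower-bit schemes (4-bit, or BitNet's ternary weights) push this further at the cost of larger per-weight error, which is why per-block scales and careful calibration matter at scale.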