Kimi-K2
Question: How can resource requirements be scaled down?
Recent advances in BitNet seem to be gaining traction, including post-training quantization (PTQ) of smaller models. Would this approach be able to maintain some of K2's performance while accelerating computation?
Links to a smaller 2B model using BitNet: https://bitnet-demo.azurewebsites.net/ https://github.com/microsoft/BitNet
Quantization (or any other kind of "scaling down") is not on our timeline at the moment. We welcome the community to add support for these aspects. For example, I have seen Unsloth doing some great work here: https://huggingface.co/unsloth/Kimi-K2-Instruct-GGUF
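
For anyone new to the topic, the core idea behind PTQ (as opposed to BitNet-style quantization-aware training) is to map trained full-precision weights onto a low-bit grid after training, trading a small amount of accuracy for memory and compute savings. A minimal sketch of symmetric per-tensor int8 absmax quantization (NumPy, for illustration only; real schemes like GGUF k-quants are per-block and more elaborate):

```python
import numpy as np

def quantize_absmax_int8(w: np.ndarray):
    """Symmetric PTQ: scale weights so the largest magnitude maps to 127."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximation of the original float weights."""
    return q.astype(np.float32) * scale

# Example: the reconstruction error is bounded by half a quantization step.
rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256)).astype(np.float32)
q, s = quantize_absmax_int8(w)
w_hat = dequantize(q, s)
max_err = float(np.abs(w - w_hat).max())
```

The int8 tensor takes a quarter of the memory of float32; lower-bit schemes (4-bit, or BitNet's ternary weights) push this further at the cost of larger per-weight error, which is why per-block scales and careful calibration matter at scale.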