BitNet
Official inference framework for 1-bit LLMs
Do you have plans to bring this into onnxruntime?
Hi, I tried to remove clang and replace it with the aarch64-ostl-linux-gcc compiler for ARM, but I got errors. My platform doesn't have Python or conda. I compiled on my platform...
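For anyone attempting the same, here is a minimal sketch of pointing the CMake build at a GCC cross-toolchain. The compiler names come from the question above; `CMAKE_C_COMPILER` and `CMAKE_CXX_COMPILER` are standard CMake cache variables, not BitNet-specific options, and since the repo officially targets clang, the optimized kernels may still fail to build this way.

```sh
# Hedged sketch: cross-compile with a GCC toolchain instead of clang.
# These are standard CMake variables; the toolchain names are assumptions
# taken from the question, not tested against this repo.
cmake -B build \
  -DCMAKE_SYSTEM_NAME=Linux \
  -DCMAKE_SYSTEM_PROCESSOR=aarch64 \
  -DCMAKE_C_COMPILER=aarch64-ostl-linux-gcc \
  -DCMAKE_CXX_COMPILER=aarch64-ostl-linux-g++
cmake --build build --config Release
```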
HuggingFace -> Hugging Face
Fixed a small typo in the readme
BitNet.cpp Demo on Google Colab: CPU-based Testing. This PR introduces a demo for BitNet.cpp running on Google Colab's CPU environment. Key features: 1. Demonstrates BitNet.cpp capabilities on standard CPUs 2. ...
Thanks a lot for open-sourcing this amazing library! I was wondering whether you have tried or are planning to prepare some larger models too, like Llama-3.1-70B/405B. As it seems, there is an...
I assume that for a 100B model you need more than 100 GB of RAM, or does this reduce the RAM requirements?
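For a rough sense of scale, a back-of-the-envelope estimate (assuming the roughly 2-bit packing bitnet.cpp uses for ternary weights in its I2_S format): 100B parameters × 2 bits/weight ≈ 25 GB of weights, versus roughly 200 GB at fp16, so the 1-bit format cuts weight memory by close to 8×. Activations and the KV cache still add on top of that.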
Improve model conversion reliability by defaulting to TL2 quantization During testing, the I2_S quantization method frequently failed when converting the model to GGUF format. The conversion only succeeded when explicitly...
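A hedged sketch of what this PR describes, assuming the `-md` (model directory) and `-q` (quantization type) flags of `setup_env.py` as documented in the repo README, with `tl2` being the type the PR proposes as the default:

```sh
# Assumed usage per the README: select the quantization type explicitly
# instead of relying on the default that failed for this model.
python setup_env.py -md models/bitnet_b1_58-large -q tl2
```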
Old:
> Official inference framework for 1-bit LLMs

New:
> Official inference framework for 1-bit and 1-trit LLMs
I followed the instructions in https://github.com/microsoft/BitNet?tab=readme-ov-file#benchmark and downloaded bitnet_b1_58-large; however, running the benchmark with `python utils/generate-dummy-bitnet-model.py models/bitnet_b1_58-large --outfile models/dummy-bitnet-125m.tl1.gguf --outtype tl1 --model-size 125M` failed: (bitnet-cpp) PS C:\work\microsoft\BitNet> python utils/generate-dummy-bitnet-model.py models/bitnet_b1_58-large...
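For reference, the benchmark flow in the README section linked above pairs the dummy-model generator with `utils/e2e_benchmark.py`. The sketch below follows the flags shown in that section; it illustrates the intended flow rather than a verified fix for the failure:

```sh
# Generate a dummy model, then benchmark it (flags per the README:
# -m model path, -p prompt length, -n tokens to generate, -t threads).
python utils/generate-dummy-bitnet-model.py models/bitnet_b1_58-large \
  --outfile models/dummy-bitnet-125m.tl1.gguf --outtype tl1 --model-size 125M
python utils/e2e_benchmark.py -m models/dummy-bitnet-125m.tl1.gguf -p 512 -n 128 -t 4
```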