BitNet
Official inference framework for 1-bit LLMs
Thanks for this awesome work. I was curious to run llama3-8B on my personal CPU, and the performance is quite impressive (nearly 2x llama.cpp...
```
Traceback (most recent call last):
  File "/BitNet/utils/generate-dummy-bitnet-model.py", line 1048, in <module>
    main()
  File "BitNet/utils/generate-dummy-bitnet-model.py", line 971, in main
    model_class = Model.from_model_architecture(hparams["architectures"][0])
  File "BitNet/utils/generate-dummy-bitnet-model.py", line 312, in from_model_architecture
    raise NotImplementedError(f'Architecture {arch!r} not...
```
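The error reported above comes from an architecture lookup failing for a model not on the converter's supported list. A minimal sketch of that kind of registry dispatch (names like `register` and the registered architecture string are assumptions for illustration, not the script's actual internals): the converter reads the `architectures` field from the model's `config.json` and looks up a matching model class, raising `NotImplementedError` when none is registered.

```python
class Model:
    # Maps an architecture string (from config.json) to a converter class.
    _registry = {}

    @classmethod
    def register(cls, arch):
        """Decorator that records a subclass as the handler for `arch`."""
        def wrap(subclass):
            cls._registry[arch] = subclass
            return subclass
        return wrap

    @classmethod
    def from_model_architecture(cls, arch):
        try:
            return cls._registry[arch]
        except KeyError:
            # This is the failure path seen in the traceback above.
            raise NotImplementedError(f"Architecture {arch!r} not supported") from None


@Model.register("BitnetForCausalLM")  # hypothetical registration
class BitnetModel(Model):
    pass


# Dispatch succeeds for a registered architecture...
model_class = Model.from_model_architecture("BitnetForCausalLM")
# ...and raises NotImplementedError for anything unregistered.
```

Under this pattern, supporting a new model family means registering one more subclass, which is why the supported set is an explicit, finite list.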
The set of currently supported models appears rather limited. Would you consider supporting a broader range of models?
Hello, after thoroughly reviewing the source code of both BitNet and T-MAC, I noticed a high degree of overlap between the two: the implementations seem quite similar, which raises...
I think there is a bug in utils/codegen_tl1.py regarding the usage of {. The code always uses {{ and }}, but I think this is confusing/unnecessary. e.g. https://github.com/microsoft/BitNet/blob/5e39e75325db395285c8f2d84b6cdd6fa49bc27b/utils/codegen_tl1.py#L29...
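One note that may resolve the confusion above: if the templates in codegen_tl1.py are rendered with `str.format` (or f-strings) — an assumption, not something the excerpt confirms — then the doubled braces are required, since a single `{` or `}` in a format string is parsed as a placeholder. The snippet below (with a made-up kernel template) shows the escaping behavior:

```python
# In a str.format template, {{ and }} are the only way to emit literal
# braces; {bm} is a placeholder that gets substituted at render time.
template = "void kernel() {{\n    int bm = {bm};\n}}"

rendered = template.format(bm=256)
print(rendered)
```

So the doubled braces collapse to single braces in the generated C code; whether every occurrence in codegen_tl1.py is inside a format string would need to be checked case by case.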
Hi, I ran the Basic Usage with `python run_inference.py -m models/Llama3-8B-1.58-100B-tokens/ggml-model-i2_s.gguf -p "Daniel went back to the garden. Mary travelled to the kitchen. Sandra journeyed to the kitchen....
My OS is Windows. When I manually download the model and run it with a local path: ### huggingface-cli download HF1BitLLM/Llama3-8B-1.58-100B-tokens --local-dir models/Llama3-8B-1.58-100B-tokens ### python setup_env.py -md models/Llama3-8B-1.58-100B-tokens -q i2_s...
The paper mentions that QAT must start from scratch. Should I understand that performing QAT on 70B models requires as much time and as many resources as full...
I am developing [llmchat.co](https://llmchat.co), an open-source, local-first chat interface. We have integrations with Ollama and LM Studio, but one of the biggest hurdles that our initial users...