BitNet
Official inference framework for 1-bit LLMs
Hello, and thanks for this excellent project! I am currently using the Llama3-8B-1.58-100B-tokens quantized model (ggml-model-i2_s.gguf) from the BitNet repository. The model performs well during inference, but I am having...
I have fine-tuned bitnet_b1_58-large (https://huggingface.co/1bitLLM/bitnet_b1_58-large) on the Alpaca instruction-tuning dataset. After conversion, the `f32.gguf` model gives proper results, but the `i2_s.gguf` model just outputs random tokens. Hopefully,...
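For reference, a convert-then-quantize flow like the one described above might look as follows. This is a minimal sketch: the script path, `-md`, and `--outtype` flags are assumptions based on the repository layout and common llama.cpp conventions, not confirmed by this issue.

```
# Convert the fine-tuned HF checkpoint to an f32 GGUF (script path assumed):
python utils/convert-hf-to-gguf-bitnet.py models/bitnet_b1_58-large --outtype f32

# Quantize to i2_s via the setup script (flag names assumed; see the README):
python setup_env.py -md models/bitnet_b1_58-large -q i2_s
```

If the f32 output is correct but the i2_s output is garbage, the quantization step is the natural place to look first.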
I am using the standard example `python run_inference.py -m models/Llama3-8B-1.58-100B-tokens/ggml-model-i2_s.gguf -p "Daniel went back to the garden. Mary travelled to the kitchen. Sandra journeyed to the kitchen. Sandra...
Adds `tl2` to the `--quant-type` optional argument in the setup_env.py instructions. Adds `-p` to the suggested setup_env.py commands so the pretuned kernels are used by default.
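The change described above would make the suggested invocation look something like this; the repo name is taken from other issues on this page, and the exact short-flag spellings are assumptions:

```
# Set up with the tl2 quant type and pretuned kernels (flag spellings assumed):
python setup_env.py --hf-repo HF1BitLLM/Llama3-8B-1.58-100B-tokens -q tl2 -p
```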
First of all: CONGRATS ON YOUR AMAZING RESEARCH WORK. Considering that this is using GGML and seems based directly on `llama.cpp`: why is this a separate project from `llama.cpp`, given...
Amazing work and a fantastic resource, thanks for sharing — this should jump-start LLM usage on low-resource devices. Quick question: is there a guide to...
- add generated files to .gitignore
- remove empty loops and commented-out code in the memory handling
- add a call to `free` to avoid a memory leak
I'm using:
- macOS Ventura 13.2.1
- MacBook Air M1

When I execute the command `python setup_env.py --hf-repo HF1BitLLM/Llama3-8B-1.58-100B-tokens -q i2_s`, I get the message:
```
INFO:root:Compiling the...
```
When I use LLVM-ET-Arm-19.1.1-Linux-AArch64.tar.xz on Ubuntu aarch64, it does not work well. Can I cross-compile with the GCC compiler instead?
I have successfully built BitNet, but when I try to add the model to Ollama with "ggml-model-i2_s.gguf" it fails:
```
ollama create bitnet -f Modelfile
transferring model data 100%
Error:...
```
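As a first sanity check when Ollama rejects a GGUF file, one can verify the file's header: GGUF files begin with the 4-byte ASCII magic `GGUF`. This helper is a minimal sketch, not part of BitNet or Ollama:

```python
def looks_like_gguf(path: str) -> bool:
    """Return True if the file starts with the 4-byte GGUF magic."""
    with open(path, "rb") as f:
        return f.read(4) == b"GGUF"
```

If this returns False, the file was likely truncated during download or conversion; if True, the failure is more likely an unsupported quantization type, since the llama.cpp build bundled with Ollama may not recognize `i2_s`.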