
Question - Do you still need enough RAM to run the models?

Open augmentedstartups opened this issue 1 year ago • 3 comments

I assume that for a 100B model you still need 100 GB or more of RAM, or does this reduce the RAM requirements?
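As a rough back-of-envelope check (my own numbers, not measurements from the repo): an i2_s-style quantization packs ternary weights into roughly 2 bits each, so weight storage alone scales as params × bits / 8. Compared to an fp16 checkpoint, that is an ~8x reduction:

```python
def quantized_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Rough weight-storage estimate in GB (1e9 bytes): params * bits / 8."""
    return n_params * bits_per_weight / 8 / 1e9

# fp16 baseline vs. ~2-bit packed ternary weights for a hypothetical 100B model
fp16_gb = quantized_size_gb(100e9, 16)     # ~200 GB
two_bit_gb = quantized_size_gb(100e9, 2)   # ~25 GB
print(f"fp16: {fp16_gb:.0f} GB, 2-bit packed: {two_bit_gb:.0f} GB")
```

This ignores activations, the KV cache, and per-block scale factors, so treat it as a floor on memory, not a full requirement.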

augmentedstartups avatar Oct 18 '24 15:10 augmentedstartups

```sh
# Download the model from Hugging Face, convert it to quantized gguf format, and build the project
python setup_env.py --hf-repo HF1BitLLM/Llama3-8B-1.58-100B-tokens -q i2_s

# Or you can manually download the model and run with the local path
huggingface-cli download HF1BitLLM/Llama3-8B-1.58-100B-tokens --local-dir models/Llama3-8B-1.58-100B-tokens
python setup_env.py -md models/Llama3-8B-1.58-100B-tokens -q i2_s
```

I initially ran these commands on a system with 26 GB of RAM, but the conversion crashed the system. After adding 20 GB of swap space, the process was able to continue, though it took approximately 20 minutes to complete.

```sh
# add a 20 GB swap file
sudo fallocate -l 20G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
```
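Before kicking off the conversion, it may help to confirm how much memory and swap are actually available (standard Linux tools, not part of the BitNet repo):

```shell
# "available" is roughly what the converter can use without swapping
free -h
# lists active swap files/partitions; empty output means no swap is enabled
swapon --show
```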

Currently running on low resources (3.9 GB RAM, 17% CPU).

halak0013 avatar Oct 18 '24 20:10 halak0013

I allocated 20 GB of RAM + a 20 GB swap file for my WSL2 instance (I run on WSL2).

avcode-exe avatar Oct 19 '24 13:10 avcode-exe

Thanks for the question. In the model conversion phase of the demo, a large amount of RAM is still needed. The inference stage requires much less memory, and that is the scenario most users will encounter.
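A rough sketch of why the two phases differ (my assumption about how such converters typically behave, not a statement about this repo's internals): conversion has to hold the full-precision weights and the quantized output at the same time, while inference only needs the packed weights plus activations and the KV cache:

```python
def conversion_peak_gb(n_params: float) -> float:
    """Approximate peak for conversion: fp16 source + ~2-bit packed output."""
    fp16 = n_params * 2 / 1e9        # 2 bytes per weight
    packed = n_params * 2 / 8 / 1e9  # ~2 bits per weight after packing
    return fp16 + packed

def inference_floor_gb(n_params: float) -> float:
    """Approximate floor for inference: packed weights only."""
    return n_params * 2 / 8 / 1e9

# Hypothetical 8B model: conversion peak vs. inference floor
print(conversion_peak_gb(8e9), inference_floor_gb(8e9))
```

Under these assumptions an 8B model peaks near 18 GB during conversion but needs only a few GB of weights at inference, which is consistent with the ~3.9 GB figure reported above.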

sd983527 avatar Oct 20 '24 02:10 sd983527

Please try with the latest model on HF. https://huggingface.co/microsoft/bitnet-b1.58-2B-4T-gguf

sd983527 avatar Apr 17 '25 07:04 sd983527