Hardware requirements
What are the minimum and recommended hardware requirements to run the model and to train it?
- How much GPU Memory (VRAM) is required?
- How much RAM is required?
- What GPUs are recommended?
- What CPUs are recommended?
- Can it be run on a single machine, or is a cluster required?
At https://huggingface.co/xai-org/grok-1 it is written:

> Due to the large size of the model (314B parameters), a multi-GPU machine is required to test the model with the example code.

What does "multi-GPU machine" mean exactly?
It also looks like the model weights themselves are about 296.38 GB, so more than 300 GB of storage would be required. Should it be an SSD, or will an HDD be enough? And does that mean a minimum of roughly 300 GB of VRAM is required as well?
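For reference, a quick back-of-the-envelope check (a sketch only; it assumes roughly one byte per parameter, since the released checkpoint appears to be int8-quantized):

```python
# Rough sanity check of the checkpoint size (assumption: ~1 byte per
# parameter, since the released weights appear to be int8-quantized;
# scales and any non-quantized tensors add some extra on top).
params = 314e9          # 314B parameters
bytes_per_param = 1     # int8

weights_bytes = params * bytes_per_param
print(f"~{weights_bytes / 1e9:.0f} GB")     # ~314 GB (decimal)
print(f"~{weights_bytes / 2**30:.0f} GiB")  # ~292 GiB
# Either way this is in the same ballpark as the ~296 GB download, and
# loading the weights as-is needs a comparable amount of accelerator memory.
```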
And in the README.md of this repository it is written:
https://github.com/xai-org/grok-1/blob/e50578b5f50e4c10c6e7cff31af1ef2bedb3beb8/README.md?plain=1#L17
What does "machine with enough GPU memory" mean exactly?
Please specify the answer in README.md on both GitHub and Hugging Face; it will save people a lot of time. Users need this information to decide whether it is feasible to run the model with the resources available to them.
It would also be useful to keep track of tested hardware, so users know in advance whether their hardware can run the model without additional problems.
Update 2024-03-19: Looks like we have a confirmation that 8 GPUs are required.
many many gpus.
A GH200 datacenter rig, which costs millions ;)
If you don't know what 300 GB of VRAM is, you have a lot to learn before trying to run this model.
You need 8 of these.... https://www.amazon.com/NVIDIA-Ampere-Passive-Double-Height/dp/B09N95N3PW
Is the Jetson AGX Orin Developer Kit capable of running this monster model?
The question is whether the hardware requirements are an issue that can be fixed. Otherwise, in my eyes, making it "open source" only means making it available to businesses or, in rare cases, individuals with the hardware to run it. Or was it just a publicity move in relation to the OpenAI lawsuit...
Looks like the magnet download file is soooooo big: 256GB, and the download is only at 2.2% so far.
@dabeckham I don't know why people are starring it; no one has tested it, it just went viral. This is not a release for us, but only for Google, Microsoft, AWS, etc. Who can provide 300+ GB of GPU memory???
Since the RTX 4090 only has 24GB of VRAM...
@MuhammadShifa It will be possible to run this on the CPU once support is added to llama.cpp and someone releases 4-bit (or lower) quantized weights. You will need around 256 GB RAM, which is a lot more reasonable for a normal user than needing this much VRAM.
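As a rough illustration, once that support lands and someone publishes a Grok-1 GGUF, CPU inference could look something like the sketch below using the llama-cpp-python bindings. The model file name is hypothetical, and the context size and thread count are placeholders to tune for your machine.

```python
# Sketch only: assumes a (hypothetical) local Grok-1 GGUF file and a
# llama.cpp / llama-cpp-python build that supports the architecture.
from llama_cpp import Llama

llm = Llama(
    model_path="./grok-1-q4_0.gguf",  # hypothetical file name
    n_ctx=2048,                       # context window
    n_threads=32,                     # CPU threads; tune for your machine
)

out = llm("The answer to life, the universe and everything is", max_tokens=64)
print(out["choices"][0]["text"])
```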
This looks interesting: https://github.com/xai-org/grok-1/issues/42. The speculation is that it could fit in 96GB of VRAM if the model can be made to work with 4-bit quantization via the ggml library. Not sure how nicely ggml plays with JAX, though.
Looks like 8 * A100 GPUs with 80 GB VRAM each are not enough by themselves either: https://github.com/xai-org/grok-1/issues/125
> Looks like the magnet download file is soooooo big: 256GB, and the download is only at 2.2% so far.
@hunter-xue, did you mean 296 GB?
I ran it on 8x A100 80GB with the code in this repo (no modification, I just added a loop to get input from the terminal). It used 524GB of VRAM during single-batch inference with nearly no context (10~100 input tokens), and the speed was only 7 tokens per second. #168
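For anyone curious, a minimal sketch of that kind of modification is below. It assumes the structure of the repo's example run.py at the time of writing, i.e. that `InferenceRunner(...).run()` returns a generator and that a `sample_from_model(gen, prompt, max_len, temperature)` helper is available; the runner setup itself is elided.

```python
from runners import sample_from_model  # helper used by the repo's run.py


def chat_loop(gen) -> None:
    """Read prompts from the terminal and print one completion per prompt.

    `gen` is the generator returned by InferenceRunner.run(); build and
    initialize the runner exactly as the repo's run.py does.
    """
    while True:
        prompt = input("prompt> ")
        if not prompt.strip():
            break
        # max_len / temperature mirror the repo's example; adjust as needed.
        print(sample_from_model(gen, prompt, max_len=100, temperature=0.01))
```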
Can it be run on cloud hardware?
> This looks interesting: #42. The speculation is that it could fit in 96GB of VRAM if the model can be made to work with 4-bit quantization via the ggml library. Not sure how nicely ggml plays with JAX, though.
Just found this in relation to my last post in this thread: https://huggingface.co/eastwind/grok-1-hf-4bit
It looks to be about 90.2 GB on disk if you add up the safetensors shards in that Hugging Face eastwind repo. There may be additional overhead that requires a bit more memory for inference, but it is promising all the same. I hope grok-1 quantizes to 4-bit well. Fingers crossed.
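If anyone wants to reproduce that figure, the Hub API reports per-file sizes. A small sketch (it assumes huggingface_hub is installed; the repo id is the one linked above):

```python
# Add up the safetensors shard sizes reported by the Hugging Face Hub API.
from huggingface_hub import HfApi

info = HfApi().model_info("eastwind/grok-1-hf-4bit", files_metadata=True)
total_bytes = sum(
    (f.size or 0)
    for f in info.siblings
    if f.rfilename.endswith(".safetensors")
)
print(f"safetensors shards: {total_bytes / 1e9:.1f} GB")
```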
Following the calculation from https://www.substratus.ai/blog/calculating-gpu-memory-for-llm/ you would need roughly (worked through in the sketch after this list):
- 188.4 GB for the 4-bit model
- 376.8 GB for the 8-bit model
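Here is that rule of thumb worked through in a short sketch (the 1.2 factor is the blog's rough allowance for overhead on top of the raw weights; treat the outputs as estimates, not measurements):

```python
# Rule of thumb from the substratus.ai post:
#   memory_gb = (params_in_billions * 4 bytes) / (32 / bits) * 1.2
def estimated_memory_gb(params_b: float, bits: int) -> float:
    return (params_b * 4) / (32 / bits) * 1.2

for bits in (4, 8, 16):
    print(f"{bits:>2}-bit: ~{estimated_memory_gb(314, bits):.1f} GB")
# -> ~188.4 GB (4-bit), ~376.8 GB (8-bit), ~753.6 GB (16-bit)
```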
I've just stumbled upon this article from VMware about running open-source models in the cloud(s): https://www.vmware.com/products/vsphere/ai-ml.html#democratize
That formula looks like it is meant for rough calculations, which is great for theory and rough planning.
But in real life, actual use of the model will have many small nuances across many different situations that can add up and change the picture enough to matter, whether you run a quantized model or the plain open weights at any common float precision.
It's a huge model, and I think most people who are paying attention are curious spectators, which I admit includes me. This is exciting and interesting stuff!
From what I'm seeing from others since my last post, the most reachable and performant option for low memory use looks to be roughly 110 to 120+ GB for a quantized model. That's just for disk space and for loading the quantized model into memory (see https://huggingface.co/Arki05/Grok-1-GGUF for example); it will likely balloon to a bit more in memory for basic forward passes.
That might be a tight fit for CPU inference on an Apple machine with 128 GB of RAM, but it is still asking a lot.
> @MuhammadShifa It will be possible to run this on the CPU once support is added to llama.cpp and someone releases 4-bit (or lower) quantized weights. You will need around 256 GB RAM, which is a lot more reasonable for a normal user than needing this much VRAM.
The maximum amount of RAM I can squeeze into my AM5 board is 192GB at the moment. Do you think it is feasible to get it running with that?