grok-1
🤖 Now run grok-1 with less than 420G VRAM ⚡
~~Surprise! You can't run it on your average desktop or laptop!~~
Run grok-1 with less than 420G VRAM
Run grok on a Mac Studio with an M2 Ultra and 192GB of unified RAM. See: llama.cpp grok-1 support (@ibab_ml on X).
You need a beefy machine to run grok-1
Grok-1 is a true mystical creature. Rumor has it that it lives in the cores of 8 GPUs and that the model must fit in VRAM.
This implies that you need a very beefy machine. A very, very beefy machine. So beefy...
How do you know if your machine is beefy or not?
Your machine is not beefy if it is not big. The bigger the better: size matters! It has to make the sound of a jet engine when it thinks, it has to be hot to the touch most of the time, and it must smell like burnt plastic at times. The more big iron and the heavier it is, the more beefy! If you didn't pay a heavy price for it, say $100k and up plus an arm and a leg, then it is not beefy.
What are some of the working setups?
llama.cpp:
Mac
- Mac Studio with an M2 Ultra
- 192GB of unified RAM.
AMD
- Threadripper 3955WX
- 256GB RAM
- 0.5 tokens per second.
This repo (see the device-check sketch after these setups):
Intel + Nvidia
- GPU: 8 x A100 80G
- Total VRAM: 640G
- CPU: 2 x Xeon 8480+
- RAM: 1.5 TB
https://github.com/xai-org/grok-1/discussions/168#discussioncomment-8834090
AMD
- GPU: 8 x Instinct MI300X GPU 190G
- Total VRAM: 1520G
https://github.com/xai-org/grok-1/issues/130#issuecomment-2005770022
Other / Container / Cloud
- GPU: 8 x A100 80G
- Total VRAM: 640G
- Kubernetes (K8s) cluster
https://github.com/xai-org/grok-1/issues/6#issuecomment-2007301554
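All of the working setups for this repo above shard the model across eight accelerators. Since the reference code in this repo is JAX-based, a quick sanity check before trying to load the checkpoint is simply to ask JAX what it can see. A minimal sketch; nothing grok-specific is assumed beyond the 8-GPU setups listed above:

```python
# Sanity check before loading the checkpoint with this repo's JAX-based code:
# confirm the runtime actually sees all eight accelerators.
import jax

devices = jax.devices()
print(f"backend: {jax.default_backend()}, devices visible: {len(devices)}")
for d in devices:
    print(" ", d)

if len(devices) < 8:
    print("Warning: the working setups above all spread the model across 8 GPUs.")
```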
What can you do about it?
Try the llama.cpp grok-1 support mentioned above.
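If your llama.cpp build is recent enough to include that grok-1 support (see the llama.cpp PR linked in the refs below), a quantized GGUF can also be driven from Python via llama-cpp-python. A minimal sketch, assuming you already have a quantized grok-1 GGUF on disk; the file name below is hypothetical:

```python
# Minimal sketch: run a quantized grok-1 GGUF through llama-cpp-python.
# Assumes the underlying llama.cpp build already includes grok-1 support
# and that a quantized GGUF exists locally; the path below is hypothetical.
from llama_cpp import Llama

llm = Llama(
    model_path="./grok-1-q4_0.gguf",  # hypothetical path to your quantized model
    n_ctx=2048,        # modest context; the weights already eat most of the memory
    n_gpu_layers=-1,   # offload as many layers as fit; set to 0 for CPU-only
)

out = llm("What is the Answer to the Ultimate Question of Life, the Universe, and Everything?",
          max_tokens=32)
print(out["choices"][0]["text"])
```

Even with full GPU offload, throughput depends heavily on the hardware; the Threadripper CPU setup above reports about 0.5 tokens per second.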
What are the other options?
- Rent a GPU cloud instance with sufficient resources
- Subscribe to Grok on X (twitter.com)
- Study the blade, save up money
- Get someone to cosplay as grok
What is the Answer to the Ultimate Question of Life, the Universe, and Everything?
https://github.com/xai-org/grok-1/issues/42
Ref:
- https://github.com/xai-org/grok-1/discussions/168#discussioncomment-8834090
- https://github.com/xai-org/grok-1/issues/130#issuecomment-2004399998
- https://github.com/xai-org/grok-1/issues/130#issuecomment-2005770022
- https://github.com/xai-org/grok-1/issues/125#issuecomment-2007605076
- https://github.com/xai-org/grok-1/issues/6#issuecomment-2007301554
- https://github.com/ggerganov/llama.cpp/pull/6204#issuecomment-2016392553
- https://github.com/ggerganov/llama.cpp/pull/6204#issuecomment-2016472333
- https://github.com/ggerganov/llama.cpp/pull/6204#issuecomment-2016573288
See: Discussion. Note: This issue has been edited completely to elevate Issue 42 to serve a much better cause. @xSetech Would you not be tempted to pin this? Edit: Corrected llama.cpp inaccuracies.
@trholding and it will work with one GPU?
The model must fit in GPU memory. If the model is too large, as most cutting-edge models are, it is split into parts and the work is distributed across multiple GPUs. So a large model like this one needs multiple GPUs.
A 4 bit quantized model would likely be at least 96GB, so it might fit on four 24GB cards.
They can technically overflow into system RAM if running in OpenCL/CLBlast mode (slower, but it works).
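To put rough numbers on "must fit in GPU memory": grok-1 has roughly 314B parameters, so a weights-only, back-of-envelope estimate looks like the sketch below. It ignores KV cache, activations, and runtime overhead, and real quantization schemes don't store every tensor at the same precision, so treat the output as an approximation rather than a spec:

```python
# Back-of-envelope, weights-only memory estimate for grok-1 (~314B parameters).
# Ignores KV cache, activations, runtime overhead, and the fact that real
# quantization formats keep some tensors at higher precision plus metadata.
PARAMS = 314e9

def weight_gb(params: float, bits_per_weight: int) -> float:
    """Approximate size of the weights alone, in gigabytes."""
    return params * bits_per_weight / 8 / 1e9

for name, bits in [("fp16/bf16", 16), ("int8", 8), ("uniform 4-bit", 4)]:
    gb = weight_gb(PARAMS, bits)
    gpus = -(-int(gb) // 80)  # minimum number of 80G cards just to hold the weights
    print(f"{name:>13}: ~{gb:,.0f} GB -> at least {gpus} x 80G GPUs")
```

The fp16/bf16 line is roughly why the 8 x A100 80G (640G total) setups above have headroom; the 4-bit conversion mentioned further down reportedly comes in well under the uniform estimate.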
I would rather have Elon give us GPUs.
> @trholding and it will work with one GPU?
No way. Not this model, even highly quantized. Unless it's a GH200 data center edition, which does have 96GB of VRAM integrated with 480GB of CPU RAM. Then MAYBE.
> A 4 bit quantized model would likely be at least 96GB, so it might fit on four 24GB cards.
Someone may have figured it out: https://huggingface.co/eastwind/grok-1-hf-4bit/tree/main
Looks to be about 90.2 GB on disk when adding up the safetensors shards from the mentioned Hugging Face eastwind repo. Not sure what would actually be needed to load it for inference; it will likely need more for overhead. I can't speak to memory usage or quality, since this is still beyond my capacity.
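For what it's worth, the per-shard sizes are visible from the Hub metadata without downloading anything, so the ~90 GB figure can be re-checked with a short sketch like this (assuming the repo layout hasn't changed; huggingface_hub is the only dependency):

```python
# Rough sketch: total the .safetensors shard sizes in the eastwind repo
# straight from Hugging Face Hub metadata, no download required.
from huggingface_hub import HfApi

info = HfApi().model_info("eastwind/grok-1-hf-4bit", files_metadata=True)
total_bytes = sum(
    f.size or 0
    for f in info.siblings
    if f.rfilename.endswith(".safetensors")
)
print(f"safetensors shards: ~{total_bytes / 1e9:.1f} GB on disk")
# On-disk size is only a lower bound for inference: loading adds buffers,
# KV cache, and framework overhead on top of the raw weights.
```

Whether that then fits across four 24GB cards still depends on how the loader shards it and how much overhead the runtime adds.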