grok-1
🤖 Now run grok-1 with less than 420G VRAM ⚡
~~Surprise! You can't run it on your average desktop or laptop!~~
Run grok-1 with less than 420G VRAM
Run grok on a Mac Studio with an M2 Ultra and 192GB of unified RAM. See: llama.cpp grok-1 support (@ibab_ml on X).
You need a beefy machine to run grok-1
Grok-1 is a true mystical creature. Rumor has it that it lives in the cores of 8 GPUs and that the model must fit in VRAM.
This implies that you need a very beefy machine. A very, very beefy machine. So beefy...
How do you know if your machine is beefy or not?
Your machine is not beefy if it is not big. The bigger the better: size matters! It has to make the sound of a jet engine when it thinks, it has to be hot to the touch most of the time, and it must smell like burnt plastic at times. The more big iron and the heavier it is, the more beefy! If you didn't pay a heavy price for it, say $100k and up plus an arm and a leg, then it is not beefy.
What are some of the working setups?
llama.cpp:
Mac
- Mac Studio with an M2 Ultra
- 192GB of unified RAM.
AMD
- Threadripper 3955WX
- 256GB RAM
- 0.5 tokens per second.
This repo (see the device-check sketch after these setups):
Intel + Nvidia
- GPU: 8 x A100 80G
- Total VRAM: 640G
- CPU: 2 x Xeon 8480+
- RAM: 1.5 TB
https://github.com/xai-org/grok-1/discussions/168#discussioncomment-8834090
AMD
- GPU: 8 x Instinct MI300X GPU 190G
- Total VRAM: 1520G
https://github.com/xai-org/grok-1/issues/130#issuecomment-2005770022
Other / Container / Cloud
- GPU: 8 x A100 80G
- Total VRAM: 640G
- Kubernetes (K8s) cluster
https://github.com/xai-org/grok-1/issues/6#issuecomment-2007301554
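All of the working setups for this repo above shard the model across eight accelerators. Since the reference code in this repo is JAX-based, a quick sanity check before trying to load the checkpoint is simply to ask JAX what it can see. A minimal sketch; nothing grok-specific is assumed beyond the 8-GPU setups listed above:

```python
# Sanity check before loading the checkpoint with this repo's JAX-based code:
# confirm the runtime actually sees all eight accelerators.
import jax

devices = jax.devices()
print(f"backend: {jax.default_backend()}, devices visible: {len(devices)}")
for d in devices:
    print(" ", d)

if len(devices) < 8:
    print("Warning: the working setups above all spread the model across 8 GPUs.")
```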
What can you do about it?
Try the llama.cpp grok-1 support mentioned above.
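If your llama.cpp build is recent enough to include that grok-1 support (see the llama.cpp PR linked in the refs below), a quantized GGUF can also be driven from Python via llama-cpp-python. A minimal sketch, assuming you already have a quantized grok-1 GGUF on disk; the file name below is hypothetical:

```python
# Minimal sketch: run a quantized grok-1 GGUF through llama-cpp-python.
# Assumes the underlying llama.cpp build already includes grok-1 support
# and that a quantized GGUF exists locally; the path below is hypothetical.
from llama_cpp import Llama

llm = Llama(
    model_path="./grok-1-q4_0.gguf",  # hypothetical path to your quantized model
    n_ctx=2048,        # modest context; the weights already eat most of the memory
    n_gpu_layers=-1,   # offload as many layers as fit; set to 0 for CPU-only
)

out = llm("What is the Answer to the Ultimate Question of Life, the Universe, and Everything?",
          max_tokens=32)
print(out["choices"][0]["text"])
```

Even with full GPU offload, throughput depends heavily on the hardware; the Threadripper CPU setup above reports about 0.5 tokens per second.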
What are the other options?
- Rent a GPU cloud instance with sufficient resources
- Subscribe to Grok on X (twitter.com)
- Study the blade, save up money
- Get someone to cosplay as grok
What is the Answer to the Ultimate Question of Life, the Universe, and Everything?
https://github.com/xai-org/grok-1/issues/42
Ref:
- https://github.com/xai-org/grok-1/discussions/168#discussioncomment-8834090
- https://github.com/xai-org/grok-1/issues/130#issuecomment-2004399998
- https://github.com/xai-org/grok-1/issues/130#issuecomment-2005770022
- https://github.com/xai-org/grok-1/issues/125#issuecomment-2007605076
- https://github.com/xai-org/grok-1/issues/6#issuecomment-2007301554
- https://github.com/ggerganov/llama.cpp/pull/6204#issuecomment-2016392553
- https://github.com/ggerganov/llama.cpp/pull/6204#issuecomment-2016472333
- https://github.com/ggerganov/llama.cpp/pull/6204#issuecomment-2016573288
See: Discussion. Note: This issue has been edited completely to elevate Issue 42 to serve a much better cause. @xSetech Would you not be tempted to pin this? Edit: Corrected llama.cpp inaccuracies.
@trholding and it will work with one GPU?
The model must fit in GPU memory. If the model is too large, as most cutting-edge models are, it is split into parts and the work is distributed across multiple GPUs. So a large model like this one needs multiple GPUs.
A 4 bit quantized model would likely be at least 96GB, so it might fit on four 24GB cards.
They can technically overflow into system RAM if running in OpenCL/CLBlast mode (slower, but it works).
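To put rough numbers on "must fit in GPU memory": grok-1 has roughly 314B parameters, so a weights-only, back-of-envelope estimate looks like the sketch below. It ignores KV cache, activations, and runtime overhead, and real quantization schemes don't store every tensor at the same precision, so treat the output as an approximation rather than a spec:

```python
# Back-of-envelope, weights-only memory estimate for grok-1 (~314B parameters).
# Ignores KV cache, activations, runtime overhead, and the fact that real
# quantization formats keep some tensors at higher precision plus metadata.
PARAMS = 314e9

def weight_gb(params: float, bits_per_weight: int) -> float:
    """Approximate size of the weights alone, in gigabytes."""
    return params * bits_per_weight / 8 / 1e9

for name, bits in [("fp16/bf16", 16), ("int8", 8), ("uniform 4-bit", 4)]:
    gb = weight_gb(PARAMS, bits)
    gpus = -(-int(gb) // 80)  # minimum number of 80G cards just to hold the weights
    print(f"{name:>13}: ~{gb:,.0f} GB -> at least {gpus} x 80G GPUs")
```

The fp16/bf16 line is roughly why the 8 x A100 80G (640G total) setups above have headroom; the 4-bit conversion mentioned further down reportedly comes in well under the uniform estimate.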
I would rather have Elon give us GPUs.
> @trholding and it will work with one GPU?
No way. Not this model, even highly quantized. Unless it's a GH200 data center edition, which does have 96GB of VRAM integrated with 480GB of CPU RAM. Then MAYBE.
> A 4 bit quantized model would likely be at least 96GB, so it might fit on four 24GB cards.
Someone may have figured it out: https://huggingface.co/eastwind/grok-1-hf-4bit/tree/main
Looks to be about 90.2 GB on disk when adding up the safetensors shards from the mentioned Hugging Face eastwind repo. Not sure what would actually be needed to load it for inference; it will likely need more for overhead. I can't speak to memory usage or quality, since this is still beyond my capacity.
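For what it's worth, the per-shard sizes are visible from the Hub metadata without downloading anything, so the ~90 GB figure can be re-checked with a short sketch like this (assuming the repo layout hasn't changed; huggingface_hub is the only dependency):

```python
# Rough sketch: total the .safetensors shard sizes in the eastwind repo
# straight from Hugging Face Hub metadata, no download required.
from huggingface_hub import HfApi

info = HfApi().model_info("eastwind/grok-1-hf-4bit", files_metadata=True)
total_bytes = sum(
    f.size or 0
    for f in info.siblings
    if f.rfilename.endswith(".safetensors")
)
print(f"safetensors shards: ~{total_bytes / 1e9:.1f} GB on disk")
# On-disk size is only a lower bound for inference: loading adds buffers,
# KV cache, and framework overhead on top of the raw weights.
```

Whether that then fits across four 24GB cards still depends on how the loader shards it and how much overhead the runtime adds.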