grok-1
Run on PC
Maybe a stupid question, but how much RAM and VRAM, and what kind of processor, do I need to run this :D
300B parameters, so I am not hopeful. I have 64 GB of RAM and doubt I could run this even if I used 16 GB of my VRAM, even quantized to like 1 bit lmao. I would like the older Grok-0 as well, just to have something to play with.
~630 GB of VRAM at FP16, maybe 700. It's a crapshoot whether it'll run on 8 H100s, and I don't think you can run it on CPU until it gets GGUF'd.
I doubt xAI will do it, but when the BitNet code comes out, maybe a 200B BitNet version would be nice, maybe even 120B. I think I could run at least one of those, since I have already loaded 120B models on this system quantized to hell and back.
Would quantizing to .gguf and using a terabyte of RAM help? 🙃
We would need to wait for GGUF support to be added and merged. Once that is done, those with 256 GB of RAM might have a chance, MAYBE 128 GB, but I am doubtful. That is just my guess, though, from my experience with really bad 120B models created by merging one Llama 2 model with another by stacking the layers. The good news is that since the model is so big, quality should still hold up pretty well when quantizing it.
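For reference, once support is merged, the usual llama.cpp flow would look roughly like this (just a sketch using the script and tool names llama.cpp ships today, and assuming an HF-format checkpoint; the details may well change by the time Grok support actually lands):

$ git clone https://github.com/ggerganov/llama.cpp && cd llama.cpp && make
# convert the checkpoint to an FP16 GGUF (needs a lot of disk, not RAM)
$ python convert-hf-to-gguf.py /path/to/grok-1 --outfile grok-1-f16.gguf
# quantize down to something that fits in RAM
$ ./quantize grok-1-f16.gguf grok-1-Q3_K_M.gguf Q3_K_M
# run on CPU
$ ./main -m grok-1-Q3_K_M.gguf -t 16 -p "Hello"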
If TheBloke is still doing model quantization, then you can ask him. I'll eventually try to do this, but I'm not sure it will work out well.
GGUF support needs to be added first; without it, it is a waste of time to even attempt this, unless you feel like writing some C to make it work, which, by all means, please do if you can. That definitely isn't meant to discourage anyone. The architecture is unknown to GGUF, so llama.cpp has no idea what to do with the model; I just don't want you to waste your time.
@nonetrix https://github.com/ggerganov/llama.cpp/issues/6120
Also see #21, maybe we can get the older 33B model at least. Edit: nope lol
It's 314B int8 parameters, so you would need 314 GB of memory to load the model, plus some more for things like the K/V cache.
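Rough back-of-the-envelope math for the weights alone (just a sketch, ignoring the K/V cache and runtime overhead):

# bytes per parameter: 2 (FP16), 1 (int8), ~0.41 (3.3 bpw quant)
$ awk 'BEGIN { p = 314e9;
    printf "FP16:    %.0f GB\n", p * 2 / 1e9;
    printf "int8:    %.0f GB\n", p * 1 / 1e9;
    printf "3.3 bpw: %.0f GB\n", p * 3.3 / 8 / 1e9 }'
FP16:    628 GB
int8:    314 GB
3.3 bpw: 130 GB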
I have a PC with 256G RAM, and I'm waiting for gguf.
It’s time to start selecting and purchasing new large memory devices. :D
My motherboard only supports 64 GB and I've already maxed that out. I might be able to run 128 GB out of spec, since the chipset and CPU support it and Gigabyte just says it doesn't, but that's probably still not enough. I would have to get a Threadripper workstation build just for 0.5 tokens a second.
I hope we will get exact answers here: https://github.com/xai-org/grok-1/issues/62
I have a PC with 16G RAM, and I'm waiting for gguf.
Hey, if you want a small taste, there is now a smaller model fine-tuned on this model; it has the same personality as Grok but it's not as smart, of course :3
https://huggingface.co/HuggingFaceH4/mistral-7b-grok
Wow, I will try, thanks!
The only problem is there's a bug in the dataset, so it thinks everything is illegal. Also, this model is a base model, not instruct-tuned.
GGUF has arrived! Measured usage for the IQ3_XS quantization is about 124 GB of memory, which means a machine with 128 GB of RAM can work!
https://huggingface.co/Arki05/Grok-1-GGUF
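If it helps, the split files can be pulled in one go with huggingface-cli (a rough sketch; the repo name and file pattern come from the link above, and the downloaded path may differ slightly depending on how the repo is laid out):

$ pip install -U "huggingface_hub[cli]"
$ huggingface-cli download Arki05/Grok-1-GGUF --include "*IQ3_XS*" --local-dir ./grok-1-gguf
# point llama.cpp at the first split; the remaining splits are loaded automatically
$ ./main -m ./grok-1-gguf/grok-1-IQ3_XS-split-00001-of-00009.gguf -p "Hello"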
$ ./main -m ../gguf/grok-1/grok-1-IQ3_XS-split-00001-of-00009.gguf -s 12346 -n 100 -t 32 -p "I believe the meaning of life is"
llm_load_print_meta: model type = 314B
llm_load_print_meta: model ftype = IQ3_XS - 3.3 bpw
llm_load_print_meta: model params = 316.49 B
llm_load_print_meta: model size = 120.73 GiB (3.28 BPW)
llm_load_print_meta: general.name = Grok
llm_load_print_meta: BOS token = 1 '[BOS]'
llm_load_print_meta: EOS token = 2 '[EOS]'
llm_load_print_meta: UNK token = 0 '[PAD]'
llm_load_print_meta: PAD token = 0 '[PAD]'
llm_load_print_meta: LF token = 79 '<0x0A>'
llm_load_tensors: ggml ctx size = 0.81 MiB
llm_load_tensors: CPU buffer size = 16716.66 MiB
llm_load_tensors: CPU buffer size = 14592.75 MiB
llm_load_tensors: CPU buffer size = 14484.75 MiB
llm_load_tensors: CPU buffer size = 14901.35 MiB
llm_load_tensors: CPU buffer size = 14714.18 MiB
llm_load_tensors: CPU buffer size = 14493.75 MiB
llm_load_tensors: CPU buffer size = 14484.75 MiB
llm_load_tensors: CPU buffer size = 15250.88 MiB
llm_load_tensors: CPU buffer size = 3990.96 MiB
I believe the meaning of life is to be the best you can be and to make a positive difference in the world.
This is the story of how I discovered my life’s purpose and how I was able to make a positive difference to people’s lives.
I was born in 1959, and I have always been a very curious child. I was always interested in the world around me, and I wanted to know how things worked.
My parents encouraged my curiosity, and they bought me a lot
llama_print_timings: load time = 75099.36 ms
llama_print_timings: sample time = 12.02 ms / 100 runs ( 0.12 ms per token, 8318.08 tokens per second)
llama_print_timings: prompt eval time = 5213.81 ms / 7 tokens ( 744.83 ms per token, 1.34 tokens per second)
llama_print_timings: eval time = 108333.24 ms / 99 runs ( 1094.28 ms per token, 0.91 tokens per second)
llama_print_timings: total time = 113705.85 ms / 106 tokens
Log end
$ numactl -H
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
node 0 size: 128792 MB
node 0 free: 128333 MB
node 1 cpus: 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
node 1 size: 129015 MB
node 1 free: 4537 MB
node distances:
node   0   1
  0:  10  21
  1:  21  10
My 256 GB of memory (8x 32 GB DDR3-1866 sticks) comes from decommissioned servers being parted out. In total, they cost me only 640 RMB (about $88).
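With two NUMA nodes like that, memory placement can matter for token speed. A hedged sketch using numactl (whether it actually helps on this box is untested; llama.cpp's --numa option may also be worth experimenting with):

# interleave allocations across both nodes so the ~124 GB of weights
# isn't squeezed onto one node
$ numactl --interleave=all ./main -m ../gguf/grok-1/grok-1-IQ3_XS-split-00001-of-00009.gguf -t 32 -p "Hello"
# or, if the model fits in a single node's RAM, pin CPUs and memory together
$ numactl --cpunodebind=0 --membind=0 ./main -m ../gguf/grok-1/grok-1-IQ3_XS-split-00001-of-00009.gguf -t 16 -p "Hello"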
Grok's output seems to have something strange mixed in.
The department store entrusts the handling company to transport 1000 glass vases, and the freight for each glass vase is 1.50 yuan. If one is broken, this one will not only not pay the freight, but the handling company will also pay 9.50 yuan. The department store finally paid 1456 yuan. How many vases were broken during the handling?

The student answered the question: The department store entrusted the handling company to transport 1000 glass vases, and the freight for each glass vase was 1.50 yuan. If one was broken, this one would not only not pay the freight, but the handling company would also pay 9.50 yuan. The department store finally paid 1456 yuan, that is, 1456 yuan - 1000 yuan * 1.50 yuan = 456 yuan, which is equal to 9.50 yuan * n. It can be seen that n = 48, that is, the number of broken glass is 48, and the number of intact glass is 1000-48=952.

Mao Zedong's 100th Birthday
A big star in the sky, shining the whole universe! A great man of a generation, leading China to the light!

How to find the sum of the first n terms of a geometric sequence
How to find the sum of the first n terms of a geometric sequence
What is the sum of the first 100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000

llama_print_timings: load time = 1635.08 ms
llama_print_timings: sample time = 79.24 ms / 500 runs ( 0.16 ms per token, 6309.79 tokens per second)
llama_print_timings: prompt eval time = 45058.73 ms / 83 tokens ( 542.88 ms per token, 1.84 tokens per second)
llama_print_timings: eval time = 453951.70 ms / 499 runs ( 909.72 ms per token, 1.10 tokens per second)
llama_print_timings: total time = 499935.16 ms / 582 tokens