grok-1
Quantized Version or Reduced Parameter Variant of Grok-1
I propose a script to create a quantized version of Grok-1, or a variant with fewer parameters. Ideally, it should be possible to run the model on a single GPU or even on a CPU.
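To make the idea concrete, here is a minimal sketch of per-tensor symmetric int8 weight quantization in plain numpy. This is just the general principle, not tied to the actual Grok-1 checkpoint format (which I haven't inspected); a real script would apply this per weight tensor while loading the checkpoint:

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor int8 quantization: w ~= scale * q."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

# Toy example: a random matrix stands in for a real Grok-1 weight tensor.
w = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_int8(w)
print("max abs error:", np.abs(w - dequantize_int8(q, scale)).max())
```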
waiting for the Ollama or LM Studio version :)
A hybrid approach would be better: not only quantizing the model, but also supporting partial offloading to VRAM, as GGUF does.
Loading some layers on the GPU and the rest in system RAM (processed by the CPU) gives the best performance for each setup, as long as there is enough VRAM and RAM combined.
RAM is much cheaper, so the model would be accessible to more people.
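The placement logic itself is simple; here is a rough Python sketch with made-up layer sizes (not Grok-1's actual ones): fill VRAM with as many layers as fit, and put the remainder in system RAM for the CPU:

```python
# Hypothetical sketch: greedily assign layers to GPU until the VRAM budget
# runs out, then fall back to CPU/system RAM. Sizes are illustrative only.
layer_sizes_gb = [4.5] * 64          # pretend: 64 layers, ~4.5 GB each
vram_budget_gb = 48.0                # e.g. 2 x 24 GB GPUs

placement = []
used = 0.0
for i, size in enumerate(layer_sizes_gb):
    if used + size <= vram_budget_gb:
        placement.append((i, "gpu"))
        used += size
    else:
        placement.append((i, "cpu"))

n_gpu = sum(1 for _, dev in placement if dev == "gpu")
print(f"{n_gpu} layers on GPU, {len(placement) - n_gpu} layers in RAM")
```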
Running inference with the bare-bones version of this model on a CPU will be next to unusable. The closest we can get to running this model locally is a heavily quantized GGUF version. GGUF support isn't here yet, but it seems like nothing is stopping it from coming.
After that, the model should be usable on "consumer" hardware: something on the level of a Ryzen 7000 CPU and 128 GB of DDR5 RAM.
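Some back-of-the-envelope math on why heavy quantization is the only realistic path here, using the reported ~314B parameter count (and ignoring activation memory and the per-block scale overhead that GGUF formats add):

```python
# Approximate bytes needed to hold 314B parameters at various precisions.
params = 314e9
for name, bits in [("fp16", 16), ("q8", 8), ("q4", 4), ("q2", 2)]:
    gb = params * bits / 8 / 1e9
    print(f"{name}: ~{gb:.0f} GB")

# fp16: ~628 GB, q8: ~314 GB, q4: ~157 GB, q2: ~79 GB.
# Only around 2 bits per weight squeezes into 128 GB of system RAM.
```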
I think Grok-0 is a good alternative for now, if they don't plan on doing that, which is understandable because training takes time. Grok-0 is already there and ready; it just needs to be released. It shouldn't cost them much at all beyond seeding a torrent for a while and updating the inference code. The very confusing wording on their site leads me to believe, with fairly high confidence, that Grok-1 is just an MoE built from a bunch of Grok-0s, so each expert would be only 33B parameters.
@nonetrix That's true, Grok-1 is a mixture-of-experts model.
You just gave me a great idea!!!
If the model is just a container for the 8 separate experts and the experts do not share layers, I have to find a way to load only a few experts at any one time. In that case it would be possible to load as many experts as our hardware can handle. I have just 2 GPUs with 24 GB each (48 GB VRAM total) and 128 GB of RAM (I may upgrade to 256 GB); it will be fun to play with even just two or three experts.
I just opened the model file, and it is possible to select a specific expert. I will try to find some time to play with it over the weekend; maybe there is a way to extract each expert's weights to a different file and load only some of them.
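A hedged sketch of what that extraction could look like, assuming the checkpoint loads as a flat dict of named tensors and that expert weights can be distinguished by a name pattern. The `expert_<N>` naming convention here is hypothetical; the real Grok-1 checkpoint layout may differ:

```python
import re

def split_by_expert(state_dict: dict, keep_experts={0, 1}) -> dict:
    """Keep shared weights plus only the selected experts' weights.

    Assumes expert tensors have names containing 'expert_<N>' --
    an assumed convention, not the verified Grok-1 layout.
    """
    kept = {}
    for name, tensor in state_dict.items():
        m = re.search(r"expert_(\d+)", name)
        if m is None or int(m.group(1)) in keep_experts:
            kept[name] = tensor
    return kept

# Toy example with fake tensor names:
fake = {"layer0.attn.w": 1, "layer0.moe.expert_0.w": 2,
        "layer0.moe.expert_5.w": 3}
print(split_by_expert(fake))  # drops expert_5
```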
A few people have had that idea, it seems, including myself. I assume the final result would be that it's oddly bad at some things and fine at others. It chooses an expert on a per-token basis, and not in the way you might think, where, for example, a tech-related prompt would be routed to a "tech expert" and so on. Rather, the routing seems to be based more on the semantics of each token; basically, it is like the model as a whole is split, if that makes sense, but there are no sections dedicated to specific tasks.
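For anyone unfamiliar with how that per-token routing works, here is a minimal numpy sketch of generic top-2 gating of the kind used in MoE transformers. It is illustrative only (Grok-1 reportedly activates 2 of 8 experts per token, but the details here are not taken from its code):

```python
import numpy as np

def top2_route(hidden: np.ndarray, router_w: np.ndarray):
    """Per-token routing: score all experts, keep the top 2 per token."""
    logits = hidden @ router_w                  # (tokens, n_experts)
    top2 = np.argsort(logits, axis=-1)[:, -2:]  # indices of 2 best experts
    # Softmax over just the chosen experts gives the mixing weights.
    chosen = np.take_along_axis(logits, top2, axis=-1)
    weights = np.exp(chosen) / np.exp(chosen).sum(-1, keepdims=True)
    return top2, weights

rng = np.random.default_rng(0)
hidden = rng.normal(size=(4, 16))     # 4 tokens, hidden size 16
router_w = rng.normal(size=(16, 8))   # 8 experts
experts, weights = top2_route(hidden, router_w)
print(experts)  # each token picks its own 2 experts -- no "topic" sections
```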
I understand that it is very complex.
Trying to make it work on a lower-spec machine does not guarantee that it will work, but not trying guarantees that it will not.
Following
Don't, unless you want to be disappointed; just look at the sad state of the X GitHub org. It's the same formula every time: they release something as open source when it's convenient for Musk, then never touch it again. Look at the algorithm repo; I seriously doubt they are running year-old code in production, yet Elon is happy to use it as a virtue signal and an own on the other platforms. It's all smoke and mirrors. I doubt we will get Grok-0 or future models; this is just something to point at in the OpenAI court case, and that's its only reason for existing. Sorry for being a pessimist, but it really seems like the trend: they aren't really committed to open source. I would like to be proven wrong, though.
A 1.58-bit version of Grok-1 should be able to run on CPUs, but only if it can be made available somehow.
Ref - https://medium.com/ai-insights-cobet/no-more-floating-points-the-era-of-1-58-bit-large-language-models-b9805879ac0a
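For reference, the b1.58 scheme described in that article quantizes weights to the ternary set {-1, 0, +1} via absmean scaling, which works out to about log2(3) ≈ 1.58 bits per weight. A minimal numpy sketch of the weight-quantization step, applied to a toy tensor (note the paper trains models with this scheme from scratch; converting an existing Grok-1 checkpoint post hoc is not something it demonstrates):

```python
import numpy as np

def absmean_ternary(w: np.ndarray, eps: float = 1e-8):
    """BitNet b1.58-style weight quantization to {-1, 0, +1}.

    Scale by the mean absolute value, then round and clip to ternary.
    """
    scale = np.abs(w).mean() + eps
    q = np.clip(np.round(w / scale), -1, 1)
    return q, scale  # w ~= scale * q, at ~1.58 bits per weight

w = np.random.randn(4, 4).astype(np.float32)
q, scale = absmean_ternary(w)
print(q)
```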