grok-1
Allows CPU-based execution
Adds CPU execution to grok-1 model demo
VERY SLOW!
No one should process real-world workloads this way.
This is only meant for early dev work by those who don't have 8 x 40GB GPUs.
```shell
pip install -r requirements-cpu.txt
sed -i 's/USE_CPU_ONLY = False/USE_CPU_ONLY = True/' run.py
python run.py
```
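For reference, here's a minimal sketch of how a `USE_CPU_ONLY` flag like the one the sed command flips could force JAX onto the CPU backend. The environment-variable mechanism below is my assumption, not necessarily what this PR's run.py does:

```python
import os

# Flipped by the sed command above. This must happen before JAX is
# imported, because the platform is chosen at import time.
USE_CPU_ONLY = True

if USE_CPU_ONLY:
    os.environ["JAX_PLATFORMS"] = "cpu"
    # Expose the host as 8 logical devices so the model's 8-way
    # sharding still has devices to map onto.
    os.environ["XLA_FLAGS"] = "--xla_force_host_platform_device_count=8"

import jax  # deliberately imported after the env vars are set

print(jax.devices())  # should list 8 CPU devices
```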
Still requires:
- 384GB RAM
- 1.5 minutes to load into memory
- 1.1 hours to "compile" grok-1 model
- 4.2 hours to sample first inference request
Even on a 72-core Xeon server, these runtimes can require monk-like patience.
So the point isn't to run this end-to-end all day.
It's for developers with high-memory workstations who would rather get this code running slowly than not at all.
Hopefully someone uses this CPU-only workaround early on to bootstrap grok-1 into a more performant form that is eventually accessible to a larger pool of devs.
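One possible mitigation for the 1.1-hour compile on repeat runs (untested here; it assumes a JAX version that ships the experimental persistent compilation cache, and that the cache actually hits across runs):

```python
# Persist XLA compilation artifacts to disk so that repeat runs can skip
# the ~1.1h compile. The location of this API varies across JAX versions.
from jax.experimental.compilation_cache import compilation_cache as cc

cc.initialize_cache("/tmp/grok1_jax_cache")  # any writable directory works
```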
Note: Executing this on most CPUs will emit a series of false warnings about the 8 CPU sub-processes being "stuck". These messages come from a hardcoded warning inside TensorFlow and don't appear to be tunable or suppressible.
Note 2: If memory usage swells too high, comment out this single line below in checkpoint.py. This reduces peak memory usage from >600GB to closer to ~320GB. The downside is a slightly slower initial load. Adding this "copy_to_shm" load strategy is likely a good time-to-memory trade-off on xAI's server, but may not be on your workstation if it triggers OOM.
```python
def fast_unpickle(path: str) -> Any:
    # Commenting out the copy_to_shm context manager below avoids staging a
    # second full copy of each shard in memory (~600GB -> ~320GB peak):
    # with copy_to_shm(path) as tmp_path:
    with open(path, "rb") as f:
        return pickle.load(f)
```
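For context on the trade-off: a copy_to_shm-style helper presumably stages the checkpoint file on the RAM-backed tmpfs at /dev/shm before unpickling, which is exactly what doubles peak memory. A rough reconstruction of the idea (my sketch, not the repo's actual code):

```python
import contextlib
import os
import shutil
import tempfile

@contextlib.contextmanager
def copy_to_shm(path: str):
    # Copy the file to /dev/shm (RAM-backed) so subsequent reads come from
    # memory, at the cost of holding a full extra copy of the shard.
    tmp_dir = tempfile.mkdtemp(dir="/dev/shm")
    try:
        tmp_path = os.path.join(tmp_dir, os.path.basename(path))
        shutil.copy(path, tmp_path)
        yield tmp_path
    finally:
        shutil.rmtree(tmp_dir)
```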
Could you add your system specs here?
I'll add it to: https://github.com/xai-org/grok-1/issues/42 and https://github.com/xai-org/grok-1/discussions/183
CPU: 2 x Intel Xeon E5-2697 v4
Total RAM: 1.5TB
I'm not sure why I got this error?

```text
INFO:rank:(1, 256, 6144)
INFO:rank:(1, 256, 131072)
INFO:rank:State sharding type: <class 'model.TrainingState'>
INFO:rank:(1, 256, 6144)
INFO:rank:(1, 256, 131072)
INFO:rank:Loading checkpoint at ./checkpoints/ckpt-0
INFO:rank:(1, 8192, 6144)
INFO:rank:(1, 8192, 131072)
Output for prompt: The answer to life the universe and everything is of course
INFO:runners:Precompile 1024
INFO:rank:(1, 1, 6144)
INFO:rank:(1, 1, 131072)
INFO:runners:Compiling...
INFO:rank:(1, 1, 6144)
INFO:rank:(1, 1, 131072)
jax.errors.SimplifiedTraceback: For simplicity, JAX has removed its internal frames from the traceback of the following exception. Set JAX_TRACEBACK_FILTERING=off to include these.
jaxlib.xla_extension.XlaRuntimeError: UNIMPLEMENTED: unsupported operand type BF16 in op dot
```
I'm using a Xeon 5320 + 1TB RAM. I installed the software using requirements-cpu.txt.
I assume you included my changes in run.py too? And changed "USE_CPU_ONLY = False" to "USE_CPU_ONLY = True"?
Hopefully this repository isn't abandoned, but it doesn't seem like anyone is maintaining it anymore.
You might be better off running grok-1 in llama.cpp if JAX is crashing for you.
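If you'd rather stay on JAX, one hedged guess at a workaround, assuming the failure is the CPU backend lacking a bf16 dot kernel: upcast the bfloat16 weights to float32 before running inference. A generic sketch (the actual grok-1 parameter pytree is not shown here, and the upcast roughly doubles weight memory):

```python
import jax
import jax.numpy as jnp

def upcast_bf16(params):
    # Upcast every bfloat16 leaf of the parameter pytree to float32 so the
    # CPU backend never executes a bf16 dot; costs ~2x the weight memory.
    return jax.tree_util.tree_map(
        lambda x: x.astype(jnp.float32) if x.dtype == jnp.bfloat16 else x,
        params,
    )
```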
For all those who read this and are struggling but want to run this model once, here is an article on how I managed to get it running for less than $10.
If you want to test things, you might be better off using the more expensive GCP version, because it can be stopped, and then you only pay for storage.
I hope someone finds it helpful.
Article: https://twitter.com/PascalBauerDE/status/1776792056452546822
Fork: https://github.com/pafend/grok-1-brev