grok-1
An open-source third-party training with 8 GPUs
Hi everyone interested in Grok-1:
We are the ModelScope team, we trained Grok-1 HF version(https://www.modelscope.cn/models/colossalai/grok-1-pytorch/summary) with our training framework SWIFT(https://github.com/modelscope/swift).
We used DeepSpeed ZeRO-3 with CPU offload to train Grok-1 with LoRA. The memory cost is 21 GB per GPU (8 GPUs total) with a dataset max length of 512.
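For reference, here is a minimal sketch of a ZeRO-3 CPU-offload DeepSpeed config for this kind of setup; the precision and batch-size fields below are illustrative assumptions, not the exact values from our experiment record:

```python
# Sketch of a DeepSpeed ZeRO-3 config with CPU offload.
# The bf16 / batch-size fields are assumptions for illustration only.
import json

ds_config = {
    "zero_optimization": {
        "stage": 3,                              # ZeRO-3: shard params, grads, and optimizer state
        "offload_param": {"device": "cpu"},      # offload sharded parameters to CPU RAM
        "offload_optimizer": {"device": "cpu"},  # offload optimizer state to CPU RAM
    },
    "bf16": {"enabled": True},                   # assumption: bf16 mixed precision
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}

with open("zero3_offload.json", "w") as f:
    json.dump(ds_config, f, indent=2)            # pass this file to the training framework
```

Offloading both the sharded parameters and the optimizer state to CPU RAM is what keeps the per-GPU memory low enough for LoRA fine-tuning on 8 GPUs.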
The experiment record can be found here: https://github.com/modelscope/swift/blob/main/docs/source_en/LLM/Grok-1-best-practice.md
Currently we do not support DeepSpeed for inference, so the inference GPU memory cost (with device_map) is 80 GB * 8 GPUs.
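If you want to try the current device_map inference path anyway, a minimal sketch looks like the following; the model ID, dtype, and generation settings are assumptions for illustration, so check the experiment record above for the exact setup:

```python
# Sketch of multi-GPU inference via device_map sharding.
# Model ID and generation settings are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hpcai-tech/grok-1"  # assumption: HF mirror of the PyTorch weights

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",       # shard layers across all visible GPUs
    torch_dtype="auto",
    trust_remote_code=True,  # the HF version of Grok-1 uses custom modeling code
)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

inputs = tokenizer("Hello, Grok!", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```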
Thanks for sharing, I would like to try this with your future inference support.