grok-1
An open-source third-party training with 8 GPUs
Hi everyone interested in Grok-1:
We are the ModelScope team, we trained Grok-1 HF version(https://www.modelscope.cn/models/colossalai/grok-1-pytorch/summary) with our training framework SWIFT(https://github.com/modelscope/swift).
We used DeepSpeed ZeRO-3 with CPU offload to train Grok-1 with LoRA. The memory cost is 21 GB per GPU (8 GPUs total) with a dataset max length of 512.
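For reference, here is a minimal sketch of a ZeRO-3 CPU-offload DeepSpeed config for this kind of setup; the precision and batch-size fields below are illustrative assumptions, not the exact values from our experiment record:

```python
# Sketch of a DeepSpeed ZeRO-3 config with CPU offload.
# The bf16 / batch-size fields are assumptions for illustration only.
import json

ds_config = {
    "zero_optimization": {
        "stage": 3,                              # ZeRO-3: shard params, grads, and optimizer state
        "offload_param": {"device": "cpu"},      # offload sharded parameters to CPU RAM
        "offload_optimizer": {"device": "cpu"},  # offload optimizer state to CPU RAM
    },
    "bf16": {"enabled": True},                   # assumption: bf16 mixed precision
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}

with open("zero3_offload.json", "w") as f:
    json.dump(ds_config, f, indent=2)            # pass this file to the training framework
```

Offloading both the sharded parameters and the optimizer state to CPU RAM is what keeps the per-GPU memory low enough for LoRA fine-tuning on 8 GPUs.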
The experiment record can be found here: https://github.com/modelscope/swift/blob/main/docs/source_en/LLM/Grok-1-best-practice.md
Currently we do not support DeepSpeed for inference, so the inference GPU memory cost (with device_map) is 80 GB * 8 GPUs.
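If you want to try the current device_map inference path anyway, a minimal sketch looks like the following; the model ID, dtype, and generation settings are assumptions for illustration, so check the experiment record above for the exact setup:

```python
# Sketch of multi-GPU inference via device_map sharding.
# Model ID and generation settings are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hpcai-tech/grok-1"  # assumption: HF mirror of the PyTorch weights

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",       # shard layers across all visible GPUs
    torch_dtype="auto",
    trust_remote_code=True,  # the HF version of Grok-1 uses custom modeling code
)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

inputs = tokenizer("Hello, Grok!", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```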
Thanks for sharing, I would like to try this with your future inference support.