ColossalAI
Why can't GeminiPlugin (ZeRO-3 + offloading) train a 7B model?
I have a machine with 1 TB of CPU memory and four 2080 Ti (22 GB) cards.
I tried ZeRO-3 + offloading like this:
```python
plugin = GeminiPlugin(
    precision=args.mixed_precision,
    initial_scale=2**16,
    shard_param_frac=1.0,
    offload_optim_frac=1.0,
    offload_param_frac=1.0,
    tp_size=4,
    max_norm=args.grad_clip,
)
```
with the 7B model Llama2-Chinese-7b-Chat-ms, but it reports a GPU OOM.
You might need to turn on gradient checkpointing.
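Gradient checkpointing trades compute for memory: activations inside a checkpointed segment are discarded during the forward pass and recomputed during backward, which often makes the difference between OOM and fitting a 7B model. A minimal sketch with plain PyTorch (the module and shapes here are illustrative, not from the issue):

```python
import torch
from torch.utils.checkpoint import checkpoint

# A stand-in for one transformer layer; any nn.Module works.
layer = torch.nn.Linear(16, 16)

x = torch.randn(4, 16, requires_grad=True)

# Activations inside `layer` are not kept; they are recomputed on backward.
# use_reentrant=False is the recommended mode in recent PyTorch versions.
y = checkpoint(layer, x, use_reentrant=False)
y.sum().backward()

# Gradients still flow through the checkpointed segment.
assert x.grad is not None
```

If the model comes from HuggingFace Transformers (as Llama2-Chinese-7b-Chat-ms typically does), `model.gradient_checkpointing_enable()` turns this on for every layer in one call.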