[BUG]: RuntimeError when training GPT with PP=8, TP=1, and ZeRO enabled
🐛 Describe the bug
I can train with PP=8, TP=1 without the ZeRO strategy, but enabling ZeRO triggers the RuntimeError below. My config is:
```python
# from model import GPT2_small_pipeline_hybrid
from model import GPT_13b_pp1d
import torch
from colossalai.nn.optimizer import HybridAdam
from colossalai.zero.shard_utils import TensorShardStrategy
from colossalai.amp import AMP_TYPE

BATCH_SIZE = 2
NUM_EPOCHS = 4
SEQ_LEN = 4096
NUM_MICRO_BATCHES = 1
HIDDEN_SIZE = 5120
TENSOR_SHAPE = (BATCH_SIZE // NUM_MICRO_BATCHES, SEQ_LEN, HIDDEN_SIZE)

# cudnn_benchmark = True
# cudnn_benchmark = False

# if you do not want ZeRO, just comment out this dictionary
zero = dict(
    model_config=dict(tensor_placement_policy='cuda', shard_strategy=TensorShardStrategy()),
    optimizer_config=dict(initial_scale=2**5),
)

optimizer = dict(
    type=HybridAdam,
    lr=0.000015,
    weight_decay=1e-2,
)

# fp16 = dict(mode=AMP_TYPE.NAIVE)

model = dict(
    type=GPT_13b_pp1d,
    checkpoint=True,  # num_chunks=1,
    dtype=torch.half,  # fused=True,
)

# pipeline parallel: modify the integer value for the number of pipeline stages
# tensor parallel: modify size to set the tensor parallel size, usually the number of GPUs per node
# for the current model implementation, mode can only be '1d' or None
parallel = dict(
    pipeline=8,
    tensor=dict(size=1, mode='1d'),
)
```
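For context, a legacy-style config like this is read at launch time. Below is a minimal sketch of the wiring; the config path, placeholder model, and data are illustrative assumptions, while `colossalai.launch_from_torch` and `colossalai.initialize` are the legacy ColossalAI entry points:

```python
import colossalai
import torch
from torch.utils.data import DataLoader, TensorDataset

# Parse the config file above and set up the distributed environment
# (assumes the script is started with torchrun / torch.distributed.launch).
colossalai.launch_from_torch(config='./config.py')

# Placeholder model, optimizer, and data just to show the wiring; train_gpt.py
# instead builds these from the `model` and `optimizer` dicts in the config.
model = torch.nn.Linear(5120, 5120)
optimizer = torch.optim.AdamW(model.parameters(), lr=1.5e-5)
criterion = torch.nn.MSELoss()
train_dataloader = DataLoader(TensorDataset(torch.randn(8, 5120), torch.randn(8, 5120)), batch_size=2)

# With a `zero` dict in the config, the returned engine wraps the model in
# ShardedModelV2 (the sharded_model_v2.py frame visible in the traceback).
engine, train_dataloader, *_ = colossalai.initialize(model, optimizer, criterion, train_dataloader)
```

With the `zero` dict enabled, training then fails as follows: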
```
Traceback (most recent call last):
  File "train_gpt.py", line 130, in <module>
    main()
  File "train_gpt.py", line 126, in main
    return_output_label=False)
  File "/root/gpt/titans/mytrainer.py", line 325, in fit
    return_output_label=return_output_label,
  File "/root/gpt/titans/mytrainer.py", line 185, in _train_epoch
    return_output_label=return_output_label,
  File "/root/pkgs/py37/lib/python3.7/site-packages/colossalai/engine/_base_engine.py", line 201, in execute_schedule
    output, label, loss = self._schedule.forward_backward_step(self, data_iter, **kwargs)
  File "/root/pkgs/py37/lib/python3.7/site-packages/colossalai/engine/schedule/_pipeline_schedule.py", line 395, in forward_backward_step
    accum_loss=accum_loss)
  File "/root/pkgs/py37/lib/python3.7/site-packages/colossalai/engine/schedule/_pipeline_schedule.py", line 249, in _forward_step
    output_obj = self._call_engine(engine.model, data)
  File "/root/pkgs/py37/lib/python3.7/site-packages/colossalai/engine/schedule/_pipeline_schedule.py", line 186, in _call_engine
    return model(stage_output, **data)
  File "/root/pkgs/py37/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/pkgs/py37/lib/python3.7/site-packages/colossalai/zero/sharded_model/sharded_model_v2.py", line 235, in forward
    outputs = self.module(*args, **kwargs)
  File "/root/pkgs/py37/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/gpt/titans/model/pipeline_gpt1d.py", line 56, in forward
    hidden_states = self.head(self.norm(hidden_states))
  File "/root/pkgs/py37/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/gpt/titans/model/embed.py", line 358, in forward
    x = F.linear(x, self.head.weight)
RuntimeError: size mismatch, got 8192, 8192x5120,0
```
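The trailing `,0` suggests that `self.head.weight` held zero elements at call time, which would be consistent with ZeRO having sharded (and freed) the weight's payload without re-gathering it for this direct `F.linear` access. A plain-PyTorch sketch of the same shape failure, assuming the flattened activation shape `(BATCH_SIZE * SEQ_LEN, HIDDEN_SIZE) = (8192, 5120)` and an emptied weight as a stand-in:

```python
import torch
import torch.nn.functional as F

x = torch.randn(8192, 5120)  # flattened (BATCH_SIZE * SEQ_LEN, HIDDEN_SIZE)
w = torch.empty(0)           # stand-in for a sharded weight whose payload was freed

try:
    F.linear(x, w)           # matrix product against a zero-element weight
except RuntimeError as e:
    print(e)                 # a size-mismatch error analogous to the one above
```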
Environment
No response
This version of Pipeline Parallel requires users to adapt their model into a distributed model themselves, which may not be easy if you are not familiar with ColossalAI's source code. If you want to try Pipeline Parallel, I recommend following the example at https://github.com/hpcaitech/ColossalAI/blob/main/examples/language/gpt/titans/model/pipeline_gpt1d.py to modify your own model, roughly along the lines of the sketch below. The other option is to try the feature we are developing to reduce this burden: https://github.com/hpcaitech/ColossalAI/tree/main/examples/language/gpt/experiments/pipeline_parallel
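For reference, the partitioning in the linked example looks roughly like this: the first stage owns the embedding, the last stage owns the final norm and LM head, and every stage holds a contiguous slice of transformer blocks. All class and argument names here are hypothetical simplifications, not the actual titans code:

```python
import torch.nn as nn

class GPTStage(nn.Module):
    """One pipeline stage of a manually partitioned GPT (simplified sketch)."""

    def __init__(self, hidden_size, num_layers, first=False, last=False, vocab_size=50257):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size) if first else None
        self.blocks = nn.ModuleList(
            nn.TransformerEncoderLayer(hidden_size, nhead=8, batch_first=True)
            for _ in range(num_layers)
        )
        self.norm = nn.LayerNorm(hidden_size) if last else None
        self.head = nn.Linear(hidden_size, vocab_size, bias=False) if last else None

    def forward(self, hidden_states=None, input_ids=None):
        # The first stage consumes token ids; later stages consume the previous
        # stage's hidden states (matching `model(stage_output, **data)` in the
        # pipeline schedule shown in the traceback).
        if self.embed is not None:
            hidden_states = self.embed(input_ids)
        for block in self.blocks:
            hidden_states = block(hidden_states)
        if self.norm is not None:
            hidden_states = self.head(self.norm(hidden_states))
        return hidden_states
```

With `pipeline=8`, each stage would then hold roughly one eighth of the transformer blocks, and only the last stage computes logits.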
Thanks for your comment. Making PP easy to use is what we are working on now. If you have any questions, please don't hesitate to contact us.
We have made many updates since then. This issue was closed due to inactivity. Thanks.